Scientific research is built on facts, but the keystone holding the arch of our scientific institution together is starting to crack.
We expect scientific institutions and their scientists to be infallible, their experiments proven and taken as fact so others can accurately replicate, and advance, their work in humanity’s collective quest to find answers. Unfortunately, that is not always the case; a growing catalog of reports shows that several landmark studies and significant scientific discoveries were based on false, plagiarized or manipulated data.
Last year, Science Magazine found that the amyloid hypothesis, used for decades as the basis for studying and treating Alzheimer’s disease, was based partly on false data and images. In 2012, biotech pioneer Amgen attempted to reproduce results from 53 landmark cancer studies and failed with 47, an 11% success rate. This summer, three prominent researchers from Duke, Stanford and Harvard faced separate accusations of data manipulation in past research. The accusations were so damaging that Marc Tessier-Lavigne, the Stanford researcher and university president, was forced to resign.
Trust in research is waning within the scientific community and among the public, with far-reaching implications for our ability to scale economies, raise the quality of life and mitigate disease (not to mention the taxpayer funds sunk into questionable research).
There are many reasons for this crisis, including the lack of incentives and good infrastructure for scientists to share underlying data and code in their publications. Most data and code are lost or inaccessible on centralized servers or clouds, making it impossible to check the reproducibility of most empirical results.
The fidelity and accessibility of scientific research won’t improve if we continue like this. Scientific research is too critical for humanity; it should live in an open dataverse secured by a verifiable index accessible to humans and machines.
The problems of centralization
Centralization has contributed to the replication crisis and the erosion of trust in scientific institutions and research. Its disadvantages include limited scalability and flexibility, a lack of data sovereignty and a single point of failure.
Centralization also fragments data into silos with low cross-team visibility, making information difficult to access, reproduce and verify. As I wrote above, third-party researchers and internal academic investigators face significant roadblocks to accessing original data, making it nearly impossible to reproduce results or detect problems.
The emergence of Web3 and blockchain technology provides a compelling technical solution to the problem of siloed data systems. Content-addressed storage tools like IPFS (InterPlanetary File System) and Filecoin enable scientists to redesign data storage and accessibility in ways Web2 can’t, ensuring data integrity within the FAIR principles (findable, accessible, interoperable, reusable). In Web2, URLs point to where a file is stored, which leads to link rot if the file is moved and content drift if its content is changed, both of which happen frequently. In Web3, however, content addressing generates a unique hash for each file, meaning even the tiniest change in content leads to an entirely different hash. Using these unique content hashes as identifiers solves both link rot and content drift. It also allows multiple entities to store the same file in different places, enabling institutional autonomy and improving content availability. This breaks up data silos in favor of distributed, open systems that guarantee content availability without paywalls and enable institutional sovereignty.
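To make the contrast with location-based URLs concrete, here is a minimal sketch of content addressing. It is an illustration under stated assumptions, not how IPFS is implemented: the `ContentStore` class and the `content_id` helper are hypothetical, and a plain SHA-256 hex digest stands in for the more elaborate identifiers that IPFS actually uses.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a simplified content identifier."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by content hash rather than by location."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]
        # Re-hash on retrieval so the reader can check the bytes match the identifier.
        if content_id(data) != cid:
            raise ValueError("content does not match its identifier")
        return data


original = b"subject,score\n1,0.52\n2,0.47\n"
edited = b"subject,score\n1,0.52\n2,0.48\n"  # a single digit changed

# Two independent institutions, each holding its own copy.
university = ContentStore()
archive = ContentStore()

cid = university.put(original)
same_cid = archive.put(original)    # same bytes, so same identifier
other_cid = university.put(edited)  # changed bytes, so a different identifier
```

Because the identifier is derived from the bytes themselves, `archive.get(cid)` returns the same dataset the university published, no matter which institution serves it, and the edited file can never pass for the original.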
Open access to scientific manuscripts is only one step toward solving the replication crisis.
We must also move beyond the PDF as the dominant format for publishing science and embrace a new model based on versionable, FAIR digital research objects that contain all relevant components of a research project (manuscript, data, code, videos and more) to enable the reproducibility and reuse of invaluable information. In decentralized scholarly publication systems, qualified third parties, including publishers, funding agencies, academic societies and field experts, can use cryptographically signed attestations to evaluate and verify desirable characteristics of research. For example, badges for data availability or computational reproducibility would be clearly visible on the research objects, allowing readers to filter their search for such content and creating valuable metadata that can be used to improve the incentives for scientists. A protocol for decentralized persistent identifiers for science (DPIDs), based on IPFS, is under development.
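One way to picture a versionable research object is as a small manifest that lists the content hash of each component. The sketch below is only an illustration and does not reflect the actual DPID protocol: the `ResearchObject` class, its field names and the badge label are all hypothetical, and the badges are plain unsigned labels, whereas a real system would attach cryptographically signed attestations.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def digest(data: bytes) -> str:
    """SHA-256 hex digest, used as a simplified content identifier."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ResearchObject:
    """Hypothetical manifest bundling the components of one research project."""

    title: str
    components: dict                 # component name -> content hash
    previous: Optional[str] = None   # identifier of the prior version, if any
    badges: set = field(default_factory=set)  # unsigned labels (assumption)

    def object_id(self) -> str:
        # Badges are left out of the identifier so that reviewers can add
        # them later without changing what the identifier points to.
        manifest = {
            "title": self.title,
            "components": self.components,
            "previous": self.previous,
        }
        canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
        return digest(canonical)


def with_badge(objects, badge):
    """Return only the research objects that carry the given badge."""
    return [obj for obj in objects if badge in obj.badges]


v1 = ResearchObject(
    title="Example study",
    components={
        "manuscript": digest(b"draft text"),
        "data": digest(b"subject,score\n1,0.52\n"),
        "code": digest(b"print('analysis')"),
    },
)

# A new version replaces the dataset and links back to its predecessor.
v2 = ResearchObject(
    title="Example study",
    components={**v1.components, "data": digest(b"subject,score\n1,0.52\n2,0.47\n")},
    previous=v1.object_id(),
)
v2.badges.add("data-available")

filtered = with_badge([v1, v2], "data-available")
```

Each version gets its own identifier while unchanged components keep their hashes, so a reader can see exactly which parts of a project changed between versions and filter for objects that a reviewer has marked as having available data.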
Answering the arguments against open science
Open science practices are the most promising way forward for the scientific community, but they face some headwinds. Common arguments against their implementation include:
- Data Privacy: Scientific data often contains sensitive information that should not and cannot legally be shared openly, including genetic data, health records and financial history. As we become more digitally dependent, the risk of cyberattacks that threaten information safety increases significantly.
- Lack of Incentives: Researchers lack incentives to share their data and code openly because doing so could eliminate competitive advantages over other scientists. Without tangible returns, sharing creates more work for researchers, and the added transparency allows colleagues to highlight their mistakes.
Objections from open science detractors partially stem from resistance to change, which has plagued many industries amid digital transformation and the emergence of blockchain technologies. It’s essential to understand how open science addresses these concerns:
- Data Privacy: FAIR doesn’t dictate blanket data accessibility, but there should be a path for those seeking access with a legitimate reason. IPFS nodes can run on private servers with content addressing, provenance identifiers and identification checks. Open science can employ blockchain while the data itself stays on a server with privacy restrictions.
- Lack of Incentives: Funding agencies increasingly encourage data and code sharing. In 2022, the White House Office of Science and Technology Policy (OSTP) directed federal agencies to make the results of federally funded research, including publications and supporting data, freely available to the public. A new, decentralized scientific infrastructure can enable and support policymakers’ efforts to create incentives for open science.
Enabling progress
Science is about progress, new developments and, crucially, facts. But for too long, the scientific community has stagnated because of outdated and inefficient methods of storing, preserving and accessing research. The result is an industry beset by wasted time, questionable incentives, fraud and manipulated data, all of which directly affect real people.
Scientific research holds too much importance to remain siloed and inaccessible in centralized systems. The emergence of open science and decentralized blockchain tools can and will solve this issue, enabling scientists to use new methods of storing and accessing research that the current Web2 system cannot match. Without reliable, accessible and trustworthy scientific research, we are screwed as a species.
Prof. Dr. Philipp Koellinger co-founded DeSci Labs, developing next-generation technologies to promote replicable, open, and FAIR scientific publishing. He is also the president of the DeSci Foundation, which supports the development of a more verifiable, open, and fairer ecosystem for science and scientists. In addition, Philipp is a full professor of economics at Vrije Universiteit Amsterdam and a principal investigator and co-founder of several scientific research consortia, including the Social Science Genetic Association Consortium (SSGAC), the BIG BEAR Consortium, and the Externalizing Consortium. His research has been published in journals such as Nature, Science, Nature Genetics, Nature Neuroscience, Nature Human Behaviour, and the Review of Economics and Statistics.