OpenTrials is an open, online database of information about the world’s clinical research trials. We worked with Dr Ben Goldacre and his team of experts from the Centre for Evidence-Based Medicine at the University of Oxford to create a public beta version and bring together details of more than 350,000 trials from openly licensed trial registries or academic repositories.

Opportunity

As outlined in a 2016 paper for BioMed Central on why OpenTrials was needed, existing information management systems for clinical trials are disjointed, counter-productive and flawed. Thousands of trials aren’t published, overall data quality is poor and establishing linkages between documents is difficult.

Our project sought to transform this situation by building an open, free alternative that would act as a single central point of reference for information about the world’s clinical trials, including reports, data, academic papers, previously unpublished information and other digital objects associated with each trial.

The goal was that this would increase discoverability of trials information, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine.

We encountered several challenges on this project, including the lack of standardised metadata for trials, restrictive data licenses and data scraping issues. We also engaged in fascinating and productive conversations with leading trial registers, which led several of them to change their terms and conditions to allow far greater use of their data.

How we helped

To help deliver OpenTrials, Open Knowledge International created a data model, a data explorer and an API, and developed a custom framework of data collectors and processors to ingest and thread together clinical trial records from large source registries. We populated our open database through web scraping, record-linkage techniques and imports of existing structured and linked data, building on our extensive experience with these methods of data collection and cleaning.

Overview of OpenTrials data schema and information flow.
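
As a rough illustration of what threading together records from different registries can involve, the Python sketch below groups records that share at least one trial identifier. It is not the OpenTrials implementation: the `link_records` function, the record layout and the identifier values are all hypothetical and chosen only for the example, and matching purely on shared identifiers is an assumed simplification.

```python
from collections import defaultdict


def link_records(records):
    """Group records that share at least one identifier (directly or transitively).

    Each record is a dict with an "identifiers" list. This layout is an
    assumption made for the example, not the OpenTrials data model.
    """
    # Union-find structure: parent[i] points towards the representative of i's group.
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the lowest index as representative so output order is stable.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # normalised identifier -> index of the first record using it
    for idx, record in enumerate(records):
        for identifier in record["identifiers"]:
            key = identifier.strip().upper()
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, record in enumerate(records):
        groups[find(idx)].append(record)
    return list(groups.values())


# Hypothetical records; the identifier values are made up for illustration.
sample = [
    {"source": "registry_a", "identifiers": ["NCT00000001"]},
    {"source": "registry_b", "identifiers": ["nct00000001", "ISRCTN00000001"]},
    {"source": "registry_c", "identifiers": ["ISRCTN00000001"]},
    {"source": "registry_a", "identifiers": ["NCT00000002"]},
]
linked = link_records(sample)
```

Here the first three records end up in one group because they are connected through shared identifiers, while the fourth stays on its own. Real registry data would also need fuzzier matching, for example on titles or sponsors, which this sketch leaves out.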

We developed user-friendly contributor functionality so that medical experts could carry out crowdsourced curation around selected drug areas. We also created an automated deduplication process using PYBOSSA, with the option of manual deduplication for complex trials or particular focus areas.

Informed by user testing and personas, we prioritised technical development tasks and data explorer functionality which researchers and medical professionals wanted to see on the database.

Thanks to these tools and processes, the beta version of the OpenTrials database contains data and documents connected to more than 358,000 clinical trials from some of the world’s largest clinical trial registries.

Results

Information derived from the OpenTrials database was used by the University of Oxford team to test hypotheses for several research papers examining clinical transparency or publication requirements in focus areas including Ebola trials and results published on the European Union’s Clinical Trials Register. This helped our partners to strengthen their case for greater transparency and better data on clinical trials.

Detailed analysis of the licensing arrangements used by clinical trial registers allowed our team to enter into fruitful conversations about the benefits of more open terms of use. Following contact from our project team, GlaxoSmithKline rapidly updated the terms and conditions on their registry website to allow open use of their clinical study reports, trial protocols and scientific summaries. Other key actors like ISRCTN and ClinicalStudyDataRequest.com agreed to introduce more open licensing arrangements.

The OpenTrials paper by Dr Goldacre and Jonathan Gray from Open Knowledge International garnered prominent mentions in publications like The Lancet and the British Medical Journal, and OpenTrials was cited by researchers outlining the need for a more data-centric approach to the publication of clinical research.
