Where to find relevant Healthcare Clinical Trial data
17 months ago
d • 0

Based on your personal experience, what are the best dataset repositories for clinical trial data and general patient data (such as cardiovascular disease and cancer, broken down by type of disease, age, sex, etc.)? I have already tried Kaggle and similar sites.

Thank you very much!

data clinical database healthcare
17 months ago

You should be more specific about what kind of data you are looking for and for what purpose, because in its current form your question is incredibly broad.

General patient data usually resides with healthcare providers, such as medical center networks, and with insurance companies, and is not publicly available. Because of strict data privacy regulations and heterogeneous data sources, detailed data is difficult to obtain publicly. Roche would not have reached so deep into its pocket to acquire Flatiron Health for almost 2 billion US dollars if it were not for the trove of aggregated patient data that Flatiron had gathered while providing IT support at discounted rates to hundreds of small, previously isolated cancer centres.

In contrast, metadata about clinical trials is easy to get, since trials must be registered in advance. You can retrieve information about the drugs being tested, the sponsors, the study design, the indications, etc. from the respective clinical trial registries. The WHO operates one (the International Clinical Trials Registry Platform), and so do the EMA and other competent authorities for the approval of medicinal products; in the US, trials are registered on ClinicalTrials.gov, which is run by the National Library of Medicine.
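As an illustration of how registry metadata can be pulled programmatically, here is a minimal sketch against ClinicalTrials.gov. It assumes the v2 REST endpoint `https://clinicaltrials.gov/api/v2/studies` with the `query.cond`, `pageSize`, `pageToken` and `format` parameters, and the JSON field names (`protocolSection`, `identificationModule`, `nctId`, etc.) as I understand them; please check them against the official API documentation before relying on this. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the ClinicalTrials.gov v2 REST API; verify against the official docs.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_query_url(condition: str, page_size: int = 10, page_token: Optional[str] = None) -> str:
    """Build a search URL for registered studies on a given condition (hypothetical helper)."""
    params = {"query.cond": condition, "pageSize": page_size, "format": "json"}
    if page_token:
        # The v2 API pages results via a token returned as nextPageToken (assumption).
        params["pageToken"] = page_token
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def summarize_study(study: dict) -> dict:
    """Extract registry metadata (ID, title, status, sponsor, conditions) from one study record.

    Missing sections are tolerated and yield None or an empty list.
    """
    protocol = study.get("protocolSection", {})
    ident = protocol.get("identificationModule", {})
    status = protocol.get("statusModule", {})
    sponsor = protocol.get("sponsorCollaboratorsModule", {}).get("leadSponsor", {})
    conditions = protocol.get("conditionsModule", {}).get("conditions", [])
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "status": status.get("overallStatus"),
        "sponsor": sponsor.get("name"),
        "conditions": conditions,
    }


def fetch_studies(condition: str, page_size: int = 10) -> list:
    """Fetch one page of search results and summarize each study (requires network access)."""
    with urllib.request.urlopen(build_query_url(condition, page_size)) as resp:
        payload = json.load(resp)
    return [summarize_study(s) for s in payload.get("studies", [])]
```

For example, `for s in fetch_studies("heart failure"): print(s["nct_id"], s["status"], s["sponsor"])` would list a page of registered trials. Note that this only returns trial metadata (design, sponsor, status), not individual participant data.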

However, all of these studies are conducted according to an international standard called Good Clinical Practice (GCP). GCP requires an organizational separation between the sponsor (the pharmaceutical company investigating the drug) and the physicians conducting the trial. Contract research organizations (CROs) perform the actual clinical trials on behalf of the sponsors and record the outcomes. Although sponsors send clinical monitors to the CROs, and these monitors may review individual patient records, the sponsor generally has no access to the data of individual study participants. GCP demands that under no circumstances may a study participant be identifiable to the sponsor or the regulatory authority.

Hence, the CRO withholds any information that might allow participants to be traced back. For each item of personal data, it assesses how great the risk is that this item will make a person identifiable. So the CRO might report the precise age and sex of some participants (M43) but recode others (F105) into age bands like "75+", because there are so few women aged 105 that this alone would pose a severe risk of deanonymization. You can get an idea of how much effort and consideration goes into this process by reading the anonymization reports that accompany published clinical trial data (see for example the one for an efficacy trial of the Pfizer SARS-CoV-2 vaccine).
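To make the recoding idea concrete, here is a minimal sketch of a threshold rule in the spirit of k-anonymity. It is not how any particular CRO works: the threshold `k`, the ten-year bands, the `75+` cut-off and the helper names `generalize_age` and `anonymize` are hypothetical choices for illustration, and real assessments consider many more quasi-identifiers than sex and age.

```python
from collections import Counter


def generalize_age(age: int, top_band_start: int = 75) -> str:
    """Map an exact age to a coarser band; everyone at or above top_band_start shares one open band."""
    if age >= top_band_start:
        return f"{top_band_start}+"
    low = (age // 10) * 10
    high = min(low + 9, top_band_start - 1)  # keep lower bands from overlapping the top band
    return f"{low}-{high}"


def anonymize(records, k=5, top_band_start=75):
    """records: list of (sex, age) tuples; returns (sex, age_label) tuples.

    Hypothetical rule: report the exact age only if at least k participants share
    the same (sex, age); otherwise recode it into a band, and suppress the age
    entirely if the (sex, band) group is still smaller than k.
    """
    # Pass 1: exact age for common combinations, age band for rare ones.
    exact_counts = Counter(records)
    generalized = [
        (sex, str(age)) if exact_counts[(sex, age)] >= k
        else (sex, generalize_age(age, top_band_start))
        for sex, age in records
    ]
    # Pass 2: suppress ages whose generalized group is still too small.
    band_counts = Counter(generalized)
    return [row if band_counts[row] >= k else (row[0], "*") for row in generalized]
```

With `k=2`, the records `[("M", 43), ("M", 43), ("F", 105), ("F", 80)]` become `[("M", "43"), ("M", "43"), ("F", "75+"), ("F", "75+")]`: the two 43-year-old men keep their exact age, while the two older women are only reported as "75+". This simplified rule only touches the age; with very few participants the sex value alone could still be identifying, which is one reason such decisions are made case by case in the actual reports.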

Therefore, it is impossible to obtain individual participant data for clinical trials on a large scale. The study results, as submitted to the competent authority as part of the marketing authorization application, are generally not public (with the notable exception of many COVID-19 trials, for which comprehensive data was made public to combat misinformation and vaccine hesitancy). For trials whose results are not public, researchers and journalists can apply to the competent authority, e.g. the EMA or Health Canada, for access to the same information that its experts reviewed during the marketing authorization process.

To sum up: most of this data is not public and will therefore never be posted to Kaggle or similar sites. Aggregated and anonymized data may be available to researchers upon request and within a specific scope, but never individual patient records.

