Where can I find a VCF file containing individuals who have a specific disease?
4 months ago
Patrick • 0

I built a tool for polygenic risk score (PRS) calculation and interpretation for my thesis, and in order to evaluate it, I need a .vcf file from one (or many) individuals with a certain disease (ideally hypertension, breast cancer, lung cancer, or type 2 diabetes).

Does anybody have any ideas where I can find data like that?

Phenotype VCF • 958 views
4 months ago
cmdcolin ★ 3.8k

Human data often comes with access restrictions due to privacy. For example, the UK Biobank and NIH All of Us both require applying for access.

In some circumstances, however, you can freely access certain types of data on certain types of patients.

For example, here are somatic mutations from TCGA (The Cancer Genome Atlas) in VCF format. These are just the somatic mutations, called with various tools. CNV calls, gene expression, and other data types are also available with uncontrolled access, but BAM/CRAM/germline VCF files require controlled access: https://portal.gdc.cancer.gov/repository?filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22content%22%3A%7B%22field%22%3A%22cases.case_id%22%2C%22value%22%3A%5B%22set_id%3AkNa6uowBB1xFKettIGZf%22%5D%7D%2C%22op%22%3A%22IN%22%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.data_format%22%2C%22value%22%3A%5B%22vcf%22%5D%7D%7D%5D%7D

There is also the 1000 Genomes dataset, which is a fully open repository of human sequencing data with no controlled access (which is quite unique)! It offers full unrestricted access to BAM/CRAM/VCF files (including germline/ancestry info) at https://www.internationalgenome.org/data

The clinical data available with the 1000 Genomes is limited, but it has been analyzed in some papers, for example https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135193

That said, you may be going out on a limb trying to clinically characterize the 1000 Genomes Project.


Thanks for your detailed answer.

I tried the TCGA link you sent, but when I try to download something, I am required to sign in as a researcher, which I am not.

However, I found a website called openSNP where people can upload their raw data from companies like 23andMe, along with some phenotype descriptions. I think you can convert these into VCF files as well. Although it's probably not as detailed and accurate as data from the other, more official sources, I think it may suffice for my purposes.
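For anyone attempting the same conversion, here is a minimal Python sketch of the idea, not a definitive implementation. It assumes the usual 23andMe-style raw layout (tab-separated `rsid`, `chromosome`, `position`, `genotype` columns, with `#` comment lines) and assumes the caller supplies the reference base for each site, since the raw file does not contain it. The function names `parse_23andme` and `to_vcf_record` are hypothetical, and no VCF header is written.

```python
def parse_23andme(lines):
    """Yield (rsid, chrom, pos, genotype) from 23andMe-style raw text lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        # Skip no-calls ("--") and indel codes ("I"/"D"); keep plain SNP calls.
        if not genotype or set(genotype) - set("ACGT"):
            continue
        yield rsid, chrom, int(pos), genotype


def to_vcf_record(rsid, chrom, pos, genotype, ref):
    """Build one VCF data line.

    `ref` is the reference base at this position. It is an assumption of this
    sketch that the caller looks it up (e.g. from a reference FASTA).
    """
    alleles = list(genotype)
    alts = sorted({a for a in alleles if a != ref})
    # Allele index 0 is the reference; ALT alleles are numbered from 1.
    index = {ref: 0}
    for i, allele in enumerate(alts, start=1):
        index[allele] = i
    gt = "/".join(str(i) for i in sorted(index[a] for a in alleles))
    alt = ",".join(alts) if alts else "."
    return "\t".join([chrom, str(pos), rsid, ref, alt, ".", ".", ".", "GT", gt])


raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tAG",
    "rs2\t1\t200\t--",
    "rs3\t2\t300\tCC",
]
ref_bases = {"rs1": "A", "rs3": "C"}  # made-up reference bases for illustration
records = [
    to_vcf_record(rsid, chrom, pos, gt, ref_bases[rsid])
    for rsid, chrom, pos, gt in parse_23andme(raw)
]
```

The input rows and reference bases above are invented purely for illustration. In practice an established converter is probably the safer choice, because the genome build and strand of the raw data have to match the reference.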
