4.6 years ago
mono
Hi all,
Is there an open-source dataset that provides genotype information for a list of individuals? I am specifically looking for something in this format:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS DP=14;AF=0.5;DB GT:DP 0/0:1 0/1:8 1/1:5
20 17330 . T A 3 q10 DP=11;AF=0.017 GT:DP 0/0:3 0/1:5 0/0:41
20 1110696 rs6040355 A G,T 67 PASS DP=10;AF=0.333,0.667;DB GT:DP 0/2:6 1/2:0 2/2:4
20 1230237 . T . 47 PASS DP=13 GT:DP 0/0:7 0/0:4 ./.:.
20 1234567 microsat1 GTC G,GTCT 50 PASS DP=9 GT:DP 0/1:4 0/2:2 1/1:3
where NA00001, NA00002, and NA00003 are individual samples.
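For anyone new to the format: the per-sample columns are read through the FORMAT column. FORMAT (here `GT:DP`) lists the colon-separated keys, and each sample column after it holds the matching values. The sketch below uses the example lines above to show that mapping. It is a minimal illustration, not a full VCF parser. `parse_samples` is a hypothetical helper name. Real VCF files are tab-delimited, and the sketch calls `split()` only because the example above was pasted with spaces.

```python
from typing import Dict

HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003"
LINE = "20 14370 rs6054257 G A 29 PASS DP=14;AF=0.5;DB GT:DP 0/0:1 0/1:8 1/1:5"


def parse_samples(header: str, line: str) -> Dict[str, Dict[str, str]]:
    """Map each sample name to its FORMAT-keyed values for one VCF data line."""
    # split() handles the space-separated example; use split("\t") on real files.
    columns = header.lstrip("#").split()
    fields = line.split()
    # Columns 0-8 are fixed (CHROM .. FORMAT); every later column is one sample.
    format_keys = fields[8].split(":")
    samples = {}
    for name, value in zip(columns[9:], fields[9:]):
        samples[name] = dict(zip(format_keys, value.split(":")))
    return samples


print(parse_samples(HEADER, LINE))
# {'NA00001': {'GT': '0/0', 'DP': '1'}, 'NA00002': {'GT': '0/1', 'DP': '8'}, 'NA00003': {'GT': '1/1', 'DP': '5'}}
```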
I am new to this field, so I don't know the proper jargon for this kind of dataset. Please let me know what it is called.
I understand it is in VCF format. I downloaded a lot of files from 1000 Genomes, Stanford open source, etc., but I wasn't getting what I'm looking for. I need individual samples with genotypes such as 0/0 or 0/1. Most of the datasets have only the following information:
Where should I look for individual samples that contain information about homozygous recessive, homozygous dominant, or heterozygous genotypes?
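A note on terminology: a VCF GT value does not encode dominance. It lists allele indices, where 0 is the REF allele and 1, 2, ... are the ALT alleles in order. `/` marks an unphased genotype and `|` marks a phased one. In VCF terms the categories are therefore homozygous reference (0/0), heterozygous (e.g. 0/1 or 1/2), and homozygous alternate (e.g. 1/1 or 2/2). Dominant and recessive describe how a trait is inherited, not the genotype call itself. The sketch below classifies GT strings along those lines. `classify_genotype` is a hypothetical helper, and it assumes diploid calls. A haploid call such as `1` would need separate handling.

```python
import re


def classify_genotype(gt: str) -> str:
    """Classify a diploid VCF GT string such as '0/1' or '1|1'."""
    # '/' separates unphased alleles and '|' phased ones; split on either.
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    # Both allele indices are identical; index 0 is the REF allele.
    return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"


for gt in ["0/0", "0/1", "1/1", "1/2", "./."]:
    print(gt, classify_genotype(gt))
```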
You might have to do some digging to find VCF files with individual genotype information. The most easily found files expose locus and INFO annotations, but they should also offer individual-level information unless that data is restricted/controlled access. In fact, the default 1000 Genomes VCF files appear to have individual-level genotype info.
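A quick way to tell the two kinds of file apart is the `#CHROM` header line. A file with only locus and INFO annotations (often called "sites-only") stops after the INFO column. A file with per-individual genotypes continues with a FORMAT column and then one column per sample. The sketch below reads just the header and returns the sample names, or an empty list for a sites-only file. The function names are hypothetical. It relies on Python's `gzip` module, which can also read bgzip-compressed `.vcf.gz` files.

```python
import gzip
from typing import Iterable, List


def sample_names_from_lines(lines: Iterable[str]) -> List[str]:
    """Return sample column names from VCF header lines (empty if sites-only)."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO, then FORMAT.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data lines without finding a column header
    return []


def sample_names(path: str) -> List[str]:
    """Open a plain or gzip/bgzip-compressed VCF and read only its header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sample_names_from_lines(handle)
```

Because the loop stops at the `#CHROM` line, even a large whole-genome file is checked without reading its data lines. If bcftools is installed, `bcftools query -l file.vcf.gz` lists the same sample names from the command line.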
Thank you very much, I was accessing the wrong files.