Datasets for genotype information
4.6 years ago
mono

Hi all,

Is there any open source dataset that provides genotype information for a list of individuals? I am specifically looking for something in this format:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA00001 NA00002 NA00003
20  14370   rs6054257   G   A   29  PASS    DP=14;AF=0.5;DB GT:DP   0/0:1   0/1:8   1/1:5
20  17330   .   T   A   3   q10 DP=11;AF=0.017  GT:DP   0/0:3   0/1:5   0/0:41
20  1110696 rs6040355   A   G,T 67  PASS    DP=10;AF=0.333,0.667;DB GT:DP   0/2:6   1/2:0   2/2:4
20  1230237 .   T   .   47  PASS    DP=13   GT:DP   0/0:7   0/0:4   ./.:.
20  1234567 microsat1   GTC G,GTCT  50  PASS    DP=9    GT:DP   0/1:4   0/2:2   1/1:3

Here NA00001, NA00002 and NA00003 are individual samples.
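Each sample column starts with the GT (genotype) field. `0` stands for the REF allele, `1` for the first ALT allele, `2` for the second, and so on. `/` separates unphased alleles, `|` separates phased ones, and `.` marks a missing call. VCF describes alleles relative to the reference genome rather than by dominance, so the closest equivalents of "homozygous recessive/dominant" are homozygous reference (`0/0`), heterozygous (`0/1`, `1/2`, ...) and homozygous alternate (`1/1`, `2/2`, ...). The sketch below labels each call that way. It assumes bash and awk, an uncompressed tab-delimited VCF on standard input, diploid or haploid calls, and GT as the first FORMAT key (the VCF specification requires this whenever GT is present). The function name `classify_genotypes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: label every sample's GT call as hom-ref, het, hom-alt, haploid or missing.
# Assumes an uncompressed, tab-delimited VCF on stdin with GT as the first FORMAT key.
classify_genotypes() {
  awk -F'\t' '
    /^##/ { next }                        # skip meta-information lines
    /^#CHROM/ {                           # header: remember sample names (columns 10+)
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, fmt, ":")               # GT is the first colon-separated field
        gt = fmt[1]
        n = split(gt, allele, "[/|]")     # "/" = unphased, "|" = phased
        if (gt ~ /\./)                   cls = "missing"
        else if (n == 1)                 cls = "haploid"
        else if (allele[1] != allele[2]) cls = "het"
        else if (allele[1] == 0)         cls = "hom-ref"
        else                             cls = "hom-alt"
        print $1 ":" $2 "\t" name[i] "\t" gt "\t" cls
      }
    }'
}

# Usage (hypothetical file name):
#   gzip -dc my_calls.vcf.gz | classify_genotypes | head
```

Each output line holds the position, the sample name, the raw GT value and its label, so the rows for a single individual can be picked out with `grep` or `awk` on the second column.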

I am new to this field, so I don't know the proper jargon for this kind of dataset; please let me know what it is called.

Tags: SNP, gene

4.6 years ago
Ram

This format is called Variant Call Format (VCF). Open datasets such as 1000 Genomes, ExAC, and gnomAD all release their data as VCF files.

I understand it is VCF. I downloaded many files from 1000 Genomes, Stanford open source, etc., but I wasn't getting what I'm looking for. I need individual samples with genotypes such as 0/0 or 0/1. Most of the datasets have only the following information:

#CHROM  POS ID  REF ALT QUAL    FILTER  
20  14370   rs6054257   G   A   29  PASS

Where should I look for individual samples that contain information about homozygous recessive, homozygous dominant, or heterozygous genotypes?

You might have to do some digging to find VCF files with individual genotype information. The most easily found files often expose only site-level data (locus and INFO annotations), but projects should also offer individual-level genotypes unless that data is under restricted/controlled access.
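A quick way to tell the two kinds of file apart is to count the columns of the `#CHROM` header line. Eight columns (CHROM through INFO) are always present; FORMAT plus one column per sample follow only when the file carries individual genotypes. A minimal sketch, assuming bash, a gzip- or bgzip-compressed VCF, and a hypothetical helper named `count_samples`:

```shell
#!/usr/bin/env bash
# Sketch: print how many sample (genotype) columns a compressed VCF has.
# 0 means a sites-only file: CHROM..INFO (and possibly FORMAT) but no individuals.
count_samples() {
  gzip -dc -- "$1" | awk -F'\t' '
    /^#CHROM/ {
      # 8 fixed columns + FORMAT come before the first sample column
      print (NF > 9 ? NF - 9 : 0)
      exit
    }'
}

# Usage:
#   count_samples ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
```

A result of 0 corresponds to a sites-only file like the one shown in the question; any positive number is the count of individuals with genotype columns.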

For example, the default 1000 Genomes release VCF includes individual-level genotype information:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

zgrep -A2 "#CHROM" ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz | cut -f 1-15
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HG00096 HG00097 HG00099 HG00100 HG00101 HG00102
22  16050075    rs587697622 A   G   100 PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=8012;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=.|||;VT=SNP GT  0|0 0|0 0|0 0|0 0|0 0|0
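To pull out the calls of a single individual from that file, one option is to locate the sample's column in the `#CHROM` header and print its GT value for every site. A minimal sketch in bash and awk; the function name `sample_genotypes` is hypothetical, and GT is assumed to be the first FORMAT key, as the VCF specification requires:

```shell
#!/usr/bin/env bash
# Sketch: print CHROM, POS, ID, REF, ALT and the GT call of one sample.
# Arguments: a gzip/bgzip-compressed VCF and a sample name from the #CHROM line.
sample_genotypes() {
  local vcf=$1 sample=$2
  gzip -dc -- "$vcf" | awk -F'\t' -v OFS='\t' -v s="$sample" '
    /^##/ { next }                          # skip meta-information lines
    /^#CHROM/ {                             # find the column holding this sample
      for (i = 10; i <= NF; i++) if ($i == s) col = i
      if (!col) { print "sample not found: " s > "/dev/stderr"; exit 1 }
      next
    }
    col {
      split($col, fmt, ":")                 # GT is the first colon-separated field
      print $1, $2, $3, $4, $5, fmt[1]
    }'
}

# Usage: the first genotype calls of HG00096 on chr22
#   sample_genotypes ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz HG00096 | head
```

If bcftools is installed, `bcftools query -s HG00096 -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' FILE` is the standard tool for the same job and is more robust, since it parses the VCF properly rather than splitting text columns.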

Thank you very much, I was accessing the wrong files.
