Frequency of Exome data from 1000 Genomes Project
9.7 years ago
ankita ▴ 20

How can I retrieve frequency data for all exonic variants (from the exomes analysed in Phase 1) of the 1000 Genomes Project?

Exomes - 1KG Project • 4.0k views

Thanks Sean, I have seen one of the links, but it only computes frequencies for a particular region. I am not able to locate any VCF file that gives the frequencies of all exonic variants for the studied populations (computed from exome data). If you know of any such data, please help.

9.7 years ago
rbagnall ★ 1.8k

If you download the phase 1 integrated sites VCF file from the 1000 Genomes FTP site:

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz

you will find a SNPSOURCE attribute in the INFO field of each record (SNPSOURCE=EXOME, SNPSOURCE=LOWCOV, SNPSOURCE=LOWCOV,EXOME, etc.).

So to get the variants called by exome sequencing you can do:

zless ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | grep "EXOME"

and this will give the exome variants with their AC (allele count), AN (allele number), AF (allele frequency), etc.:

1    69536    rs200013390    C    T    100    PASS    AA=.;AC=0;AF=0;AN=2184;AVGPOST=0.9986;ERATE=0.0006;LDAF=0.0008;RSQ=0.0677;SNPSOURCE=EXOME;THETA=0.0087;VT=SNP
1    861275    rs199884417    C    T    100    PASS    AA=C;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0005;VT=SNP
1    861292    rs191719684    C    G    100    PASS    AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;THETA=0.0012;LDAF=0.0014;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01
1    861315    rs200140498    G    A    100    PASS    AA=G;AC=2;AF=0.0009;AN=2184;ASN_AF=0.0035;AVGPOST=0.9997;ERATE=0.0003;LDAF=0.0011;RSQ=0.8902;SNPSOURCE=EXOME;THETA=0.0008;VT=SNP
1    865488    rs202189913    A    G    100    PASS    AA=N;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=0.9987;ERATE=0.0005;LDAF=0.0011;RSQ=0.4947;SNPSOURCE=EXOME;THETA=0.0011;VT=SNP
1    865545    rs201186828    G    A    100    PASS    AA=g;AC=4;AF=0.0018;AN=2184;ASN_AF=0.01;AVGPOST=0.9979;ERATE=0.0005;LDAF=0.0025;RSQ=0.6639;SNPSOURCE=EXOME;THETA=0.0009;VT=SNP
1    865584    rs148711625    G    A    100    PASS    RSQ=0.9432;AVGPOST=0.9983;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;AC=26;VT=SNP;LDAF=0.0122;THETA=0.0007;ERATE=0.0003;AF=0.01;AMR_AF=0.0028;AFR_AF=0.05
1    865628    rs41285790    G    A    100    PASS    AC=7;LDAF=0.0033;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0006;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.01;EUR_AF=0.01
1    865662    rs140751899    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9998;ERATE=0.0003;LDAF=0.0005;RSQ=0.8540;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP
1    865664    rs199655347    C    T    100    PASS    AA=c;AC=0;AF=0;AN=2184;AVGPOST=0.9996;ERATE=0.0003;LDAF=0.0002;RSQ=0.0997;SNPSOURCE=EXOME;THETA=0.0028;VT=SNP
1    865694    rs9988179    C    T    100    PASS    AC=136;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9987;LDAF=0.0621;VT=SNP;AA=c;THETA=0.0006;AVGPOST=0.9998;ERATE=0.0003;AF=0.06;ASN_AF=0.16;AMR_AF=0.08;AFR_AF=0.03;EUR_AF=0.0026
1    865700    rs116730894    C    T    100    PASS    AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;AA=c;LDAF=0.0014;THETA=0.0010;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01
1    865705    rs146331776    C    T    100    PASS    RSQ=0.9762;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0018;THETA=0.0005;VT=SNP;AA=c;AC=4;ERATE=0.0003;AVGPOST=0.9999;AF=0.0018;AFR_AF=0.01
1    865734    rs201326364    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP
1    865738    rs139570490    A    G    100    PASS    AC=7;LDAF=0.0033;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0010;AA=a;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.0028;EUR_AF=0.01
1    866371    rs200617908    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0005;RSQ=0.9135;SNPSOURCE=EXOME;THETA=0.0013;VT=SNP
1    866422    rs139210662    C    T    100    PASS    AC=7;AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0032;VT=SNP;AA=c;RSQ=1.0000;THETA=0.0007;ERATE=0.0003;AF=0.0032;AMR_AF=0.01;AFR_AF=0.01
1    866488    rs200139083    G    A    100    PASS    AA=g;AC=0;AF=0;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0000;RSQ=0.0499;SNPSOURCE=EXOME;THETA=0.0004;VT=SNP
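
Note that grep "EXOME" matches the string anywhere on a line, so it can also pick up header lines that mention it. If you want a plain table of frequencies instead of the raw INFO strings, the same idea can be sketched with awk. This is only a rough sketch: it assumes the file is tab-separated with INFO in column 8 (as in the output above), and the function name exome_freqs and the choice of output columns are my own, not part of any 1000 Genomes tool.

```shell
# exome_freqs (hypothetical helper): read VCF text on stdin, keep records
# whose SNPSOURCE value includes EXOME, and print tab-separated
# CHROM, POS, ID, AC, AN, AF, AFR_AF, AMR_AF, ASN_AF, EUR_AF.
# A "." is printed when a key is absent from the INFO field.
exome_freqs() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                          # skip header lines
    {
      split("", info)                      # portable way to clear the array
      n = split($8, kv, ";")               # INFO is column 8
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        if (eq > 0) info[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
      }
      # keep only records where EXOME is one of the SNPSOURCE values
      if (!("SNPSOURCE" in info)) next
      if (info["SNPSOURCE"] !~ /(^|,)EXOME(,|$)/) next

      out = $1 OFS $2 OFS $3
      m = split("AC AN AF AFR_AF AMR_AF ASN_AF EUR_AF", keys, " ")
      for (j = 1; j <= m; j++)
        out = out OFS ((keys[j] in info) ? info[keys[j]] : ".")
      print out
    }'
}
```

You would then run something like: gunzip -c ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | exome_freqs > exome_freqs.tsv (the output file name is arbitrary). Records with SNPSOURCE=LOWCOV,EXOME are kept as well, the same as with the grep above.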

Thanks a lot rbagnall, this is exactly what I want...
