Question: Frequency of Exome data from 1000 Genomes Project
ankita (India) wrote 5.3 years ago:

How can I retrieve frequency data for all exonic variants (exomes analysed in Phase 1) of the 1000 Genomes Project?

exomes - 1kg project • 2.7k views

Thanks Sean. I have seen one of the links, but it computes frequencies only for a particular region. I am not able to locate any VCF file that gives the frequency of all exonic variants for the studied populations (computed from exome data). If you know of such data, please help.

written 5.3 years ago by ankita
rbagnall (Australia) wrote 5.3 years ago:

If you download the integrated Phase 1 sites VCF file from the 1000 Genomes FTP site:

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz

you will find a SNPSOURCE attribute in the INFO field of the VCF file (SNPSOURCE=EXOME, SNPSOURCE=LOWCOV, SNPSOURCE=LOWCOV,EXOME, etc.).

So, to get the variants called by exome sequencing, you can do:

zgrep "EXOME" ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz

This will give the exome variants with their AC (allele count), AN (allele number, i.e. the total number of called alleles), AF (allele frequency) and, where present, the per-population frequencies (AFR_AF, AMR_AF, ASN_AF, EUR_AF):

1    69536    rs200013390    C    T    100    PASS    AA=.;AC=0;AF=0;AN=2184;AVGPOST=0.9986;ERATE=0.0006;LDAF=0.0008;RSQ=0.0677;SNPSOURCE=EXOME;THETA=0.0087;VT=SNP

1    861275    rs199884417    C    T    100    PASS    AA=C;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0005;VT=SNP

1    861292    rs191719684    C    G    100    PASS    AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;THETA=0.0012;LDAF=0.0014;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01

1    861315    rs200140498    G    A    100    PASS    AA=G;AC=2;AF=0.0009;AN=2184;ASN_AF=0.0035;AVGPOST=0.9997;ERATE=0.0003;LDAF=0.0011;RSQ=0.8902;SNPSOURCE=EXOME;THETA=0.0008;VT=SNP

1    865488    rs202189913    A    G    100    PASS    AA=N;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=0.9987;ERATE=0.0005;LDAF=0.0011;RSQ=0.4947;SNPSOURCE=EXOME;THETA=0.0011;VT=SNP

1    865545    rs201186828    G    A    100    PASS    AA=g;AC=4;AF=0.0018;AN=2184;ASN_AF=0.01;AVGPOST=0.9979;ERATE=0.0005;LDAF=0.0025;RSQ=0.6639;SNPSOURCE=EXOME;THETA=0.0009;VT=SNP

1    865584    rs148711625    G    A    100    PASS    RSQ=0.9432;AVGPOST=0.9983;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;AC=26;VT=SNP;LDAF=0.0122;THETA=0.0007;ERATE=0.0003;AF=0.01;AMR_AF=0.0028;AFR_AF=0.05

1    865628    rs41285790    G    A    100    PASS    AC=7;LDAF=0.0033;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0006;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.01;EUR_AF=0.01

1    865662    rs140751899    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9998;ERATE=0.0003;LDAF=0.0005;RSQ=0.8540;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP

1    865664    rs199655347    C    T    100    PASS    AA=c;AC=0;AF=0;AN=2184;AVGPOST=0.9996;ERATE=0.0003;LDAF=0.0002;RSQ=0.0997;SNPSOURCE=EXOME;THETA=0.0028;VT=SNP

1    865694    rs9988179    C    T    100    PASS    AC=136;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9987;LDAF=0.0621;VT=SNP;AA=c;THETA=0.0006;AVGPOST=0.9998;ERATE=0.0003;AF=0.06;ASN_AF=0.16;AMR_AF=0.08;AFR_AF=0.03;EUR_AF=0.0026

1    865700    rs116730894    C    T    100    PASS    AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;AA=c;LDAF=0.0014;THETA=0.0010;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01

1    865705    rs146331776    C    T    100    PASS    RSQ=0.9762;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0018;THETA=0.0005;VT=SNP;AA=c;AC=4;ERATE=0.0003;AVGPOST=0.9999;AF=0.0018;AFR_AF=0.01

1    865734    rs201326364    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP

1    865738    rs139570490    A    G    100    PASS    AC=7;LDAF=0.0033;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0010;AA=a;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.0028;EUR_AF=0.01

1    866371    rs200617908    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0005;RSQ=0.9135;SNPSOURCE=EXOME;THETA=0.0013;VT=SNP

1    866422    rs139210662    C    T    100    PASS    AC=7;AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0032;VT=SNP;AA=c;RSQ=1.0000;THETA=0.0007;ERATE=0.0003;AF=0.0032;AMR_AF=0.01;AFR_AF=0.01

1    866488    rs200139083    G    A    100    PASS    AA=g;AC=0;AF=0;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0000;RSQ=0.0499;SNPSOURCE=EXOME;THETA=0.0004;VT=SNP
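If you want the frequencies as a table rather than raw INFO strings, here is a minimal Python sketch of the same filter. It is not part of the original answer, and the function names (parse_info, exome_frequencies, read_sites_vcf) are made up for illustration. It assumes the file is tab-separated as in a standard VCF, that each record is biallelic (a single AC value), and that AC, AN and AF are present on every record that carries SNPSOURCE. Unlike a plain grep, it matches EXOME only as a value of SNPSOURCE, not anywhere on the line.

```python
import gzip

# Per-population frequency keys seen in the records above.
POP_KEYS = ("AFR_AF", "AMR_AF", "ASN_AF", "EUR_AF")


def parse_info(info):
    """Turn 'AA=C;AC=1;...' into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def exome_frequencies(lines):
    """Yield one dict per variant whose SNPSOURCE list includes EXOME.

    Assumes tab-separated, biallelic records (single AC value).
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        sources = str(info.get("SNPSOURCE", "")).split(",")
        if "EXOME" not in sources:
            continue
        record = {
            "chrom": cols[0],
            "pos": int(cols[1]),
            "id": cols[2],
            "ref": cols[3],
            "alt": cols[4],
            "AC": int(info["AC"]),
            "AN": int(info["AN"]),
            "AF": float(info["AF"]),
        }
        # Population frequencies are only listed when non-zero.
        for key in POP_KEYS:
            if key in info:
                record[key] = float(info[key])
        yield record


def read_sites_vcf(path):
    """Stream exome-called variants from a gzipped sites VCF."""
    with gzip.open(path, "rt") as handle:
        yield from exome_frequencies(handle)
```

Something like `for rec in read_sites_vcf("ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz"): print(rec)` would then stream the records. Note that AF in the file is rounded, so if you need more precision you can recompute it as AC/AN (for example AC=1, AN=2184 gives about 0.00046, which the file reports as 0.0005).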

Sean Davis (National Institutes of Health, Bethesda, MD) wrote 5.3 years ago:

You might take a look at these related posts: "Getting Allele Frequencies From 1000 Genomes" and "1000 Genomes Project SNPs".

ankita (India) wrote 5.3 years ago:

Thanks a lot, rbagnall, this is exactly what I wanted.
