ankita, 11.2 years ago
How to retrieve frequency data for all exonic variants (exomes analysed in Phase 1) of the 1000 Genomes Project?
If you download the Phase 1 integrated sites VCF file from the 1000 Genomes FTP site:
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz
you will find a SNPSOURCE key in the INFO field of each record (SNPSOURCE=EXOME, SNPSOURCE=LOWCOV, SNPSOURCE=LOWCOV,EXOME, etc.), indicating whether the variant was called from the exome data, the low-coverage whole-genome data, or both.
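To see how the sites split across the SNPSOURCE categories before filtering, a small awk tally can help. This is a minimal sketch, assuming a tab-delimited VCF on standard input with SNPSOURCE stored as a KEY=VALUE pair in the 8th (INFO) column; the function name tally_snpsource is hypothetical.

```shell
# Hypothetical helper: count sites per SNPSOURCE value in a sites VCF read on stdin.
# Header lines (starting with "#") are skipped; sites with no SNPSOURCE key count as "NA".
tally_snpsource() {
  awk -F'\t' '
    /^#/ { next }
    {
      src = "NA"
      n = split($8, kv, ";")               # split the INFO column into KEY=VALUE pairs
      for (i = 1; i <= n; i++) {
        if (index(kv[i], "SNPSOURCE=") == 1) {
          src = substr(kv[i], length("SNPSOURCE=") + 1)
        }
      }
      count[src]++
    }
    END { for (s in count) print s "\t" count[s] }
  ' | sort
}

# Usage (gunzip -c works on both Linux and macOS):
# gunzip -c ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | tally_snpsource
```

Both SNPSOURCE=EXOME and SNPSOURCE=LOWCOV,EXOME sites were called from the exome data, so both categories belong in an "exome variants" set.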
So to get the variants called from exome sequencing you can do:
gunzip -c ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | grep "SNPSOURCE=[^;]*EXOME"
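This gives the exome variants together with their INFO fields, including AC (alternate allele count), AN (allele number: the total number of alleles in called genotypes, here 2184 = 2 × 1092 Phase 1 individuals) and AF (allele frequency), plus population-specific frequencies (AFR_AF, AMR_AF, ASN_AF, EUR_AF) where available: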
1    69536    rs200013390    C    T    100    PASS    AA=.;AC=0;AF=0;AN=2184;AVGPOST=0.9986;ERATE=0.0006;LDAF=0.0008;RSQ=0.0677;SNPSOURCE=EXOME;THETA=0.0087;VT=SNP
1    861275    rs199884417    C    T    100    PASS    AA=C;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0005;VT=SNP
1    861292    rs191719684    C    G    100    PASS    AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;THETA=0.0012;LDAF=0.0014;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01
1    861315    rs200140498    G    A    100    PASS    AA=G;AC=2;AF=0.0009;AN=2184;ASN_AF=0.0035;AVGPOST=0.9997;ERATE=0.0003;LDAF=0.0011;RSQ=0.8902;SNPSOURCE=EXOME;THETA=0.0008;VT=SNP
1    865488    rs202189913    A    G    100    PASS    AA=N;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=0.9987;ERATE=0.0005;LDAF=0.0011;RSQ=0.4947;SNPSOURCE=EXOME;THETA=0.0011;VT=SNP
1    865545    rs201186828    G    A    100    PASS    AA=g;AC=4;AF=0.0018;AN=2184;ASN_AF=0.01;AVGPOST=0.9979;ERATE=0.0005;LDAF=0.0025;RSQ=0.6639;SNPSOURCE=EXOME;THETA=0.0009;VT=SNP
1    865584    rs148711625    G    A    100    PASS    RSQ=0.9432;AVGPOST=0.9983;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;AC=26;VT=SNP;LDAF=0.0122;THETA=0.0007;ERATE=0.0003;AF=0.01;AMR_AF=0.0028;AFR_AF=0.05
1    865628    rs41285790    G    A    100    PASS    AC=7;LDAF=0.0033;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0006;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.01;EUR_AF=0.01
1    865662    rs140751899    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9998;ERATE=0.0003;LDAF=0.0005;RSQ=0.8540;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP
1    865664    rs199655347    C    T    100    PASS    AA=c;AC=0;AF=0;AN=2184;AVGPOST=0.9996;ERATE=0.0003;LDAF=0.0002;RSQ=0.0997;SNPSOURCE=EXOME;THETA=0.0028;VT=SNP
1    865694    rs9988179    C    T    100    PASS    AC=136;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9987;LDAF=0.0621;VT=SNP;AA=c;THETA=0.0006;AVGPOST=0.9998;ERATE=0.0003;AF=0.06;ASN_AF=0.16;AMR_AF=0.08;AFR_AF=0.03;EUR_AF=0.0026
1    865700    rs116730894    C    T    100    PASS    AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;AA=c;LDAF=0.0014;THETA=0.0010;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01
1    865705    rs146331776    C    T    100    PASS    RSQ=0.9762;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0018;THETA=0.0005;VT=SNP;AA=c;AC=4;ERATE=0.0003;AVGPOST=0.9999;AF=0.0018;AFR_AF=0.01
1    865734    rs201326364    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP
1    865738    rs139570490    A    G    100    PASS    AC=7;LDAF=0.0033;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0010;AA=a;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.0028;EUR_AF=0.01
1    866371    rs200617908    G    A    100    PASS    AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0005;RSQ=0.9135;SNPSOURCE=EXOME;THETA=0.0013;VT=SNP
1    866422    rs139210662    C    T    100    PASS    AC=7;AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0032;VT=SNP;AA=c;RSQ=1.0000;THETA=0.0007;ERATE=0.0003;AF=0.0032;AMR_AF=0.01;AFR_AF=0.01
1    866488    rs200139083    G    A    100    PASS    AA=g;AC=0;AF=0;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0000;RSQ=0.0499;SNPSOURCE=EXOME;THETA=0.0004;VT=SNP
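Note that the INFO keys do not come in a fixed order (the first line above starts with AA=, the third with AVGPOST=), so they are best parsed by name rather than by position. The sketch below turns the exome sites into a tab-separated table and recomputes AC/AN next to the reported AF, which is rounded (e.g. AC=136, AN=2184 gives 0.0623 versus AF=0.06). The function name exome_af_table and the column layout are hypothetical; it assumes a tab-delimited VCF on stdin and only recomputes AC/AN when AC is a single integer (biallelic site), printing "." otherwise.

```shell
# Hypothetical helper: tab-separated frequency table for every site whose SNPSOURCE
# contains EXOME, from a tab-delimited sites VCF on stdin.
# INFO keys are looked up by name; keys missing on a line are printed as ".".
exome_af_table() {
  awk -F'\t' -v OFS='\t' '
    # Return the value of an INFO key, or "." if the key is absent on this line.
    function get(k) { return (k in info) ? info[k] : "." }

    BEGIN { print "CHROM", "POS", "ID", "AC", "AN", "AF", "AC_AN", "AFR_AF", "AMR_AF", "ASN_AF", "EUR_AF" }

    /^#/ { next }

    {
      split("", info)                      # clear keys left over from the previous record
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        if (eq > 0) info[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
      }
      if (get("SNPSOURCE") !~ /EXOME/) next   # keep EXOME and LOWCOV,EXOME sites

      # Recompute AC/AN only when AC is a single integer and AN is positive.
      ratio = "."
      if (get("AC") ~ /^[0-9]+$/ && get("AN") ~ /^[0-9]+$/ && get("AN") + 0 > 0)
        ratio = sprintf("%.4f", get("AC") / get("AN"))

      print $1, $2, $3, get("AC"), get("AN"), get("AF"), ratio,
            get("AFR_AF"), get("AMR_AF"), get("ASN_AF"), get("EUR_AF")
    }
  '
}

# Usage:
# gunzip -c ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | exome_af_table > exome_af.tsv
```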
                    
                
You might take a look at the threads "Getting Allele Frequencies From 1000 Genomes" and "1000 Genomes Project SNPs".
Thanks Sean. I have seen one of those links, but it computes frequencies only for a particular region. I cannot find any VCF file that gives the frequencies of all exonic variants for the studied populations (computed from the exome data). If you know of any such data, please help.
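If what is needed is a VCF file restricted to the exome-derived sites (with their AC, AN, AF and population AFs), one option is to build it from the integrated sites file. This is a minimal sketch, assuming a tab-delimited VCF on stdin; unlike a plain grep it keeps every header line, so the output is still a valid VCF. The function name exome_sites_vcf and the output file name are hypothetical.

```shell
# Hypothetical helper: copy all header lines plus every record whose SNPSOURCE value
# contains EXOME (i.e. EXOME or LOWCOV,EXOME) from stdin to stdout.
exome_sites_vcf() {
  awk -F'\t' '
    /^#/ { print; next }                   # keep ## meta lines and the #CHROM line
    {
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^SNPSOURCE=/ && kv[i] ~ /EXOME/) { print; next }
      }
    }
  '
}

# Usage:
# gunzip -c ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | exome_sites_vcf | gzip > exome_sites.vcf.gz
```

If you want to index the result with tabix, compress it with bgzip (from htslib) instead of gzip.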