9.6 years ago
ankita
How can I retrieve frequency data for all exonic variants (exomes analysed in Phase 1) of the 1000 Genomes Project?
If you download the Phase 1 sites VCF file from the 1000 Genomes FTP site:
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz
you will find a SNPSOURCE attribute in the INFO field of each record (SNPSOURCE=EXOME, SNPSOURCE=LOWCOV, SNPSOURCE=LOWCOV,EXOME, etc.).
So, to get the variants called from exome sequencing, you can run:
gunzip -c ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz | grep "EXOME"
This gives the exome variants with their AC (allele count), AN (allele number), AF (allele frequency) and, where present, per-population frequencies (AFR_AF, AMR_AF, ASN_AF, EUR_AF):
1 69536 rs200013390 C T 100 PASS AA=.;AC=0;AF=0;AN=2184;AVGPOST=0.9986;ERATE=0.0006;LDAF=0.0008;RSQ=0.0677;SNPSOURCE=EXOME;THETA=0.0087;VT=SNP
1 861275 rs199884417 C T 100 PASS AA=C;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0005;VT=SNP
1 861292 rs191719684 C G 100 PASS AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;THETA=0.0012;LDAF=0.0014;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01
1 861315 rs200140498 G A 100 PASS AA=G;AC=2;AF=0.0009;AN=2184;ASN_AF=0.0035;AVGPOST=0.9997;ERATE=0.0003;LDAF=0.0011;RSQ=0.8902;SNPSOURCE=EXOME;THETA=0.0008;VT=SNP
1 865488 rs202189913 A G 100 PASS AA=N;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=0.9987;ERATE=0.0005;LDAF=0.0011;RSQ=0.4947;SNPSOURCE=EXOME;THETA=0.0011;VT=SNP
1 865545 rs201186828 G A 100 PASS AA=g;AC=4;AF=0.0018;AN=2184;ASN_AF=0.01;AVGPOST=0.9979;ERATE=0.0005;LDAF=0.0025;RSQ=0.6639;SNPSOURCE=EXOME;THETA=0.0009;VT=SNP
1 865584 rs148711625 G A 100 PASS RSQ=0.9432;AVGPOST=0.9983;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;AC=26;VT=SNP;LDAF=0.0122;THETA=0.0007;ERATE=0.0003;AF=0.01;AMR_AF=0.0028;AFR_AF=0.05
1 865628 rs41285790 G A 100 PASS AC=7;LDAF=0.0033;AA=g;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0006;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.01;EUR_AF=0.01
1 865662 rs140751899 G A 100 PASS AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9998;ERATE=0.0003;LDAF=0.0005;RSQ=0.8540;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP
1 865664 rs199655347 C T 100 PASS AA=c;AC=0;AF=0;AN=2184;AVGPOST=0.9996;ERATE=0.0003;LDAF=0.0002;RSQ=0.0997;SNPSOURCE=EXOME;THETA=0.0028;VT=SNP
1 865694 rs9988179 C T 100 PASS AC=136;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9987;LDAF=0.0621;VT=SNP;AA=c;THETA=0.0006;AVGPOST=0.9998;ERATE=0.0003;AF=0.06;ASN_AF=0.16;AMR_AF=0.08;AFR_AF=0.03;EUR_AF=0.0026
1 865700 rs116730894 C T 100 PASS AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9844;VT=SNP;AA=c;LDAF=0.0014;THETA=0.0010;ERATE=0.0003;AC=3;AF=0.0014;AFR_AF=0.01
1 865705 rs146331776 C T 100 PASS RSQ=0.9762;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0018;THETA=0.0005;VT=SNP;AA=c;AC=4;ERATE=0.0003;AVGPOST=0.9999;AF=0.0018;AFR_AF=0.01
1 865734 rs201326364 G A 100 PASS AA=g;AC=1;AF=0.0005;AN=2184;ASN_AF=0.0017;AVGPOST=1.0000;ERATE=0.0003;LDAF=0.0005;RSQ=1.0000;SNPSOURCE=EXOME;THETA=0.0017;VT=SNP
1 865738 rs139570490 A G 100 PASS AC=7;LDAF=0.0033;SNPSOURCE=LOWCOV,EXOME;AN=2184;RSQ=0.9799;VT=SNP;THETA=0.0010;AA=a;ERATE=0.0003;AVGPOST=0.9999;AF=0.0032;AMR_AF=0.0028;EUR_AF=0.01
1 866371 rs200617908 G A 100 PASS AA=g;AC=1;AF=0.0005;AFR_AF=0.0020;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0005;RSQ=0.9135;SNPSOURCE=EXOME;THETA=0.0013;VT=SNP
1 866422 rs139210662 C T 100 PASS AC=7;AVGPOST=1.0000;SNPSOURCE=LOWCOV,EXOME;AN=2184;LDAF=0.0032;VT=SNP;AA=c;RSQ=1.0000;THETA=0.0007;ERATE=0.0003;AF=0.0032;AMR_AF=0.01;AFR_AF=0.01
1 866488 rs200139083 G A 100 PASS AA=g;AC=0;AF=0;AN=2184;AVGPOST=0.9999;ERATE=0.0003;LDAF=0.0000;RSQ=0.0499;SNPSOURCE=EXOME;THETA=0.0004;VT=SNP
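If you would rather have the counts and frequencies as separate fields than as raw INFO strings, the same filter can be sketched in Python. This is only a minimal illustration, not part of the original answer: the function names (`parse_info`, `exome_frequencies`, `print_exome_frequencies`) are made up for this example, it assumes the INFO field is the eighth whitespace-separated column as in the output above, and it keeps AC, AN and AF as strings rather than guessing at their types. Like the grep command, it keeps records whose SNPSOURCE is EXOME or LOWCOV,EXOME.

```python
import gzip


def parse_info(info):
    """Turn a VCF INFO string such as 'AC=1;AF=0.0005' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Entries without '=' are flags; store them as True.
        fields[key] = value if sep else True
    return fields


def exome_frequencies(lines):
    """Yield one dict per record whose SNPSOURCE list includes EXOME."""
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split()
        if len(cols) < 8:
            continue
        info = parse_info(cols[7])
        sources = str(info.get("SNPSOURCE", "")).split(",")
        if "EXOME" not in sources:
            continue
        yield {
            "chrom": cols[0],
            "pos": int(cols[1]),
            "id": cols[2],
            "ref": cols[3],
            "alt": cols[4],
            # Values are kept as strings exactly as they appear in the file.
            "AC": info.get("AC"),
            "AN": info.get("AN"),
            "AF": info.get("AF"),
            # Per-population frequencies such as AFR_AF or EUR_AF, if present.
            "pop_AF": {k: v for k, v in info.items() if k.endswith("_AF")},
        }


def print_exome_frequencies(path):
    """Stream a gzipped sites VCF and print the exome-sourced records."""
    with gzip.open(path, "rt") as handle:
        for record in exome_frequencies(handle):
            print(record)
```

Calling `print_exome_frequencies("ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz")` on the downloaded file would stream through it without loading it into memory. Records that lack a given population field simply have no entry for it in `pop_AF`.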
You might take a look at these related posts: "Getting Allele Frequencies From 1000 Genomes" and "1000 Genomes Project SNPs".
Thanks Sean. I have seen one of the links, but it computes frequencies only for a particular region. I have not been able to locate a VCF file that gives the frequencies of all exonic variants for the studied populations (computed from exome data). If you know of such data, please help.