Simple Retrieval of 1000 Genomes MAF Using rsIDs
12.9 years ago

I have a bunch of rsIDs and need to find the corresponding 1000 Genomes MAF as provided in dbSNP. I have a set of IDs as input:

rs1751034 rs1799852 rs1799983 rs1800460 rs1801030

I am looking for a server, service, or database that can give me output like:

rs1751034 NA
rs1799852 0.132
rs1799983 0.203
rs1800460 0.020
rs1801030 0.090

Google searches revealed several raw data-munging options:

I have tried several options, including the following two suggestions: http://www.ncbi.nlm.nih.gov/books/NBK44431/#Search.Finding_Minor_Allele_Frequencies I downloaded AlleleFreqBySsPop.bcp, but I am not sure whether this table provides the 1000 Genomes MAF.

I also tried this option: http://seqanswers.com/forums/showthread.php?t=4910 I downloaded the data and checked the MAFs, but the allele frequencies are not concordant with the 1000 Genomes MAF in dbSNP.

I only need this information for a couple of SNPs, so I am looking for a simple search/retrieve option. Do you know how I can get this information?

Thanks in advance!

dbsnp maf genome allele • 9.0k views
12.9 years ago
Stephen 2.8k

I had to do this exact same thing a few weeks ago. I was looking for a database that already had precalculated values but I ended up using tabix and vcftools to do it myself. Here's a tutorial I wrote on how to do this.

I also posted a similar question here, and got some answers involving using the Ensembl Perl API and BioMart, but I found calculating the MAFs myself to be much easier.


One thing worth noting: while AFs computed from AC and AN are useful, it is better to use the ones the project provides where possible. These will have used additional haplotype and LD information to calculate the AF, so they give better estimates for low-frequency SNPs. We provide files with AF based on just the ASN, AFR, or EUR individuals in the supporting directory: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/
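For anyone unsure what "AF based on AC and AN" means in practice: in the VCF INFO column, AC is the count of each alternate allele in the called genotypes and AN is the total number of alleles in the called genotypes, so the simple estimate is AC/AN. A minimal Python sketch of that arithmetic follows; the function names and the example INFO string are made up for illustration and are not taken from any 1000 Genomes file.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'AC=3;AN=20;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Flag entries such as DB carry no '=value' part.
        fields[key] = value if sep else True
    return fields


def allele_frequencies(info):
    """Return AC/AN for each alternate allele, or None if AC or AN is absent."""
    fields = parse_info(info)
    if "AC" not in fields or "AN" not in fields:
        return None
    an = int(fields["AN"])
    if an == 0:
        return None
    # AC holds one comma-separated count per alternate allele.
    return [int(ac) / an for ac in fields["AC"].split(",")]


# Hypothetical INFO value, for illustration only.
example_info = "AC=3;AN=20;DB"
print(allele_frequencies(example_info))
```

This is only the naive count-based estimate; as noted above, the project-provided AF values may differ from it.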


See the VCF spec (google vcftools).


Thanks Laura. Can you please tell me what the AC and AN you are referring to here are?


I guess your question differs slightly and you need to pull specific positions out of 1000 Genomes. Perhaps a Perl wrapper around tabix?


Thanks Stephen, that's a perfect solution for me. The MAFs are concordant with dbSNP, and I am also getting allele frequencies for the different alleles.

12.9 years ago

Quick answer:

[?]

You can then use, for example, a Unix join (or a database...) to query those data.
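If you would rather do the join step in a script, here is a rough Python sketch of the same idea. It assumes you already have a two-column "rsID AF" table from the extraction step; the function names are hypothetical, and the frequencies below are simply the example values from the question, not the result of a real lookup.

```python
def load_af_table(lines):
    """Build {rsID: AF} from whitespace-separated 'rsID AF' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1]
    return table


def lookup(rsids, table):
    """Return (rsID, AF) pairs in input order, with 'NA' for missing IDs."""
    return [(rsid, table.get(rsid, "NA")) for rsid in rsids]


# Example values copied from the question above, for illustration only.
af_lines = [
    "rs1799852 0.132",
    "rs1799983 0.203",
    "rs1800460 0.020",
    "rs1801030 0.090",
]
query = ["rs1751034", "rs1799852", "rs1799983", "rs1800460", "rs1801030"]

for rsid, af in lookup(query, load_af_table(af_lines)):
    print(rsid, af)
```

Unlike a plain `join`, the dictionary lookup needs no sorted input and keeps the IDs that have no match, which gives the `NA` rows asked for in the question.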


Using Perl for the last command line is more convenient: perl -ne 'print "$1\n" if /AF=([^;\t]+)/'. Also, use zcat to work with compressed files directly.
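For those who prefer Python, a sketch of the same extraction is below. It assumes a standard tab-separated VCF with the rsID in column 3 and INFO in column 8, and it reads gzip-compressed files directly, much like piping through zcat. The function names and the sample lines are made up for illustration. One deliberate difference from the one-liner: the pattern is anchored so that a key that merely ends in "AF" is not picked up by mistake.

```python
import gzip
import re

# Match an INFO entry named exactly AF (at the start or after a ';').
AF_PATTERN = re.compile(r"(?:^|;)AF=([^;\t]+)")


def iter_rsid_af(lines):
    """Yield (rsID, AF) for each VCF data line; AF is 'NA' when absent."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        match = AF_PATTERN.search(cols[7])
        af = match.group(1) if match else "NA"
        yield (cols[2], af)


def read_vcf(path):
    """Yield (rsID, AF) pairs from a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from iter_rsid_af(handle)


# Made-up records, for illustration only.
sample = [
    "##fileformat=VCFv4.0\n",
    "1\t100\trs1\tA\tG\t.\tPASS\tAC=3;AN=20;AF=0.15\n",
    "1\t200\trs2\tC\tT\t.\tPASS\tDP=5\n",
]
for rsid, af in iter_rsid_af(sample):
    print(rsid, af)
```

To run it on a real file, loop over `read_vcf("your_file.vcf.gz")`, where the path is whatever VCF you downloaded.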


Thanks Pierre, Heng!


Hi, Pierre! I want to do the same thing as in this question, so I downloaded the .vcf file as you said. However, when I search for rs1799852 in the VCF file, I get AF=0.119, which is not the same as the corresponding 1000 Genomes MAF in dbSNP (0.132). I then searched for more SNPs, but I still get inconsistent results. Could you tell me the reason? Thank you!
