Question: Simple Retrieval Of 1000 Genomes Maf Using Rsids
Khader Shameer (Manhattan, NY) wrote, 7.8 years ago:

I have a set of rsIDs and need to find the corresponding 1000 Genomes minor allele frequency (MAF) as reported in dbSNP. Example input:

rs1751034 rs1799852 rs1799983 rs1800460 rs1801030

I am looking for a server, service, or database that can give me output like this:

rs1751034 NA
rs1799852 0.132
rs1799983 0.203
rs1800460 0.020
rs1801030 0.090

Google searches turned up several raw data-munging options, and I have tried a few of them.

First, the approach described at http://www.ncbi.nlm.nih.gov/books/NBK44431/#Search.Finding_Minor_Allele_Frequencies: I downloaded AlleleFreqBySsPop.bcp, but I am not sure whether this table provides the 1000 Genomes MAF.

Second, this option: http://seqanswers.com/forums/showthread.php?t=4910. I downloaded the data and checked the MAFs, but the allele frequencies are not concordant with the 1000 Genomes MAFs in dbSNP.

I only need this information for a few SNPs, so I am looking for a simple search-and-retrieve option. Do you know how I can get it?

Thanks in advance !

Tags: maf, genome, allele, dbsnp
Stephen (Charlottesville, Virginia) wrote, 7.8 years ago:

I had to do this exact same thing a few weeks ago. I was looking for a database that already had precalculated values, but I ended up using tabix and vcftools to calculate them myself. Here's a tutorial I wrote on how to do this.

I also posted a similar question here and got some answers involving the Ensembl Perl API and BioMart, but I found calculating the MAFs myself much easier.
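Calculating MAFs yourself ultimately comes down to counting alleles. The sketch below is not how vcftools is implemented; it only illustrates the arithmetic, assuming diploid GT strings such as 0|1 or 1/1 have already been pulled out of the VCF. The function names are hypothetical. MAF is taken here as the frequency of the second most common allele, which for a biallelic SNP equals min(p, 1 - p).

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from GT strings such as '0|1', '1/1' or './.'.

    Missing alleles ('.') are skipped, so the denominator is the number of
    called alleles (what the VCF spec calls AN).
    """
    counts = Counter()
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[int(allele)] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele; None if nothing was called.

    Hypothetical helper: for a biallelic SNP this equals min(p, 1 - p).
    """
    freqs = sorted(allele_frequencies(genotypes).values(), reverse=True)
    if not freqs:
        return None
    return freqs[1] if len(freqs) > 1 else 0.0
```

For example, calls of 0|0, 0|1, 1|1, 0|0 and ./. give 8 called alleles, 3 of them ALT, so the MAF is 3/8 = 0.375.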


It is worth noting that while AFs computed from AC and AN are useful, it is better to use the ones the project provides where possible: those were calculated using additional haplotype and LD information, so they give better estimates for low-frequency SNPs. We also provide files with AFs based only on the ASN, AFR, or EUR individuals in the supporting directory: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/

Reply by Laura, 7.8 years ago

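AC and AN are INFO fields defined in the VCF spec (google vcftools): AC is the allele count in genotypes for each ALT allele, and AN is the total number of alleles in called genotypes.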

Reply by lh3, 7.7 years ago
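Since AC holds one count per ALT allele and AN counts all called alleles, AC/AN gives a frequency for each ALT allele. A minimal sketch, assuming a well-formed INFO column that actually carries AC and AN; parse_info and af_from_ac_an are hypothetical helper names, not part of vcftools.

```python
def parse_info(info):
    """Split a VCF INFO column such as 'AC=3;AN=8;DB' into a dict.

    Flag fields without '=' (e.g. DB) map to True.
    """
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def af_from_ac_an(info):
    """Return AC/AN for each ALT allele, or None if AC/AN are missing or AN is 0."""
    fields = parse_info(info)
    if "AC" not in fields or "AN" not in fields:
        return None
    an = int(fields["AN"])
    if an == 0:
        return None
    # AC has one comma-separated count per ALT allele, in ALT order.
    return [int(ac) / an for ac in fields["AC"].split(",")]
```

As Laura points out above, where the project ships its own AF, that value may differ from AC/AN because it also uses haplotype and LD information.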

Thanks Laura. Can you please tell me what the AC and AN you are referring to here are?

Reply by Khader Shameer, 7.8 years ago

I guess your question differs slightly: you need to pull specific positions out of the 1000 Genomes data. Perhaps a Perl wrapper around tabix?

Reply by Stephen, 7.8 years ago
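A wrapper around tabix, as suggested, could look like the Python sketch below (a Perl version would be analogous). It assumes the tabix command-line tool is on PATH, that the .vcf.gz is bgzip-compressed with a .tbi index beside it, and that your rsIDs have already been mapped to (chromosome, position) pairs, because tabix queries by position, not by ID. Chromosome naming (1 versus chr1) must match the file. All function names are hypothetical.

```python
import subprocess


def region_for(chrom, pos):
    """tabix region string covering one 1-based position, e.g. '1:12345-12345'."""
    return f"{chrom}:{pos}-{pos}"


def parse_vcf_line(line):
    """Return (CHROM, POS, ID, REF, ALT, INFO) from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    return cols[0], int(cols[1]), cols[2], cols[3], cols[4], cols[7]


def query_positions(vcf_gz, positions, tabix="tabix"):
    """Yield VCF records starting exactly at each (chrom, pos).

    Assumes the tabix binary is on PATH and vcf_gz has a .tbi index.
    """
    for chrom, pos in positions:
        result = subprocess.run(
            [tabix, vcf_gz, region_for(chrom, pos)],
            capture_output=True, text=True, check=True,
        )
        for line in result.stdout.splitlines():
            record = parse_vcf_line(line)
            # tabix can return records that merely overlap the region (e.g. a
            # deletion starting upstream); keep only exact start positions.
            if record[1] == pos:
                yield record
```

Each yielded tuple ends with the INFO column, so the AF or AC/AN parsing discussed elsewhere in this thread can be applied to it directly.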

Thanks Stephen, that's a perfect solution for me. The MAFs are concordant with dbSNP, and I am also getting the frequencies of the different alleles.

Reply by Khader Shameer, 7.8 years ago
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.8 years ago:

Quick answer:

  • Download and gunzip ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20100804/ALL.2of4intersection.20100804.sites.vcf.gz
  • Extract the rsIDs and MAFs:

[?]

You can then use, for example, a Unix join (or a database) to query those data.
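The same recipe (scan the sites VCF, pull out the ID and AF, then join against your list) can also be done in a single pass. A minimal Python sketch, assuming a VCF whose INFO column carries AF as the ALT allele frequency. It converts AF to MAF as min(AF, 1 - AF), which only makes sense for biallelic sites, so multi-allelic records are reported as NA. Python's gzip module reads the .vcf.gz directly, including multi-member (bgzip) files, so no separate gunzip step is needed. All function names and the script name are hypothetical.

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    # gzip.open also reads multi-member gzip files, which is what bgzip writes.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def info_af(info):
    """Return the raw AF string from an INFO column, or None if AF is absent."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "AF":
            return value
    return None


def maf_from_lines(lines, rsids):
    """Map each requested rsID to min(AF, 1 - AF), or None if not usable.

    Assumes AF is the ALT allele frequency; multi-allelic records
    (comma-separated AF) are skipped and reported as None.
    """
    wanted = set(rsids)
    found = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t", 8)  # INFO is column 8 (index 7)
        ids = [i for i in cols[2].split(";") if i in wanted]
        if not ids:
            continue
        af = info_af(cols[7])
        if af is None or af == "." or "," in af:
            continue
        p = float(af)
        for rsid in ids:
            found[rsid] = min(p, 1.0 - p)
    return {rsid: found.get(rsid) for rsid in rsids}


def lookup_maf(vcf_path, rsids):
    with open_vcf(vcf_path) as vcf:
        return maf_from_lines(vcf, rsids)


def main(argv):
    # Usage: lookup_maf.py SITES.vcf[.gz] rsID [rsID ...]
    vcf_path, rsids = argv[1], argv[2:]
    for rsid, maf in lookup_maf(vcf_path, rsids).items():
        print(rsid, "NA" if maf is None else f"{maf:.3f}")


if __name__ == "__main__":
    main(sys.argv)
```

Run as, for example, python lookup_maf.py ALL.2of4intersection.20100804.sites.vcf.gz rs1751034 rs1799852; it prints one "rsID value" line per query, in the same layout as the requested output, with NA where the ID is absent or unusable. Because the queried IDs go into a set, the file is read only once no matter how many rsIDs you ask for.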


Using Perl for the last command is more convenient: perl -ne 'print "$1\n" if /AF=([^;\t]+)/'. Also, use zcat to work with the compressed file directly.

Reply by lh3, 7.8 years ago
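A Python equivalent of the Perl one-liner, plus a stricter variant. The plain pattern AF=([^;\t]+) matches the first occurrence of the text "AF=" anywhere in the line, so an INFO key that merely ends in AF (EUR_AF below is a hypothetical example) would be captured instead of AF itself. Anchoring the match to the start of an INFO field avoids that. Both patterns here also exclude the newline so it is not captured.

```python
import re

# Translation of the Perl one-liner: text after the first "AF=" up to the
# next ';' or tab ('\n' is also excluded so it is not captured).
NAIVE_AF = re.compile(r"AF=([^;\t\n]+)")

# Stricter: "AF" must be a whole INFO key, i.e. preceded by a tab (start of
# the INFO column) or a ';' (start of another INFO field).
STRICT_AF = re.compile(r"(?:^|[\t;])AF=([^;\t\n]+)")


def extract_af(line, pattern=STRICT_AF):
    """Return the captured AF string from one VCF line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None
```

Splitting the INFO column on ';' into key=value pairs is more robust still; a regex like this is mainly handy for quick one-liners.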

I'm not sure zcat would work; see http://biostar.stackexchange.com/questions/6112

Reply by Pierre Lindenbaum, 7.8 years ago

Thanks, Pierre and Heng!

Reply by Khader Shameer, 7.8 years ago

Hi Pierre! I want to do the same thing as in this question, so I downloaded the .vcf file as you described. However, when I search for rs1799852 in the VCF file, I get AF=0.119, which differs from the corresponding 1000 Genomes MAF in dbSNP (0.132). I searched more SNPs and still got inconsistent results. Could you tell me the reason? Thank you!

Reply by Cathy, 7.7 years ago
Powered by Biostar version 2.3.0