Question: Query For A List Of SNP Alleles, Frequency CEU, Strand
9.6 years ago by
United States
jvijai (1.2k) wrote:


Say I have a list of 1 million SNPs from one of the common arrays, and I want to look up HapMap CEU data for them: alleles, minor allele frequency, strand, etc. How can I do this?
The UCSC tables do not give me allele frequencies for dbSNP130.
Does BioMart have a limit on the number of SNPs that can be queried at one time?

Thank you

frequency dbsnp • 4.2k views
9.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (130k) wrote:

You'll find all of these data in the sub-directories of the HapMap FTP site:

9.6 years ago by
Santiago de Compostela, Spain
Jorge Amigo (12k) wrote:

Although bulk downloading the data from the HapMap FTP site and processing it yourself, as Pierre suggests, would be the most appropriate thing to do (I would also suggest going to the latest FTP release folder, currently 2010-08_phaseII+III, downloading the data for all chromosomes, and parsing those files for the SNPs you are interested in), I understand that you may think "why should I deal with all those files if they have already done so?". If that is the case and you want to use the BioMart retrieval tool on top of HapMap, you can also obtain the data you are interested in from there.
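For anyone taking the bulk route, the parsing step could be sketched roughly as below. This is only an illustration: it assumes the downloaded files are whitespace-delimited tables with a header line and the rs number in the first column, and the frequency column names (`refallele_freq`, `otherallele_freq`) and the helper names are assumptions to be checked against the header of the actual files.

```python
import gzip
from typing import Dict, Iterable, Set


def filter_frequency_lines(
    lines: Iterable[str], wanted: Set[str]
) -> Dict[str, Dict[str, str]]:
    """Return {rs_id: row} for the wanted SNPs.

    Assumes a whitespace-delimited table whose first line is a header
    and whose first column holds the rs number.
    """
    rows = iter(lines)
    header = next(rows, "").split()
    hits: Dict[str, Dict[str, str]] = {}
    for line in rows:
        fields = line.split()
        if fields and fields[0] in wanted:
            hits[fields[0]] = dict(zip(header, fields))
    return hits


def minor_allele_frequency(
    row: Dict[str, str],
    ref_col: str = "refallele_freq",      # assumed column name
    other_col: str = "otherallele_freq",  # assumed column name
) -> float:
    """MAF taken as the smaller of the two allele frequencies in a row."""
    return min(float(row[ref_col]), float(row[other_col]))


def filter_frequency_file(path: str, wanted: Set[str]) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: stream one gzipped per-chromosome file."""
    with gzip.open(path, "rt") as handle:
        return filter_frequency_lines(handle, wanted)
```

Keeping the 1 million rs numbers in a set makes each lookup constant-time, so every file is read only once, line by line, without loading it into memory.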

I have tried this retrieval tool in the past, and I didn't find any limitations in terms of query size. I'm sure it should be capable of letting you download all the data you are interested in by uploading your rs numbers in a single file as the only filter and selecting the attributes you need (frequency, MAF, strand, ...). Note that the HapMap version this tool handles is release #27, not the current release #28, so go ahead if you can live with that. If not, you will need to consider the original bulk parsing suggestion.
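Preparing the single file of rs numbers for upload could look something like the sketch below. It does not talk to BioMart at all; it only cleans an array's SNP identifiers into a one-per-line text file. The function names and the one-id-per-line layout are my assumptions, so check what the upload form actually expects.

```python
import re
from typing import Iterable, List

# rs numbers look like "rs" followed by digits; anything else is skipped.
RS_PATTERN = re.compile(r"^rs\d+$")


def clean_rs_ids(raw_ids: Iterable[str]) -> List[str]:
    """Strip whitespace, normalise case, drop non-rs ids and duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_ids:
        rs_id = raw.strip().lower()
        if RS_PATTERN.match(rs_id) and rs_id not in seen:
            seen.add(rs_id)
            cleaned.append(rs_id)
    return cleaned


def write_rs_file(rs_ids: Iterable[str], path: str) -> None:
    """Write one rs number per line (assumed upload layout)."""
    with open(path, "w") as handle:
        handle.writelines(f"{rs_id}\n" for rs_id in rs_ids)
```

Array-specific probe names that are not rs numbers are silently dropped here, so it is worth comparing the input and output counts before uploading.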


Thank you Jorge, your reply was most useful.

written 9.6 years ago by jvijai (1.2k)
9.6 years ago by
United States
lh3 (32k) wrote:

With the release of the 1000 Genomes (1000g) data, which is far more complete than HapMap, the best approach is always to look at the latest build of dbSNP (currently 132). As I remember, the latest build has other improvements as well. I recommend the VCF format:

Other formats are also available if you prefer.
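If you go the VCF route, pulling out the records for a list of rs numbers might be sketched as follows. The eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) come from the VCF format itself; which INFO key carries the allele frequency depends on the dbSNP build, so this sketch just returns INFO as a dictionary and leaves that choice to you. The function names are illustrative.

```python
from typing import Dict, Iterable, Set


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a dict; flags without '=' map to ''."""
    result: Dict[str, str] = {}
    if info == ".":
        return result
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key:
            result[key] = value
    return result


def lookup_vcf_records(lines: Iterable[str], wanted: Set[str]) -> Dict[str, dict]:
    """Return {rs_id: record} for wanted ids found in the ID column."""
    hits: Dict[str, dict] = {}
    for line in lines:
        if line.startswith("#"):  # meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        chrom, pos, ids, ref, alt, _qual, _filter, info = fields[:8]
        for rs_id in ids.split(";"):  # ID may hold several identifiers
            if rs_id in wanted:
                hits[rs_id] = {
                    "chrom": chrom,
                    "pos": int(pos),
                    "ref": ref,
                    "alt": alt.split(","),
                    "info": parse_info(info),
                }
    return hits
```

For a gzipped file, passing `gzip.open(path, "rt")` as `lines` should work, since the function only iterates over text lines.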


Thank you Heng Li. Perhaps slightly off-topic, but if one wants to filter known variants out of a novel-variant disease discovery project, say exome sequencing, would one use the combined VCF file from this page?

written 9.6 years ago by jvijai (1.2k)

For this purpose, dbSNP is definitely more appropriate than HapMap, which does not cover all the common SNPs. Nonetheless, you may want to set a threshold to filter out SNPs with very low frequency.
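One way to read the threshold idea is sketched below: a candidate is discarded only when it matches a known variant whose frequency is at or above a cutoff, so rare or frequency-less dbSNP entries do not remove it. The 1% default, the `(chrom, pos, ref, alt)` key, and the choice to keep variants with a missing frequency are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def remove_common_known(
    candidates: Iterable[Variant],
    known_freq: Dict[Variant, Optional[float]],
    min_freq: float = 0.01,  # assumed cutoff, tune for your study
) -> List[Variant]:
    """Keep candidates that are novel, rare, or of unknown frequency."""
    kept = []
    for variant in candidates:
        freq = known_freq.get(variant)
        # Drop only known variants that are demonstrably common.
        if freq is not None and freq >= min_freq:
            continue
        kept.append(variant)
    return kept
```

With this reading, a dbSNP entry at 0.1% frequency would stay among the candidates, while a 25% polymorphism would be filtered out.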

written 9.6 years ago by lh3 (32k)