Question: Getting the Alleles of Specific SNPs for 37:GRCh37
Emma140 wrote, 8.6 years ago:

Hi all,

I am trying to get the alleles and frequencies of some SNPs (from across the genome) for assembly 37:GRCh37 (positive strand). I thought the easiest way would be to download the frequency data from HapMap and then look up my SNPs, but HapMap only has data up to build 36. I also tried sending a batch query to NCBI, but it doesn't accept files as large as mine (I have to cut the file into chunks), and it returns far more information than I need (genotypes for all submitted data, all existing populations, etc.). I'm only interested in CEU, and the HapMap frequencies are more than good enough for my purposes. I'm thinking there must be an easier way to do this than the batch query. All ideas are welcome!

Thanks!

Tags: genome, allele, snp (question written 8.6 years ago by Emma140; last modified 8.0 years ago by Pierre Lindenbaum118k)
Khader Shameer18k (Manhattan, NY) wrote, 8.6 years ago:

You can try your search with BioMart: HapMart is a BioMart-based interface for data mining targeted at HapMap data. If you are new to BioMart, you may start with this article; a variety of documents to get you started with BioMart, including video tutorials, are available here.

(Answer written 8.6 years ago by Khader Shameer18k; last modified 8.6 years ago)

Emma, BioMart has all the data that you need (i.e. SNP information mapped to GRCh37), plus an archive of past mappings. You may have landed on one of those archived versions by mistake, but if you go to http://www.biomart.org/, select MartView, choose the database "Ensembl Variation 59", and choose the dataset "Homo Sapiens Variation (dbSNP131)", you can be sure you are working with up-to-date information.

(Reply written 8.5 years ago by Jorge Amigo11k)

Thanks, Khader, for the BioMart intro.

(Reply written 8.6 years ago by jvijai1.1k)

Thanks, this is a good link to keep in mind for future use. For now, though, I'm afraid it has the same problem as downloading directly from the HapMap FTP site, i.e. it only has release 27 data, not the build that I need.

(Reply written 8.5 years ago by Emma140)

What Jorge said!

(Reply written 8.5 years ago by Khader Shameer18k)
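A minimal Python sketch of scripting the BioMart route, so that a long rs list can be sent in chunks rather than pasted in by hand. It only builds BioMart XML query documents. The dataset name `hsapiens_snp`, the filter name `snp_filter`, the attribute names and the chunk size of 500 are assumptions, not taken from this thread; compare them with the XML that MartView generates for an equivalent query before relying on them. Submitting each document to the mart's web service is left out because the endpoint URL is not given here.

```python
import xml.etree.ElementTree as ET

# ASSUMED Ensembl Variation BioMart names; confirm them against the XML that
# MartView generates for the dataset you actually pick.
DATASET = "hsapiens_snp"
RS_FILTER = "snp_filter"
ATTRIBUTES = ["refsnp_id", "chr_name", "chrom_start", "chrom_strand", "allele"]
CHUNK_SIZE = 500  # hypothetical batch size, not a documented limit


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_biomart_query(rs_ids, dataset=DATASET, attributes=ATTRIBUTES):
    """Return a BioMart XML query requesting `attributes` for the given rs IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",  # tab-separated text output
        "header": "1",       # include a column-name line
        "uniqueRows": "1",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # One filter holding a comma-separated list of rs IDs.
    ET.SubElement(dataset_el, "Filter", {"name": RS_FILTER, "value": ",".join(rs_ids)})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return ('<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n'
            + ET.tostring(query, encoding="unicode"))


if __name__ == "__main__":
    my_snps = ["rs10", "rs1000000"]  # in practice, read these from your rs list file
    for chunk in chunked(my_snps, CHUNK_SIZE):
        print(build_biomart_query(chunk))
```

Each printed document is one batch; writing the TSV replies of all batches to the same text file gives a single table of strand and allele columns for the whole list.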
Pierre Lindenbaum118k (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 8.6 years ago:

You can join the UCSC MySQL table of HapMap CEU SNPs for hg18 with the dbSNP 131 SNP positions for hg19 (build 37):

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg19

> select S.* from hg18.hapmapSnpsCEU as H, hg19.snp131 as S
  where S.name=H.name limit 2;
*************************** 1. row ***************************
       bin: 1289
     chrom: chr7
chromStart: 92383887
  chromEnd: 92383888
      name: rs10
     score: 0
    strand: +
   refNCBI: A
   refUCSC: A
  observed: A/C
   molType: genomic
     class: single
     valid: by-cluster,by-frequency,by-submitter,by-hapmap,by-1000genomes
     avHet: 0.028124
   avHetSE: 0.115199
      func: intron
   locType: exact
    weight: 1
*************************** 2. row ***************************
       bin: 1553
     chrom: chr12
chromStart: 126890979
  chromEnd: 126890980
      name: rs1000000
     score: 0
    strand: -
   refNCBI: G
   refUCSC: G
  observed: C/T
   molType: genomic
     class: single
     valid: by-cluster,by-frequency,by-2hit-2allele,by-hapmap,by-1000genomes
     avHet: 0.308102
   avHetSE: 0.243155
      func: unknown
   locType: exact
    weight: 1
(Answer written 8.6 years ago by Pierre Lindenbaum118k)
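The rows above report `observed` relative to the `strand` column, so getting positive-strand alleles means reverse-complementing them when `strand` is `-`. For rs1000000, `observed` C/T on the `-` strand becomes G/A on `+`, which agrees with `refNCBI` G. The query as written selects only `S.*`; adding the CEU columns from `H` to the select list would bring in HapMap genotype counts. This small Python sketch assumes those columns are named `homoCount1`, `homoCount2` and `heteroCount` (check with `describe hg18.hapmapSnpsCEU;`) and that their allele letters follow that table's own `strand` column, so they may need the same strand treatment.

```python
# Complement of each base; characters not listed (e.g. "-" for a deletion) are kept as-is.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_plus_strand(observed, strand):
    """Express a slash-separated allele string such as "C/T" on the + strand.

    Alleles reported on the "-" strand are reverse-complemented; "+" alleles
    are returned unchanged.
    """
    if strand == "+":
        return observed
    if strand != "-":
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return "/".join(allele.translate(COMPLEMENT)[::-1] for allele in observed.split("/"))


def allele1_frequency(homo_count1, homo_count2, hetero_count):
    """Frequency of allele1 from genotype counts (each person carries two alleles).

    The hapmapSnpsCEU column names homoCount1, homoCount2 and heteroCount are
    an ASSUMPTION; verify them with `describe hg18.hapmapSnpsCEU;`.
    """
    n_people = homo_count1 + homo_count2 + hetero_count
    if n_people == 0:
        raise ValueError("no genotyped individuals")
    return (2 * homo_count1 + hetero_count) / (2 * n_people)


if __name__ == "__main__":
    # rs1000000 from the second row above: strand "-", observed "C/T".
    print(to_plus_strand("C/T", "-"))
    # Hypothetical counts, not HapMap data: 10 homozygous for allele1,
    # 50 homozygous for allele2, 40 heterozygous.
    print(allele1_frequency(10, 50, 40))
```

Run as a script, this prints `G/A` and `0.3`.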

Emma, if you had a local installation of the UCSC databases, the best approach would be to load your rs IDs into a third database and join it with the others. Otherwise, you can store the results of the SQL query above in a file, sort that file on the rs name, sort your rs list, and join the two files with Unix join (http://en.wikipedia.org/wiki/Join_%28Unix%29).

(Reply written 8.5 years ago by Pierre Lindenbaum118k)
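For anyone without the Unix tools at hand, the sort-then-join step can be mimicked in Python. This is a minimal sketch of a sort-merge join, assuming tab-separated query output with the rs name in the fifth column (as with `S.*` from `snp131`, where `name` follows `bin`, `chrom`, `chromStart` and `chromEnd`) and an rs list with one ID per line. The sample rows are the two shown above, truncated to their first ten columns; `rs_absent` is a made-up placeholder.

```python
import csv
import io


def read_sorted_rows(lines, key_col=0):
    """Parse tab-separated lines and sort the rows on column `key_col` (like `sort`)."""
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    rows.sort(key=lambda row: row[key_col])
    return rows


def merge_join(left, right, left_key=0, right_key=0):
    """Inner-join two row lists that are already sorted on their key columns.

    Like Unix `join`, it walks both inputs once and pairs every left row with
    every right row sharing the key; each output row is the left row followed
    by the right row without its key column.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair each left row with this key with every row in that run.
            while i < len(left) and left[i][left_key] == lk:
                for r in right[j:j_end]:
                    yield left[i] + r[:right_key] + r[right_key + 1:]
                i += 1
            j = j_end


if __name__ == "__main__":
    # One rs ID per line, unsorted; the last (placeholder) ID is absent from the results.
    my_ids = io.StringIO("rs1000000\nrs10\nrs_absent\n")
    # First ten snp131 columns of the two rows shown above (name is column 4).
    results = io.StringIO(
        "1289\tchr7\t92383887\t92383888\trs10\t0\t+\tA\tA\tA/C\n"
        "1553\tchr12\t126890979\t126890980\trs1000000\t0\t-\tG\tG\tC/T\n"
    )
    left = read_sorted_rows(my_ids)
    right = read_sorted_rows(results, key_col=4)
    for row in merge_join(left, right, left_key=0, right_key=4):
        print("\t".join(row))
```

As with `join`, only IDs present in both inputs are reported, so rs IDs missing from `snp131` drop out silently; comparing the output length with the input list is a quick way to spot them.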

I haven't used MySQL before, so my question is probably naive. I have around 40,000 SNPs for which I need the strand, observed alleles, and frequencies. Can I upload/input the rs numbers I need and write the output to a text file? Thanks for the idea; it looks like it's probably the way to go.

(Reply written 8.5 years ago by Emma140)