Question

Obtain chromosome, position, and alleles based on a list of SNP names

Asked 6.0 years ago by jiumeng66

I have a list of 3660 SNPs that contains only their names (rsIDs), such as rs41457244.

Now I need more information about each SNP, such as its chromosome, position (hg19), and alleles. How can I get it?

The following is part of my SNP list:

Updated 6.0 years ago by igor

Try the rsnps package in R:

https://cran.r-project.org/web/packages/rsnps/rsnps.pdf

Reply by Bioinformatics_NewComer, 6.0 years ago

Please add an example/more detail or move your suggestion to a comment. Thank you!

Reply by Ram, 6.0 years ago

Added link for the package tutorial.

Reply by Bioinformatics_NewComer, 6.0 years ago

IMO it still needs work to qualify as an answer. The package name and package manual link are effectively just a suggestion now. I'm moving this to a comment.

Reply by Ram, 6.0 years ago

score 4 · Answer 1 · 2018-04-04

$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 \
    -e 'select chrom,chromStart,chromEnd,name,alleles,alleleFreqs from snp150
        where name in ("rs2088175","rs2983855","rs2821958","rs41469446","rs619987","rs2877425",
                       "rs41447048","rs41497748","rs503808","rs386628","rs6667995","rs41405345")'
+---------------------+------------+-----------+------------+---------+--------------------+
| chrom               | chromStart | chromEnd  | name       | alleles | alleleFreqs        |
+---------------------+------------+-----------+------------+---------+--------------------+
| chr3                |   76216220 |  76216221 | rs2088175  | C,G,    | 0.875399,0.124601, |
| chr21               |   13597605 |  13597606 | rs2821958  | A,G,    | 0.884181,0.115819, |
| chr4                |   68990362 |  68990363 | rs2877425  | A,G,    | 0.085463,0.914537, |
| chr4_GL000257v2_alt |     566381 |    566382 | rs2877425  | A,G,    | 0.085463,0.914537, |
| chr10               |   38291051 |  38291052 | rs2983855  | C,T,    | 0.468850,0.531150, |
| chr6                |   28816618 |  28816619 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000250v2_alt |      82197 |     82198 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000251v2_alt |     307169 |    307170 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000252v2_alt |      82220 |     82221 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000253v2_alt |      82185 |     82186 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000254v2_alt |      82211 |     82212 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000255v2_alt |      82201 |     82202 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr6_GL000256v2_alt |     125864 |    125865 | rs386628   | C,T,    | 0.339257,0.660743, |
| chr17               |   22206798 |  22206799 | rs41447048 | C,T,    | 0.833067,0.166933, |
| chr18               |   14208563 |  14208564 | rs41469446 | A,G,    | 0.868011,0.131989, |
| chr1                |   83314732 |  83314733 | rs41497748 | A,T,    | 0.831669,0.168331, |
| chr2                |  126689276 | 126689277 | rs503808   | A,G,    | 0.647364,0.352636, |
| chr1                |  149015643 | 149015644 | rs619987   | A,T,    | 0.869010,0.130990, |
| chr1                |  205854053 | 205854054 | rs6667995  | A,C,    | 0.758586,0.241414, |
+---------------------+------------+-----------+------------+---------+--------------------+
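
The query above runs against hg38. Since the question asks for hg19, note that the same snp150 table also exists in the hg19 database, so -D hg19 should return hg19 coordinates instead. UCSC tables use 0-based chromStart values, so for a single-base SNP the conventional 1-based position is chromEnd. IDs with no match (rs41405345 in the output above) simply do not appear. With 3660 IDs, typing the IN (...) list by hand is impractical; the following is a minimal sketch of a helper that builds the query from a file with one rsID per line. The function name make_snp_query and the file name snplist.txt are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build a UCSC snp150 query for every rsID in a file.
# Usage: make_snp_query snplist.txt
make_snp_query() {
    local id_file=$1
    local in_list
    # Extract rsIDs (ignoring surrounding whitespace and carriage returns),
    # wrap each in single quotes, and join them with commas.
    in_list=$(grep -oE 'rs[0-9]+' "$id_file" | sed "s/.*/'&'/" | paste -sd, -)
    printf 'select chrom,chromStart,chromEnd,name,alleles,alleleFreqs from snp150 where name in (%s)\n' "$in_list"
}

# Example against the hg19 database:
# mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg19 \
#     -e "$(make_snp_query snplist.txt)"
```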

score 2 · Answer 2 · 2018-04-04

Answered 6.0 years ago by Emily

BioMart

Use the short variation dataset, filter by your list of IDs, and choose location and alleles as attributes.

Example query
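
The BioMart query can also be submitted from the command line. The sketch below writes the XML for such a query from a file of rsIDs. It assumes the Ensembl variation mart dataset hsapiens_snp, the filter snp_filter and the attributes refsnp_id, chr_name, chrom_start, chrom_end and allele, so please check these names in the BioMart web interface (its XML button shows the exact query) before relying on it. The main Ensembl site serves GRCh38, so for hg19 coordinates the query should go to the GRCh37 archive at grch37.ensembl.org. Ensembl positions are 1-based. The function name make_biomart_xml is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a BioMart XML query for the rsIDs in a file.
# Dataset, filter and attribute names are assumptions based on the Ensembl
# variation mart; verify them with the XML button in the BioMart web UI.
# Usage: make_biomart_xml snplist.txt > query.xml
make_biomart_xml() {
    local id_file=$1
    local ids
    # Comma-separated list of rsIDs, as used for a multi-value filter
    ids=$(grep -oE 'rs[0-9]+' "$id_file" | paste -sd, -)
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="">
  <Dataset name="hsapiens_snp" interface="default">
    <Filter name="snp_filter" value="${ids}"/>
    <Attribute name="refsnp_id"/>
    <Attribute name="chr_name"/>
    <Attribute name="chrom_start"/>
    <Attribute name="chrom_end"/>
    <Attribute name="allele"/>
  </Dataset>
</Query>
EOF
}

# Example, sending the query to the GRCh37 (hg19) archive:
# make_biomart_xml snplist.txt > query.xml
# curl -s --data-urlencode "query@query.xml" \
#     https://grch37.ensembl.org/biomart/martservice > snps.tsv
```

If the server struggles with all 3660 IDs in one request, splitting snplist.txt into smaller files (for example with split -l 500) and running one query per file is a simple workaround.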

GenoMax · Answer 3 · 2018-04-04

There are a lot of excellent responses already, but I wanted to offer an alternate solution.

You can also download SNPs in a table format from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp150.txt.gz

Since these are in a table format, they are a little easier to read and process without specialized tools. For example, you can run:

zcat snp150.txt.gz | grep -w -f snplist.txt

where snplist.txt is your list of SNP IDs, one per line.
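
grep -w matches an ID anywhere on a line. A slightly stricter alternative is an awk hash join that loads the IDs into memory and compares them only against the name column, which also lets you keep just the columns you need. This sketch assumes the usual UCSC snpNNN layout (bin, chrom, chromStart, chromEnd and name in columns 1 to 5, strand in column 7, observed alleles in column 10); please check it against the table's schema. It reports chromEnd as the position because chromStart is 0-based. The function name ucsc_snp_lookup is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pull selected columns for the listed rsIDs from a
# gzipped, tab-separated UCSC snpNNN table dump.
# Usage: ucsc_snp_lookup snplist.txt snp150.txt.gz
# Output columns: chrom, position (1-based, = chromEnd), name, strand, observed
ucsc_snp_lookup() {
    local id_file=$1 table_gz=$2
    # First input (NR == FNR): remember every ID from the list.
    # Second input (the table on stdin): print rows whose name (column 5)
    # is in the list.
    gzip -dc "$table_gz" \
        | awk -F'\t' -v OFS='\t' '
            NR == FNR { sub(/\r$/, ""); want[$1]; next }
            ($5 in want) { print $2, $4, $5, $7, $10 }
        ' "$id_file" -
}
```

For example, ucsc_snp_lookup snplist.txt snp150.txt.gz > my_snps.txt reads the table once, however many IDs are in the list.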

score 0 · Answer 4 · 2018-04-04

Answered 6.0 years ago by Bastien Hervé

If you download the dbSNP file for your genome build, you will have all the information you need. Just search the dbSNP file for the IDs in your SNP list using awk, Python, Perl, etc.
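
As a minimal sketch of the awk route, the function below streams a gzipped dbSNP VCF (such as the dbsnp_138.hg19.vcf.gz file linked in the replies) and prints CHROM, POS, ID, REF and ALT for every record whose ID is in your list. VCF positions are 1-based. It assumes one rsID per ID field; records listing several IDs separated by semicolons would be missed. The function name vcf_snp_lookup is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: look up rsIDs in a gzipped dbSNP VCF.
# Usage: vcf_snp_lookup snplist.txt dbsnp_138.hg19.vcf.gz
# Output columns: CHROM, POS (1-based), ID, REF, ALT
vcf_snp_lookup() {
    local id_file=$1 vcf_gz=$2
    gzip -dc "$vcf_gz" \
        | awk -F'\t' -v OFS='\t' '
            NR == FNR { sub(/\r$/, ""); want[$1]; next }  # load the ID list
            /^#/ { next }                                 # skip VCF header lines
            ($3 in want) { print $1, $2, $3, $4, $5 }
        ' "$id_file" -
}
```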

Thank you, but where can I download dbSNP?

Reply by jiumeng66, 6.0 years ago

For hg19:

In your browser

ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/dbsnp_138.hg19.vcf.gz

or in your terminal

wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/dbsnp_138.hg19.vcf.gz

Reply by Bastien Hervé, 6.0 years ago

Sorry for the late reply; I have just downloaded it with your help. Thank you very much.

Reply by jiumeng66, 6.0 years ago

score 0 · Answer 5 · 2018-04-04

If you want to work locally, you can build a searchable resource and query it as often as you like.

1) Get SNPs and write them into a text file sorted by SNP ID.

For hg19, for instance, use BEDOPS convert2bed to convert the dbSNP VCF to BED, then move the rsID to the first column and sort on it:

$ export LC_ALL=C    # exported so that sort uses plain byte order
$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/All_20170710.vcf.gz \
   | gunzip -c \
   | convert2bed --input=vcf --sort-tmpdir=${PWD} - \
   | awk -v OFS="\t" '{ print $4,$1,$2,$3,$6,$7; }' \
   | sort -k1,1 \
   > hg19.snp150.sortedByName.txt

Each line of this text file holds the SNP rsID, the chromosome, the start and end positions (0-based, half-open BED coordinates), and the reference and alternate alleles. The file is sorted by rsID, a property we will use to enable fast binary searches.
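
Because pts-line-bisect depends on plain byte-order sorting, it may be worth confirming the order before searching; if LC_ALL=C is set but not exported, sort can fall back to locale-aware collation. A minimal check, with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: confirm a file is sorted on its first column in
# byte order (LC_ALL=C), which is the order a binary search relies on.
# Usage: check_sorted_by_name hg19.snp150.sortedByName.txt
check_sorted_by_name() {
    # sort -c only checks the order: it reports the first out-of-order
    # line on stderr and exits non-zero if the file is not sorted.
    LC_ALL=C sort -c -k1,1 "$1"
}
```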

2) Install pts-line-bisect, which performs binary searches on lexicographically sorted files such as the one just made:

$ git clone https://github.com/pts/pts-line-bisect.git
$ cd pts-line-bisect && make && cd ..

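A binary search needs a number of steps that grows only logarithmically with file size, so each lookup takes a few dozen reads even in a file holding hundreds of millions of SNPs. That makes it a good fit for write-once, read-many applications like this.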

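3) Run a query. The following command returns a six-column BED line for the SNP. Since -p prints every line that begins with the query string (a search for rs4145724 would also return rs41457244), the awk step keeps only lines whose first column matches the ID exactly:

$ rs_of_interest=rs41457244
$ ./pts-line-bisect/pts_lbsearch -p hg19.snp150.sortedByName.txt "${rs_of_interest}" \
   | awk -v id="${rs_of_interest}" -v OFS="\t" '$1 == id { print "chr"$2,$3,$4,$1,$5,$6; }' \
   | head -1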

Step 3 can be put into a script so that you can re-run it with your SNP of interest on demand.

For instance:

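#!/bin/bash
# search_snps.sh <rsID>: print a six-column BED line for one SNP
pts_lbsearch_bin=./pts-line-bisect/pts_lbsearch
sorted_snp_txt=hg19.snp150.sortedByName.txt
# -p is a prefix search, so keep only lines whose first column is exactly $1
"${pts_lbsearch_bin}" -p "${sorted_snp_txt}" "$1" \
    | awk -v id="$1" -v OFS="\t" '$1 == id { print "chr"$2,$3,$4,$1,$5,$6; }' \
    | head -1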

Then:

$ chmod +x search_snps.sh
$ ./search_snps.sh rs41457244
...

If you have a file of rs* IDs, you could loop through them via bash:

$ while read -r rsID; do ./search_snps.sh "$rsID"; done < snpIDs.txt | sort-bed - > snps.bed

Writing a sorted BED file can be useful for enabling set operations with BEDOPS and other toolkits.
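
The loop above silently skips IDs for which no line is found, for example rsIDs that are not present in the dbSNP build used. A small sketch for listing them, assuming the rsID is in column 4 of the BED output as produced by search_snps.sh; the function name missing_ids is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: list rsIDs from the input list that have no line in
# the BED output (rsID assumed to be in column 4, as in search_snps.sh).
# Usage: missing_ids snpIDs.txt snps.bed
missing_ids() {
    local id_file=$1 bed_file=$2
    awk -F'\t' '
        # First file (the BED output): remember every rsID that was found.
        FILENAME == ARGV[1] { found[$4]; next }
        # Second file (the ID list): strip any carriage return, then print
        # each ID not found, once, in input order.
        { sub(/\r$/, "") }
        $1 != "" && !($1 in found) && !($1 in seen) { seen[$1]; print $1 }
    ' "$bed_file" "$id_file"
}
```

For example, missing_ids snpIDs.txt snps.bed > not_found.txt gives a list you can then try in one of the other resources mentioned in this thread.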