Question

How to get descriptions for rs* SNP identifiers?

6.1 years ago

O.rka ▴ 710

I have a very large *.bed file with 15862212 lines derived from a whole-genome VCF. I annotated the VCF with SNP IDs using the protocol in "C: How to get SNP identifiers from VCF file?" and now have the file previewed below. How can I get the descriptors for these rs* IDs? My main goal is to figure out which blood type I have from this information.

-bash-4.1$ zcat genome.vcf.hg38.snp147.bed.gz | head -n 10
chr1    10019   10020   rs775809821
chr1    10055   10056   rs768019142
chr1    10107   10108   rs62651026  .
chr1    10108   10109   rs376007522 .
chr1    10128   10129   rs796688738
chr1    10138   10139   rs368469931
chr1    10144   10145   rs144773400
chr1    10146   10147   rs779258992
chr1    10149   10150   rs371194064
chr1    10165   10166   rs796884232

SNP genome vcf annotation • 1.7k views

updated 6.1 years ago by jared.andrews07 ★ 16k • written 6.1 years ago by O.rka ▴ 710

What do you mean by "descriptors"?

6.1 years ago by Pierre Lindenbaum 161k

I guess there isn't a descriptor for each SNP as such; what I'm after is the metadata associated with each SNP, such as: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=144773400

6.1 years ago by O.rka ▴ 710

SNP annotations file: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp142Common.txt.gz
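
A rough sketch of how that table could be joined to the BED file by rs ID, in case it helps. Note that the link is an hg19 table while the BED in the question is hg38, so this matches on the rs ID and ignores coordinates. The column positions are my assumption from the UCSC snp142 table schema (name, observed, class, func), and the function names are made up for illustration.

```python
import csv
import gzip

# 0-based column positions, assumed from the UCSC snp142 table schema:
# bin, chrom, chromStart, chromEnd, name, score, strand, refNCBI, refUCSC,
# observed, molType, class, valid, avHet, avHetSE, func, ...
NAME_COL, OBSERVED_COL, CLASS_COL, FUNC_COL = 4, 9, 11, 15


def load_rsids(bed_lines):
    """Collect rs IDs from the 4th column of BED-like lines."""
    ids = set()
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[3].startswith("rs"):
            ids.add(fields[3])
    return ids


def annotate(table_lines, wanted):
    """Yield (rsid, observed, class, func) for table rows whose name is wanted."""
    for row in csv.reader(table_lines, delimiter="\t"):
        if len(row) > FUNC_COL and row[NAME_COL] in wanted:
            yield row[NAME_COL], row[OBSERVED_COL], row[CLASS_COL], row[FUNC_COL]


def annotate_files(bed_gz, table_gz):
    """Stream both gzipped files; only the set of rs IDs is held in memory."""
    with gzip.open(bed_gz, "rt") as bed:
        wanted = load_rsids(bed)
    with gzip.open(table_gz, "rt") as table:
        yield from annotate(table, wanted)
```

Something like `for hit in annotate_files("genome.vcf.hg38.snp147.bed.gz", "snp142Common.txt.gz"): print(*hit, sep="\t")` would then print one line per matching SNP. With ~15.8 million IDs the set will take a fair amount of RAM, so filtering the BED down first may be worthwhile.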

6.1 years ago by Joe ▴ 40

score 0 · Answer 1 · 2018-02-23

You can use the Ensembl REST API from basically any language you like. It returns basic information and population frequencies for a variant. The Variant Effect Predictor (VEP) may also be worth your time. I don't think these services will really help with your actual goal, though, since blood type is mostly determined by only a handful of variants.
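
For what it's worth, a minimal sketch of a single-ID lookup in Python using only the standard library. It assumes the `/variation/human/<rsid>` endpoint with the optional `pops=1` parameter, and the response field names in `summarize` (`mappings`, `most_severe_consequence`, `minor_allele`, `MAF`) are what I'd expect the JSON to contain, so check them against a real response. The function names are just illustrative.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def variation_url(rsid, pops=False):
    """Build the lookup URL; pops=True also asks for population frequencies."""
    url = f"{SERVER}/variation/human/{rsid}"
    return url + "?pops=1" if pops else url


def summarize(record):
    """Pull a few fields out of a response dict (field names are assumed)."""
    mappings = record.get("mappings") or [{}]
    first = mappings[0]
    return {
        "name": record.get("name"),
        "location": first.get("location"),
        "alleles": first.get("allele_string"),
        "consequence": record.get("most_severe_consequence"),
        "minor_allele": record.get("minor_allele"),
        "maf": record.get("MAF"),
    }


def fetch_variant(rsid, pops=False):
    """Request one variant as JSON from the Ensembl REST server."""
    request = urllib.request.Request(
        variation_url(rsid, pops),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `summarize(fetch_variant("rs144773400"))` should give a small dict for that one SNP. The server is rate limited, so this only makes sense for a short list of IDs you care about, not for all 15 million lines of the BED file.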

See this page for a lot more info and different sets of variants that may be helpful for you.