How to get descriptions for rs* SNP identifiers?
6.1 years ago
O.rka ▴ 710

I have a very large *.bed file (15,862,212 lines) derived from a whole-genome VCF. I annotated the VCF with SNP identifiers using the protocol in "C: How to get SNP identifiers from VCF file?"; a preview of the result is below. How can I get the descriptors for these rs* IDs? My main goal is to figure out which blood type I have from this information.

-bash-4.1$ zcat genome.vcf.hg38.snp147.bed.gz | head -n 10
chr1    10019   10020   rs775809821
chr1    10055   10056   rs768019142
chr1    10107   10108   rs62651026  .
chr1    10108   10109   rs376007522 .
chr1    10128   10129   rs796688738
chr1    10138   10139   rs368469931
chr1    10144   10145   rs144773400
chr1    10146   10147   rs779258992
chr1    10149   10150   rs371194064
chr1    10165   10166   rs796884232
SNP genome vcf annotation • 1.7k views

What do you mean by "descriptors"?


I guess there isn't a descriptor for each SNP as such. What I'm after is the metadata associated with each SNP, such as this dbSNP record: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=144773400

6.1 years ago

You can use the Ensembl REST API from basically any language you want. It returns basic information about a variant and, optionally, population frequencies. The Variant Effect Predictor (VEP) may also be worth your time. I don't think these services will help much with your actual goal, though, since blood type is mostly determined by only a handful of variants.

See this page for a lot more info and different sets of variants that may be helpful for you.
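As a rough illustration of the REST route, here is a minimal Python sketch using only the standard library. It assumes the public server at https://rest.ensembl.org and its `variation/:species/:id` endpoint with the optional `pops=1` parameter. The function names (`variation_url`, `fetch_variant`, `rsids_in_bed`, `rsids_in_bed_file`) are made up for this example. The BED parser assumes whitespace-separated columns with the rsID in the fourth column, as in the preview in the question.

```python
import gzip
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed: public Ensembl REST server


def variation_url(rsid, species="human", pops=False):
    """Build the URL for the Ensembl variation/:species/:id endpoint."""
    url = f"{ENSEMBL_REST}/variation/{species}/{rsid}"
    if pops:
        url += "?pops=1"  # also request population frequencies
    return url


def fetch_variant(rsid, pops=False, timeout=30):
    """Return the JSON record for one rsID as a dict (needs network access)."""
    request = urllib.request.Request(
        variation_url(rsid, pops=pops),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def rsids_in_bed(lines, wanted=None):
    """Yield (chrom, position, rsid) from BED lines.

    position is the BED end column, i.e. the 1-based coordinate of a
    single-base record. If wanted is given, only those rsIDs are kept.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[3].startswith("rs"):
            continue  # skip blank, short, or non-rs lines
        if wanted is None or fields[3] in wanted:
            yield fields[0], int(fields[2]), fields[3]


def rsids_in_bed_file(path, wanted=None):
    """Stream a gzipped BED file without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        yield from rsids_in_bed(handle, wanted)
```

With network access, `fetch_variant("rs144773400", pops=True)` should return a dict describing that variant. Sending one request per line of a file with nearly 16 million records is not practical, so it makes more sense to pass a small `wanted` set of rsIDs to `rsids_in_bed_file("genome.vcf.hg38.snp147.bed.gz", wanted=...)` first and query only the hits.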
