Parsing refSNP data in R
Asked by ceruleanivy, 7.9 years ago

I have a single-column data frame in R with about 143 dbSNP IDs (of the form rs*), and I want to add columns with data such as the ancestral allele, MAF, etc. What is the quickest way to do that?

Tags: SNP, R, snp, sequence, genome
Answer by Ram, 7.9 years ago

I'd use reutils, an R wrapper for NCBI's E-utilities (eutils). E-utilities can get you all the information you need, and you can pass a comma-separated list of IDs so that a single request processes all of them and returns the output in one chunk.

For example, to get data on rs869312219 and rs869312218, the E-utilities request would be: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=rs869312219,rs869312218
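A minimal base-R sketch of how such a comma-separated request could be assembled from the single-column data frame in the question. The helper name `build_efetch_url` and the column name `rs` are hypothetical; the base URL and the `db`/`id` parameters are copied from the example request, including the `rs` prefix on the IDs.

```r
# Hypothetical helper: join a vector of rs IDs into one efetch URL,
# using the same base URL and db/id parameters as the example request.
build_efetch_url <- function(rs_ids, db = "snp") {
  base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  # A single comma-separated id list, so one request covers every SNP
  paste0(base, "?db=", db, "&id=", paste(rs_ids, collapse = ","))
}

# Hypothetical data frame with one column of rs IDs
snps <- data.frame(rs = c("rs869312219", "rs869312218"),
                   stringsAsFactors = FALSE)
url <- build_efetch_url(snps$rs)
print(url)
```

With reutils you should not need to build the URL by hand: passing the whole vector as the first argument, e.g. `efetch(snps$rs, db = "snp", retmode = "xml")`, should issue the equivalent batched request.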

The reutils README shows how to translate E-utilities REST URLs into reutils commands: https://github.com/gschofl/reutils

I can get all the rest of the information, which is perfect, but I still can't find how to extract the ancestral allele. For example, try efetch("422628", "snp", "docset") for rs422628: the ancestral allele is "T" in the output, and I wish there was a way to index it.

You'll have to drill down to the ancestralAllele attribute of the Sequence element (XPath: ExchangeSet/Rs/Sequence/@ancestralAllele). This attribute is present only for entries that list an ancestral allele.

Add retmode = 'xml' to the query so that you can traverse the result with XPath.
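A minimal sketch of the XPath route with the XML package, assuming the response follows the ExchangeSet/Rs/Sequence/@ancestralAllele layout described above and that each Rs element carries an rsId attribute (an assumption about the record format). The helper `extract_ancestral`, its output column names, and the inline XML string (a trimmed, hypothetical stand-in for a real efetch response, with a placeholder namespace and a hypothetical second entry) are all illustrative. Matching on local-name() sidesteps any default namespace the document declares, and entries without the attribute come back as NA.

```r
library(XML)

# Hypothetical helper: one row per Rs element, with its ancestral allele
# taken from the Sequence child (NA when the attribute is absent).
extract_ancestral <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  # local-name() matches Rs regardless of the document's default namespace
  rs_nodes <- getNodeSet(doc, "//*[local-name()='Rs']")
  rs_id <- vapply(rs_nodes, xmlGetAttr, character(1),
                  name = "rsId", default = NA_character_, USE.NAMES = FALSE)
  anc <- vapply(rs_nodes, function(node) {
    # Relative XPath from this Rs node to its Sequence child
    seq_nodes <- getNodeSet(node, "./*[local-name()='Sequence']")
    if (length(seq_nodes) == 0) return(NA_character_)
    xmlGetAttr(seq_nodes[[1]], "ancestralAllele", default = NA_character_)
  }, character(1), USE.NAMES = FALSE)
  data.frame(rs = paste0("rs", rs_id), ancestral_allele = anc,
             stringsAsFactors = FALSE)
}

# Trimmed, hypothetical stand-in for an efetch response (placeholder namespace);
# the second Rs entry is hypothetical and has no ancestral allele.
sample_xml <- paste0(
  '<ExchangeSet xmlns="http://example.org/hypothetical-ns">',
  '<Rs rsId="422628"><Sequence ancestralAllele="T"/></Rs>',
  '<Rs rsId="123"><Sequence/></Rs>',
  '</ExchangeSet>'
)
print(extract_ancestral(sample_xml))
```

If the real records match that layout, the result could be joined onto the question's data frame with something like `merge(snps, extract_ancestral(content(res, as = "text")), by = "rs", all.x = TRUE)`, where `snps` (hypothetical name) holds an `rs` column and `res` is an efetch result fetched with retmode = 'xml'.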

EDIT: I did this using the following (somewhat convoluted) code:

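library(reutils)
library(XML)
# Fetch the dbSNP record for rs422628 as XML
dbsnp_entry <- efetch("rs422628", db = "snp", retmode = "xml")
# Parse the raw XML text into an internal document tree
entry_as_parsed_xml <- xmlTreeParse(content(dbsnp_entry, as = "text"), asText = TRUE, useInternalNodes = TRUE)
# Convert the tree to a nested list, then read the ancestralAllele attribute of Rs/Sequence
entry_as_list <- xmlToList(entry_as_parsed_xml)
(ancestralAllele <- entry_as_list$Rs$Sequence$.attrs["ancestralAllele"])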