Parsing refSNP data in R
6.4 years ago
ceruleanivy

I have a single-column data frame in R with about 143 dbSNP IDs (of the form rs*), and I want to add columns with data such as the ancestral allele, MAF, etc. What is the quickest way to do that?

6.4 years ago
Ram

I'd use reutils, an R wrapper for NCBI's E-utilities (eutils). eutils can get you all the information you need, and you can pass a comma-separated list of IDs so that a single request processes all of them and returns the output in one chunk.
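reutils' efetch() documents its uid argument as accepting a character vector, so the data frame column can probably be passed as-is. If you would rather send the ~143 IDs as a few smaller comma-separated requests (for example when building raw eutils URLs), here is a base-R sketch. The data frame contents and the make_id_batches() helper are hypothetical, and the default batch size is an arbitrary illustration, not an NCBI limit.

```r
# Hypothetical single-column data frame of dbSNP IDs, as in the question
snps <- data.frame(rs = c("rs869312219", "rs869312218", "rs422628"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: collapse IDs into comma-separated strings,
# at most `size` IDs per string (one string per request)
make_id_batches <- function(ids, size = 100) {
  chunks <- split(ids, ceiling(seq_along(ids) / size))
  unname(vapply(chunks, paste, character(1), collapse = ","))
}

make_id_batches(snps$rs)            # "rs869312219,rs869312218,rs422628"
make_id_batches(snps$rs, size = 2)  # "rs869312219,rs869312218" "rs422628"
```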

For example, to get data on rs869312219 and rs869312218, the eutils URL would be: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=rs869312219,rs869312218
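A URL like the one above can be generated directly from a vector of IDs. This is a small base-R sketch; efetch_url() is a hypothetical helper whose base URL and db/id parameters are copied from the example URL, with retmode appended only when you supply it. The resulting string can be opened in a browser or read with readLines().

```r
# Hypothetical helper: build a raw eutils efetch URL from a vector of IDs
efetch_url <- function(ids, db = "snp", retmode = NULL) {
  url <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
                "?db=", db, "&id=", paste(ids, collapse = ","))
  # Optional output mode, e.g. "xml"
  if (!is.null(retmode)) url <- paste0(url, "&retmode=", retmode)
  url
}

efetch_url(c("rs869312219", "rs869312218"))
```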

You can check out this tutorial to help you transform eutils REST URLs to reutils commands: https://github.com/gschofl/reutils

I can get all the other information, which is perfect, but I still can't find how to get the ancestral allele. For example, try efetch("422628", "snp", "docset") for rs422628: I wish there were a way to index the "T" in the output, which is the ancestral allele.

You'll have to drill down to the ancestralAllele attribute in the Sequence tag (xpath ExchangeSet/Rs/Sequence/@ancestralAllele). This will be available only for entries with an ancestral allele listed.

Add retmode = 'xml' to the query so you can traverse the result with XPath.
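If the returned XML declares a default namespace (xmlns="..."), as namespaced NCBI XML often does, a bare XPath like the one above will not match until the namespace is bound to a prefix. Below is a minimal offline sketch using the XML package's getNodeSet() and xmlGetAttr(). The record is a hypothetical, trimmed-down document in the ExchangeSet/Rs/Sequence layout: the namespace URI and the "rs0" entry are placeholders, so substitute the xmlns value from your own output. The allele is read per Rs element, so entries without one come back as NA instead of shifting the column.

```r
library(XML)

# Hypothetical record; namespace URI and the rs0 entry are placeholders
sample_xml <- '<ExchangeSet xmlns="http://example.org/SNP/docsum">
  <Rs rsId="422628"><Sequence ancestralAllele="T"/></Rs>
  <Rs rsId="0"><Sequence/></Rs>
</ExchangeSet>'

doc <- xmlParse(sample_xml, asText = TRUE)

# Bind the default namespace to the prefix "d" so XPath can match it
ns <- c(d = "http://example.org/SNP/docsum")

rs_nodes <- getNodeSet(doc, "/d:ExchangeSet/d:Rs", namespaces = ns)

rows <- lapply(rs_nodes, function(node) {
  # Sequence child of this Rs element (relative XPath)
  seq_hits <- getNodeSet(node, "d:Sequence", namespaces = ns)
  allele <- if (length(seq_hits) == 0) NA_character_ else
    xmlGetAttr(seq_hits[[1]], "ancestralAllele", default = NA_character_)
  data.frame(rs = paste0("rs", xmlGetAttr(node, "rsId")),
             ancestral_allele = allele,
             stringsAsFactors = FALSE)
})

result <- do.call(rbind, rows)
print(result)
```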

EDIT: I did this using the following (admittedly convoluted) code:

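library(reutils)  # R wrapper for NCBI E-utilities
library(XML)      # XML parsing and conversion
# Fetch the dbSNP record as XML
dbsnp_entry <- efetch("rs422628", db = "snp", retmode = "xml")
# Parse the returned text into an internal XML document
entry_as_parsed_xml <- xmlTreeParse(content(dbsnp_entry, as = "text"),
                                    asText = TRUE, useInternalNodes = TRUE)
# Convert to a nested list and read the attribute off Rs > Sequence
entry_as_list <- xmlToList(entry_as_parsed_xml)
(ancestralAllele <- entry_as_list$Rs$Sequence$.attrs["ancestralAllele"])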
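To fill the extra columns in the question's one-column data frame, the values extracted for each ID can be looked up by rs ID with match(), which keeps the original row order and leaves NA for IDs with no value. This is a base-R sketch. The annotations table is hypothetical (only rs422628's "T" comes from this thread) and would in practice be built by running the extraction over each ID in the column.

```r
# The one-column data frame from the question (IDs taken from this thread)
snps <- data.frame(rs = c("rs422628", "rs869312219", "rs869312218"),
                   stringsAsFactors = FALSE)

# Hypothetical extraction results; only rs422628 -> "T" is from the thread
annotations <- data.frame(rs = "rs422628",
                          ancestral_allele = "T",
                          stringsAsFactors = FALSE)

# Row of each SNP in the results (NA if absent); order of snps is preserved.
# Repeat the same lookup for further columns such as MAF.
idx <- match(snps$rs, annotations$rs)
snps$ancestral_allele <- annotations$ancestral_allele[idx]

print(snps)
```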