How, Within R, Using XPath And The XML Package, Can I Select Nodes (getNodeSet) Based On Their Value?
12.2 years ago
User56 ▴ 100

This is a follow up on this question http://biostar.stackexchange.com/questions/17333/is-there-an-r-library-similar-to-libraries-like-bioperl-biopython-or-bioruby-m

This is a problem in R using the XML package. I have two PubMed articles and need to select only certain IDs, and only from certain databases. I cannot work out how to filter by element value using XPath in R.

Here is my code:

library(XML)

# this PMID has GEO IDs
url1 <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21558518&retmode=xml"
# this PMID has clinical trial IDs
url2 <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21830967&retmode=xml"
xml1 <- xmlTreeParse(url1, useInternalNodes = TRUE)
xml2 <- xmlTreeParse(url2, useInternalNodes = TRUE)
# list the databank names present in each record
ns1 <- getNodeSet(xml1, '//DataBank/DataBankName')
ns2 <- getNodeSet(xml2, '//DataBank/DataBankName')
ns1
ns2

I need to modify the XPath to select only the nodes where DataBankName is 'ClinicalTrials.gov' or 'ISRCTN'. A record that contains ISRCTN IDs is this one:

url3 <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21675889&retmode=xml"

I need the IDs stored in the AccessionNumberList element:

(ns <- getNodeSet(xml1, '//DataBank'))

It looks like this:

<DataBank>
  <DataBankName>GEO</DataBankName>
  <AccessionNumberList>
    <AccessionNumber>GSE25055</AccessionNumber>
    <AccessionNumber>GSE25065</AccessionNumber>
    <AccessionNumber>GSE25066</AccessionNumber>
  </AccessionNumberList>
</DataBank>

I tried several ways of matching on an element's value with XPath but could not solve it. (Any other solution that bypasses XPath is fine too.)

Here is what I need (but it gives me an error):

ns <- getNodeSet(xml1, '//DataBank/DataBankName[text()="ClinicalTrials.gov" or text()="ISRCTN"]/../AccessionNumberList/AccessionNumber')
r xml pubmed • 24k views
12.2 years ago
Chris Maloney ▴ 360

I don't have R, so I can't try this, but this might also work (simplified slightly from your example):

ns <- getNodeSet(xml1, 
  '//DataBank[DataBankName="ClinicalTrials.gov" or 
              DataBankName="ISRCTN"]
   /AccessionNumberList/AccessionNumber')

Yes, tried it: returns an XMLNodeSet with the 2 accession numbers (from xml3).


Thanks. Yes, that is smart: it does not require backtracking to the parent. I did not see that in any XPath examples on the net.
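For anyone who wants to try the predicate-on-the-parent form without hitting the NCBI server, here is a minimal sketch. It assumes a small inline document that mimics the DataBankList layout shown in the question (the XML string and the variable names doc, path and ids are made up for illustration), and it uses xpathSApply() with xmlValue to return the IDs as plain strings rather than nodes.

```r
library(XML)

# Hypothetical inline sample mimicking the PubMed DataBankList layout,
# so the sketch runs without a network call.
xml_text <- '<DataBankList>
  <DataBank>
    <DataBankName>GEO</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>GSE25055</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
  <DataBank>
    <DataBankName>ISRCTN</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>ISRCTN78147026</AccessionNumber>
      <AccessionNumber>ISRCTN87739946</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
</DataBankList>'

doc <- xmlParse(xml_text, asText = TRUE)

# The predicate filters DataBank by the value of its DataBankName child,
# then the path descends to the accession numbers.
path <- '//DataBank[DataBankName="ClinicalTrials.gov" or DataBankName="ISRCTN"]/AccessionNumberList/AccessionNumber'

# xmlValue extracts the text content of each matched node
ids <- xpathSApply(doc, path, xmlValue)
print(ids)
```

With this sample, ids should hold only the two ISRCTN accession numbers, since the GEO databank fails the predicate.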

12.2 years ago
Neilfws 49k

Would you consider a non-XPath solution?

The XML package has a couple of useful functions: xmlToList() and xmlToDataFrame(). These convert the XML to native R data structures, which can be easier to work with within R.

Something like the code below, which also uses llply from the plyr package to put the accession numbers into a new list:

library(XML)
library(plyr)
url3 <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21675889&retmode=xml"
xml3 <- xmlTreeParse(url3, useInternalNodes = TRUE)
# convert to list
l <- xmlToList(xml3)

# should really check for existence of DataBankName
# but we'll leave that for now

if(l$PubmedArticle$MedlineCitation$Article$DataBankList$DataBank$DataBankName == "ISRCTN") {
  accn <- llply(l$PubmedArticle$MedlineCitation$Article$DataBankList$DataBank$AccessionNumberList)
}

print(accn)
# $AccessionNumber
# [1] "ISRCTN78147026"

# $AccessionNumber
# [1] "ISRCTN87739946"

It looks unwieldy, but the "$" notation for accessing list elements is helpful, once you see how the XML maps to the list.


Thanks for pointing out those functions. Any solution is fine.

12.2 years ago
User56 ▴ 100

There was a typo in the code in my question: an extra " and also the plural for trial(s).

The last piece of code actually works (repeated here):

(ns <- getNodeSet(xml3, '//DataBank/DataBankName[text() = "ClinicalTrials.gov" or text() = "ISRCTN"]/../AccessionNumberList/AccessionNumber'))
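As a quick check that the text()-plus-parent-step form and the predicate-on-DataBank form pick out the same nodes, here is a small sketch on an inline document. The XML string is a made-up sample (NCT00000000 is a placeholder ID, not a real trial), and the variable names are only for illustration.

```r
library(XML)

# Hypothetical sample record; NCT00000000 is a placeholder accession number.
doc <- xmlParse('<DataBankList>
  <DataBank>
    <DataBankName>GEO</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>GSE25055</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
  <DataBank>
    <DataBankName>ClinicalTrials.gov</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>NCT00000000</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
</DataBankList>', asText = TRUE)

# Form 1: match the DataBankName text, step back up with "..", then descend
via_parent <- xpathSApply(doc,
  '//DataBank/DataBankName[text() = "ClinicalTrials.gov" or text() = "ISRCTN"]/../AccessionNumberList/AccessionNumber',
  xmlValue)

# Form 2: put the condition in a predicate on DataBank itself
via_predicate <- xpathSApply(doc,
  '//DataBank[DataBankName = "ClinicalTrials.gov" or DataBankName = "ISRCTN"]/AccessionNumberList/AccessionNumber',
  xmlValue)

# TRUE when both expressions select the same accession numbers
print(identical(via_parent, via_predicate))
```

On this sample both expressions should select only the ClinicalTrials.gov accession number, so either style can be used.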