How Does One Download An XML-Formatted List Of Cited Articles From PubMed?
11.7 years ago
Burke ▴ 290

I would like to analyze some metadata about a publication, and I have a Perl script that parses PubMed XML-formatted files. However, I do not see a way to download the "cited by" list as XML. Is there a way to do this?

xml pubmed • 5.1k views
11.7 years ago

Use NCBI ELink. For example, for PMID 19755503, request http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmode=xml&dbfrom=pubmed&id=19755503&cmd=neighbor and look at the LinkSetDb node whose LinkName is 'pubmed_pubmed_citedin':

        </Link>
        <Link>
            <Id>21591145</Id>
        </Link>
    </LinkSetDb>
    <LinkSetDb>
        <DbTo>pubmed</DbTo>
        <LinkName>pubmed_pubmed_citedin</LinkName>
        <Link>
            <Id>22644393</Id>
        </Link>
        <Link>
            <Id>22587672</Id>
        </Link>
        <Link>
            <Id>22541597</Id>
        </Link>
        <Link>
            <Id>22438567</Id>
        </Link>
        <Link>
            <Id>22434829</Id>
        </Link>
        <Link>
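
As a rough sketch of how the cited-in IDs could be pulled out of an ELink response like the excerpt above, here is a small Python example using only the standard library. The helper name `cited_in_pmids` and the trimmed `SAMPLE` document are illustrative assumptions, not part of the original answer; the element names follow the excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper, named here for illustration only.
def cited_in_pmids(elink_xml):
    """Return the PMIDs listed under the 'pubmed_pubmed_citedin' LinkSetDb."""
    root = ET.fromstring(elink_xml)
    pmids = []
    for link_set_db in root.iter("LinkSetDb"):
        # Other LinkSetDb blocks (e.g. 'pubmed_pubmed') are skipped.
        if link_set_db.findtext("LinkName") == "pubmed_pubmed_citedin":
            pmids.extend(node.text.strip() for node in link_set_db.findall("Link/Id"))
    return pmids

# Trimmed stand-in for a real ELink response, shaped like the excerpt above.
SAMPLE = """<eLinkResult><LinkSet>
  <LinkSetDb><LinkName>pubmed_pubmed</LinkName>
    <Link><Id>19755503</Id></Link></LinkSetDb>
  <LinkSetDb><LinkName>pubmed_pubmed_citedin</LinkName>
    <Link><Id>22644393</Id></Link>
    <Link><Id>22587672</Id></Link></LinkSetDb>
</LinkSet></eLinkResult>"""

if __name__ == "__main__":
    print(cited_in_pmids(SAMPLE))
```

When feeding it a real response, passing the raw bytes from the HTTP request to `cited_in_pmids` should work as well, since `ET.fromstring` accepts both `str` and `bytes`.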

EDIT: the following XSLT stylesheet will download and merge all the PubMed XML records:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >
<xsl:output method="xml" encoding="UTF-8"/>

<xsl:template match="/">
<MERGED>
    <xsl:for-each select="//LinkSetDb[LinkName='pubmed_pubmed_citedin']/Link/Id">
        <xsl:variable name="url" select="concat('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&amp;retmode=xml&amp;id=',.)"/>
    <xsl:message>downloading <xsl:value-of select="$url"/></xsl:message>
    <xsl:copy-of select="document($url)/PubmedArticleSet[1]/PubmedArticle[1]"/>
    </xsl:for-each>
</MERGED>
</xsl:template>


</xsl:stylesheet>

usage:

 xsltproc --novalid stylesheet.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmode=xml&dbfrom=pubmed&id=19755503&cmd=neighbor"  > pubmed_result.xml
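
For readers without xsltproc, roughly the same download-and-merge step could be sketched in Python. This is a sketch under stated assumptions rather than a port of the stylesheet: the helper names `efetch_url`, `merge_articles`, `fetch` and `download_merged` are made up here, the endpoint is the https variant of the URL used above, and the code relies on EFetch accepting a comma-separated `id` list so that a batch of records can be fetched in one request instead of one request per ID.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: https variant of the EFetch endpoint used in the stylesheet.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(pmids):
    """Build one EFetch URL for a batch of PMIDs (comma-separated id list)."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "retmode": "xml", "id": ",".join(pmids)},
        safe=",",  # keep the commas readable instead of %2C
    )
    return EFETCH + "?" + query

def merge_articles(article_set_docs):
    """Copy every PubmedArticle from each PubmedArticleSet under one MERGED root."""
    merged = ET.Element("MERGED")
    for doc in article_set_docs:
        root = ET.fromstring(doc)  # root element is PubmedArticleSet
        merged.extend(root.findall("PubmedArticle"))
    return merged

def fetch(url):
    """Download a URL and return the raw response bytes (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()

def download_merged(pmids, batch_size=100):
    """Fetch the records in batches and merge them; batch_size is an arbitrary choice."""
    batches = [pmids[i:i + batch_size] for i in range(0, len(pmids), batch_size)]
    return merge_articles(fetch(efetch_url(batch)) for batch in batches)
```

Something like `ET.ElementTree(download_merged(ids)).write("pubmed_result.xml", encoding="UTF-8")` would then produce a file comparable to the xsltproc output, where `ids` is the list of cited-in PMIDs taken from the ELink response.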

Hi Pierre...thanks for the help but that link is giving me a 404 error.


http://tinyurl.com/954dgoz works fine here. Check again or verify your proxy


Pierre...you are right...a wireless proxy was giving me problems. The link works well. Do you know how I would go about getting all the information for the list of IDs that are returned?

11.7 years ago
Recology_ ▴ 100

You can try our R package rentrez here: https://github.com/ropensci/rentrez

To install:

library(devtools)  # provides install_github()
install_github('ropensci/rentrez')
library(rentrez)

Then see the function entrez_link

out <- entrez_link(db='pubmed', dbfrom='pubmed', retmode='xml', id=19755503, cmd='neighbor')$file
out

Get results


<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 23 November 2010//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_101123.dtd">
<eLinkResult>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList>
      <Id>19755503</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>pubmed</DbTo>
      <LinkName>pubmed_pubmed</LinkName>
      <Link>
        <Id>19755503</Id>
      </Link>
      <Link>
        <Id>22075991</Id>
      </Link>

Get the IDs using

library(XML)  # provides xpathApply() and xmlValue()
sapply(xpathApply(out, "//Link", xmlValue), as.numeric)