How can one retrieve the list of PMIDs of abstracts associated with a MeSH term?
9.2 years ago

I'm interested in retrieving the list of PubMed article identifiers (PMIDs) of all the articles in PubMed that are associated with a MeSH term like "Breast Neoplasms". I wish to repeat this for a number of MeSH terms, which makes direct web queries at http://www.ncbi.nlm.nih.gov/pubmed/ too painful.

Is there a way to use NCBI's E-utilities to do this efficiently from the Linux command line?

pubmed mesh eutils • 4.3k views
9.2 years ago

Use NCBI's esearch with the [MESH] field modifier (the available fields are listed at http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed):

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22Breast+Neoplasms%22%5BMESH%5D

$ curl -s 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22Breast+Neoplasms%22%5BMESH%5D' | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult>
  <Count>221299</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>25668825</Id>
    <Id>25668824</Id>
    <Id>25668823</Id>
    <Id>25668822</Id>
    <Id>25647216</Id>
    <Id>25647215</Id>
    <Id>25647190</Id>
    <Id>25603628</Id>
    <Id>25597209</Id>
    <Id>25596051</Id>
    <Id>25596048</Id>
    <Id>25585789</Id>
    <Id>25585788</Id>
    <Id>25585780</Id>
    <Id>25585779</Id>
    <Id>25585778</Id>
    <Id>25585328</Id>
    <Id>25585323</Id>
    <Id>25577824</Id>
    <Id>25568923</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <TermSet>
      <Term>"Breast Neoplasms"[MESH]</Term>
      <Field>MESH</Field>
      <Count>221299</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Breast Neoplasms"[MESH]</QueryTranslation>
</eSearchResult>
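
Since the question asks about repeating this for several MeSH terms, the URL can be generated per term. This is only a minimal sketch in POSIX shell: `mesh_url` is a hypothetical helper name, "Lung Neoplasms" is just an illustrative second term, and only spaces are encoded (a term containing characters such as `&` or `,` would need fuller percent-encoding).

```shell
# Hypothetical helper: build the esearch URL for one MeSH term.
# Spaces become "+"; the quotes and brackets are written pre-encoded
# (%22 = double quote, %5B = "[", %5D = "]").
mesh_url() {
    term=$(printf '%s' "$1" | tr ' ' '+')
    printf 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%%22%s%%22%%5BMESH%%5D\n' "$term"
}

# Print one URL per term; each URL can then be handed to curl.
for t in "Breast Neoplasms" "Lung Neoplasms"; do
    mesh_url "$t"
done
```

For "Breast Neoplasms" this prints the same URL as the one used above, so `curl -s "$(mesh_url "Breast Neoplasms")"` should behave like the command shown.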
9.2 years ago
Chris S. ▴ 320

Entrez Direct is another option:

esearch -db pubmed -query "Burkholderia pseudomallei/metabolism[MESH]" | efetch -format uid
24866793
24626296
24595140
24502667
24462575
...
9.2 years ago
Ram 43k
curl -vs 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=breast+neoplasms&retmode=text&retmax=1000' 2>&1 | grep "^<Id>"

should give you just the IDs, still wrapped in XML tags. Use sed to strip the tags and reshape the output however you like.
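
For the sed step, something along these lines might do. It is a sketch, not the only way: `strip_ids` is a made-up function name, and it assumes one `<Id>` element per line, as in the esearch output shown earlier in the thread.

```shell
# Hypothetical helper: keep only lines that hold an <Id> element
# and print the bare number (leading whitespace is tolerated).
strip_ids() {
    sed -n 's:.*<Id>\([0-9][0-9]*\)</Id>.*:\1:p'
}

# Demonstration on a small fragment of esearch-style output.
printf '%s\n' '<IdList>' '<Id>25668825</Id>' '    <Id>25668824</Id>' '</IdList>' | strip_ids
```

Because sed only prints the lines where the substitution matched, it can replace the grep in the pipeline rather than follow it.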

Please note: I've set retmax to 1000. You can change it to get IDs in batches of your preferred size. Update from Pierre: this number cannot exceed 100,000.

Also, curl -s suppresses the progress bar and -v prints the request and response headers to STDERR; 2>&1 merges STDERR into STDOUT, so everything goes through the pipe and grep keeps only the ID lines.
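
To fetch in batches, esearch also accepts a retstart offset (it is echoed back as RetStart in the XML). A rough sketch of generating the batch URLs follows; `batch_offsets` is a hypothetical helper, the total of 221299 is the Count reported in the answer above, and 100000 is the limit quoted in this thread, which NCBI may have changed since.

```shell
# Hypothetical helper: print the retstart offsets needed to page through
# `total` hits in batches of `size` (0, size, 2*size, ...).
batch_offsets() {
    total=$1
    size=$2
    offset=0
    while [ "$offset" -lt "$total" ]; do
        echo "$offset"
        offset=$((offset + size))
    done
}

# Example: one URL per batch for 221299 hits, 100000 IDs at a time.
for start in $(batch_offsets 221299 100000); do
    echo "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22Breast+Neoplasms%22%5BMESH%5D&retstart=${start}&retmax=100000"
done
```

Each printed URL can be passed to curl in turn; the last batch simply returns fewer IDs than retmax.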

FYI: max(retmax) = 100,000

Ah, I see. Batches it is, then!
