Question: How can one retrieve the list of PMIDs of abstracts associated with a MeSH term?
0
Arjun Krishnan (United States) wrote, 3.8 years ago:

I'm interested in retrieving the list of PubMed article identifiers (PMIDs) of all the articles in PubMed that are associated with a MeSH term like "Breast Neoplasms". I wish to repeat this for a number of MeSH terms, which makes direct web queries at http://www.ncbi.nlm.nih.gov/pubmed/ too painful.

Is there a way to use NCBI's E-utilities to do this efficiently from the Linux command line?

pubmed eutils mesh • 2.0k views
2
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 3.8 years ago:

Use NCBI esearch with the [MESH] field modifier (as defined in http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed ):

 

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22Breast+Neoplasms%22%5BMESH%5D

 

$ curl -s 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22Breast+Neoplasms%22%5BMESH%5D' | xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult>
  <Count>221299</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>25668825</Id>
    <Id>25668824</Id>
    <Id>25668823</Id>
    <Id>25668822</Id>
    <Id>25647216</Id>
    <Id>25647215</Id>
    <Id>25647190</Id>
    <Id>25603628</Id>
    <Id>25597209</Id>
    <Id>25596051</Id>
    <Id>25596048</Id>
    <Id>25585789</Id>
    <Id>25585788</Id>
    <Id>25585780</Id>
    <Id>25585779</Id>
    <Id>25585778</Id>
    <Id>25585328</Id>
    <Id>25585323</Id>
    <Id>25577824</Id>
    <Id>25568923</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <TermSet>
      <Term>"Breast Neoplasms"[MESH]</Term>
      <Field>MESH</Field>
      <Count>221299</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Breast Neoplasms"[MESH]</QueryTranslation>
</eSearchResult>
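Since the question asks about repeating the query for many MeSH terms, here is a minimal sketch of building the esearch URL from a term. The function name mesh_esearch_url is hypothetical, the https scheme is an assumption (the thread uses http), and only spaces are encoded, so terms with other special characters would need proper URL encoding.

```shell
# Hypothetical helper: print an esearch URL for one MeSH term.
# Assumption: only spaces need encoding (as '+'); everything else is passed through.
mesh_esearch_url() {
  encoded=$(printf '%s' "$1" | tr ' ' '+')
  # %%22 is a literal %22 (a double quote), %%5B and %%5D are [ and ].
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%%22%s%%22%%5BMESH%%5D\n' "$encoded"
}
```

The printed URL can be passed to curl -s in the same way as the command above.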
1
Chris S. (United States) wrote, 3.8 years ago:

Entrez Direct is another option:

esearch -db pubmed -query "Burkholderia pseudomallei/metabolism[MESH]" | efetch -format uid
24866793
24626296
24595140
24502667
24462575
...

 

 

 

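To repeat the Entrez Direct pipeline for a list of MeSH terms, a loop along these lines might work. This is a sketch, assuming esearch and efetch from Entrez Direct are on the PATH; the function name pmids_per_term and the one-file-per-term layout are hypothetical. Redirecting esearch's stdin from /dev/null is a precaution, since a command inside a while-read loop can otherwise swallow the remaining terms.

```shell
# Hypothetical helper: read MeSH terms (one per line) on stdin and
# write the matching PMIDs to one file per term inside the given directory.
pmids_per_term() {
  out_dir=${1:-.}
  mkdir -p "$out_dir"
  while IFS= read -r term; do
    # Skip blank lines in the term list.
    [ -n "$term" ] || continue
    # Build a file name by replacing spaces and slashes with underscores.
    file="$out_dir/$(printf '%s' "$term" | tr ' /' '__').txt"
    # Same esearch/efetch options as in the command above.
    esearch -db pubmed -query "${term}[MESH]" < /dev/null | efetch -format uid > "$file"
  done
}
```

For example, pmids_per_term pmids < terms.txt would create pmids/Breast_Neoplasms.txt for a line reading Breast Neoplasms (terms.txt is an assumed input file).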
0
RamRS (Houston, TX) wrote, 3.8 years ago:
curl -vs 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=breast+neoplasms&retmode=text&retmax=1000' 2>&1 | grep "^<Id>"

This should give you just the IDs, still wrapped in XML tags. Use sed to remove the tags and transform the output in any way you'd like.

Please note: I've set retmax to 1000. You can change it to get IDs in batches of your preferred size. Update from Pierre: this number cannot be greater than 100,000.

Also, curl's -s flag suppresses the progress bar and -v adds verbose output on STDERR; 2>&1 combines STDERR and STDOUT so you can process the output directly through pipes.

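The sed step mentioned above could look roughly like this. It is a sketch that assumes each Id element sits on its own line, as in the esearch output shown earlier in the thread; the function name extract_pmids is hypothetical.

```shell
# Hypothetical helper: read esearch XML on stdin and print one bare PMID per line.
# Lines without an <Id>...</Id> element are dropped; leading indentation is ignored.
extract_pmids() {
  sed -n 's:.*<Id>\([0-9][0-9]*\)</Id>.*:\1:p'
}
```

Piping the output of curl -s for an esearch URL into extract_pmids replaces the grep step, and the -v and 2>&1 parts are then unnecessary.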

FYI max(retmax)= 100,000

Reply written 3.8 years ago by Pierre Lindenbaum

Ah, I see. Batches it is, then!

Reply written 3.8 years ago by RamRS
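Batching can be sketched by stepping esearch's retstart parameter in increments of retmax until the total from the Count element is covered. The function name batch_urls is hypothetical, the https scheme is an assumption, and the term must already be URL-encoded. The limits NCBI enforces today may differ from the 100,000 quoted in this thread, so check the current E-utilities documentation before relying on large batches.

```shell
# Hypothetical helper: print one esearch URL per batch.
# Usage: batch_urls ENCODED_TERM TOTAL_COUNT [BATCH_SIZE]
batch_urls() {
  term=$1
  count=$2
  batch=${3:-1000}
  start=0
  # Advance retstart by the batch size until every hit is covered.
  while [ "$start" -lt "$count" ]; do
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%s&retstart=%s&retmax=%s\n' "$term" "$start" "$batch"
    start=$((start + batch))
  done
}
```

Each printed URL can then be fetched with curl -s, ideally with a short pause between requests.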