Question

Redirection of Duplicate PMIDs

3 months ago

dominickd • 0

Is there a way to fetch publications with redirected PMIDs? If so, how?

For instance, the PMID 30134295 redirects to 30379686: https://pubmed.ncbi.nlm.nih.gov/30134295/

When I attempt to fetch the publication using the PMID 30134295, I get an error. I can manually check PubMed to see which PMID it redirects to, but I was wondering if there is a way to do it with Biopython.

Thanks for any help you can provide!

pubmed pmid • 1.1k views

updated 3 months ago by LauferVA 4.3k • written 3 months ago by dominickd • 0

How is this query itself originating?

3 months ago by LauferVA 4.3k

I have a list of PMIDs for publications linked to a list of grants, exported from NIH RePORTER.

3 months ago by dominickd • 0

Yep. In this case I'd definitely start with the grant numbers themselves, as others have indicated. I did not recommend this before because I was uncertain about the type of award being described and about the phrasing of the original post.

But that's the better approach in this case.

3 months ago by LauferVA 4.3k

score 1 · Answer 1 · 2024-04-22

3 months ago

GenoMax 144k

More than likely not, since the database query seems to work for the PMID it redirects to (30379686) but not for the original one (30134295).

```
$ esearch -db pubmed -query "30379686[PMID]"
<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <QueryKey>1</QueryKey>
  <Count>1</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
```

whereas this one does not:

```
$ esearch -db pubmed -query "30134295[PMID]"
<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <QueryKey>1</QueryKey>
  <Count>0</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
```
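The same check can be run in bulk from Python: OR many `[PMID]` clauses into one ESearch query and see which IDs come back. Judging by the output above, a PMID that no longer resolves should be missing from the result, which would flag the few that need redirect handling. This is a minimal sketch using only the standard library and the ESearch JSON output; the function names are hypothetical, batch size is left to the caller, and passing `data=` makes urllib send a POST, which NCBI suggests for long ID lists.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pmid_query(pmids):
    """OR together one [PMID] clause per ID."""
    return " OR ".join(f"{p}[PMID]" for p in pmids)

def missing_from_result(pmids, esearch_json):
    """Return the input PMIDs that do not appear in an ESearch JSON idlist."""
    found = set(json.loads(esearch_json)["esearchresult"]["idlist"])
    return [p for p in pmids if str(p) not in found]

def find_unresolvable(pmids):
    """Query PubMed once for a batch of PMIDs and return those it does not find."""
    data = urllib.parse.urlencode({
        "db": "pubmed",
        "term": build_pmid_query(pmids),
        "retmode": "json",
        "retmax": len(pmids),
    }).encode("ascii")
    # Supplying data= turns this into a POST request, avoiding very long URLs
    with urllib.request.urlopen(ESEARCH, data=data, timeout=30) as resp:
        return missing_from_result(pmids, resp.read().decode("utf-8"))
```

If PubMed behaves as in the esearch output above, `find_unresolvable([30379686, 30134295])` would be expected to return `[30134295]`.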

3 months ago by GenoMax 144k

I was able to come up with a workaround using the requests library:

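```python
import requests
from urllib.parse import urlparse

pmid = 30134295
url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
# requests.head() does not follow redirects by default, so the Location header is kept
r = requests.head(url, timeout=30)
location = r.headers.get("location")  # None if this PMID is not redirected
if location:
    # Location may be a relative path such as "/30379686/" or a full URL;
    # the new PMID is the first path segment either way
    redirect = urlparse(location).path.strip("/").split("/")[0]
    print(redirect)
else:
    print(f"No redirect for PMID {pmid} (HTTP {r.status_code})")
```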
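For a few hundred redirecting PMIDs, the same HEAD-request idea can be looped over a list with the standard library's `http.client`, which never follows redirects on its own. This is only a sketch: it relies on the PubMed website's redirect behaviour described in the question (a web page, not an API, so it could change), `resolve_pmids` and `pmid_from_location` are hypothetical names, and the half-second pause is an arbitrary politeness choice.

```python
import http.client
import time
from urllib.parse import urlparse

REDIRECT_CODES = {301, 302, 303, 307, 308}

def pmid_from_location(location):
    """Take the first path segment of a Location header ('/30379686/' or a full URL)."""
    return urlparse(location).path.strip("/").split("/")[0]

def resolve_pmids(pmids, delay=0.5):
    """Map each PMID to its redirect target, to itself (HTTP 200), or to None otherwise."""
    resolved = {}
    conn = http.client.HTTPSConnection("pubmed.ncbi.nlm.nih.gov", timeout=30)
    try:
        for pmid in pmids:
            conn.request("HEAD", f"/{pmid}/")
            resp = conn.getresponse()
            resp.read()  # HEAD responses have no body; this frees the connection for reuse
            location = resp.getheader("Location")
            if resp.status in REDIRECT_CODES and location:
                resolved[str(pmid)] = pmid_from_location(location)
            elif resp.status == 200:
                resolved[str(pmid)] = str(pmid)
            else:
                resolved[str(pmid)] = None
            time.sleep(delay)  # pause between requests to avoid hammering the site
    finally:
        conn.close()
    return resolved
```

If the site still redirects as described in the question, `resolve_pmids([30134295])` should give `{"30134295": "30379686"}`.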

I would still like to know if there is a better way to go about this with the Biopython library or the PubMed API.

3 months ago by dominickd • 0

Let me know the scope and scale of the request (e.g. "I need to do this for every record in PubMed") and I can help.

3 months ago by LauferVA 4.3k

The scope is all publications related to grants from an NIH award program. There are currently ~120k publications in total (a few hundred of which have PMIDs that redirect), but we hope to build a database that updates regularly.

3 months ago by dominickd • 0

Ok! I have a recommendation. I'll submit it as an answer, pending your feedback.

vincent

3 months ago by LauferVA 4.3k

score 1 · Answer 2 · 2024-04-23

linked to a list of grants

Using Entrez Direct, this may be much simpler if you have the grant numbers available. I tested this with a couple of random grants:

```
$ esearch -db pubmed -query "U01EB025162" | efetch -format abstract
```

Or you could do something like:

```
$ esearch -db pubmed -query "U01EB025162" | esummary | xtract -pattern DocumentSummary -element FullJournalName,ELocationID
Magnetic resonance in medicine  doi: 10.1002/mrm.29976
Magnetic resonance in medicine  doi: 10.1002/mrm.29990
Magnetic resonance in medicine  doi: 10.1002/mrm.29865
Magnetic resonance in medicine  doi: 10.1002/mrm.29668
Bioengineering (Basel, Switzerland)     doi: 10.3390/bioengineering9120736
Magnetic resonance in medicine  doi: 10.1002/mrm.29546
Magnetic resonance in medicine  doi: 10.1002/mrm.29293
Magnetic resonance in medicine  doi: 10.1002/mrm.28934
Magnetic resonance in medicine  doi: 10.1002/mrm.28882
Magnetic resonance in medicine  doi: 10.1002/mrm.27413
Magnetic resonance in medicine  doi: 10.1002/mrm.27382

$ esearch -db pubmed -query "R01EB009055" | esummary | xtract -pattern DocumentSummary -element FullJournalName,ELocationID
Journal of magnetic resonance imaging : JMRI    doi: 10.1002/jmri.24151
```
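The same grant-number search can be done from Python without Entrez Direct by calling ESearch directly. Because a search only returns PMIDs that PubMed currently knows about, this route should sidestep the redirect problem. A sketch assuming a bare grant number as the query term (as in the commands above); the function names, the `retmax` of 1000 and the 0.34 s pause (to stay near NCBI's 3 requests per second without an API key) are illustrative choices.

```python
import json
import time
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def parse_idlist(esearch_json):
    """Return the PMIDs listed in an ESearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]

def pmids_for_grant(grant_number, retmax=1000):
    """Search PubMed with a bare grant number and return the matching PMIDs."""
    data = urllib.parse.urlencode({
        "db": "pubmed",
        "term": grant_number,
        "retmode": "json",
        "retmax": retmax,
    }).encode("ascii")
    with urllib.request.urlopen(ESEARCH, data=data, timeout=30) as resp:
        return parse_idlist(resp.read().decode("utf-8"))

def pmids_by_grant(grant_numbers, delay=0.34):
    """Map each grant number to its PMIDs, pausing between requests."""
    result = {}
    for grant in grant_numbers:
        result[grant] = pmids_for_grant(grant)
        time.sleep(delay)
    return result
```

Usage would be something like `pmids_by_grant(["U01EB025162", "R01EB009055"])`; grants with more than 1000 publications would need a larger `retmax`.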

Ram · Answer 3 · 2024-04-23

Hi Dominick,

One thing I am not sure of: how did you get the other PMID in the first place?

The reason I ask is that you mentioned the Bio package. Bio (Biopython's Bio.Entrez module) is a very lovely wrapper around the E-utilities (eutils) published by NCBI. Why does that matter? Because using either eutils or Bio to generate the PMIDs (i.e. by searching), rather than to pull records for PMIDs you already have, could circumvent the problem you describe, for exactly the reason GenoMax indicated above: the search finds the current PMID but not the redirected one.

But wouldn't this leave you with a second problem, namely PMIDs unlinked to the information you want (i.e. affiliation with the NIH grant award program)? Not necessarily...

Pending further knowledge about this award program (which you have but I don't), I might do something like this:

1. Fetch a superset that includes all PMIDs from the last 20 years using esearch (this gives you the IDs only).
2. Using the IDs from step 1, retrieve detailed data for all records with esummary.
3. Pay careful attention to all the available fields, in particular fields associated with grant numbers (which seems to be what you're after). Note that PubMed's esummary records do not include grants; those are in the full record returned by efetch (the GrantList element).
4. Link PMIDs to the grant award of interest directly, without dealing with deprecated PMIDs at all.
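Step 4 might look like this, assuming you already have each PMID's grant strings (for example from the PubMed XML) and the grant list exported from RePORTER. Grant numbers are written in several ways (with or without a space, sometimes with an application-type digit in front or a year suffix), so both sides are normalized first. The regular expression encodes an assumption about the shape of NIH core project numbers (activity code, two-letter institute code, six-digit serial); `normalize_grant` and `link_pmids` are hypothetical helpers.

```python
import re

# Assumed shape: activity code (e.g. U01), institute code (e.g. EB) and a six-digit
# serial, optionally preceded by an application-type digit; anything after is ignored.
CORE_PROJECT = re.compile(r"^\d?([A-Z][A-Z0-9]{2}[A-Z]{2}\d{6})")

def normalize_grant(grant_id):
    """Reduce strings such as 'U01 EB025162' or '5U01EB025162-04' to 'U01EB025162'."""
    cleaned = re.sub(r"[^A-Z0-9]", "", grant_id.upper())
    match = CORE_PROJECT.match(cleaned)
    return match.group(1) if match else cleaned  # non-NIH formats are returned cleaned

def link_pmids(pmid_to_grants, grants_of_interest):
    """Return {core project number: sorted PMIDs} for the grants you care about."""
    wanted = {normalize_grant(g) for g in grants_of_interest}
    links = {}
    for pmid, grant_ids in pmid_to_grants.items():
        for grant_id in grant_ids:
            core = normalize_grant(grant_id)
            if core in wanted:
                links.setdefault(core, set()).add(pmid)
    return {core: sorted(pmids) for core, pmids in links.items()}
```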

For steps 2 and 3 (specifically the fields-of-interest part), note the commented line of Python code:

```python
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder: NCBI asks for a contact address
search_query = "U01EB025162"  # placeholder query; adapt to your award program

# Perform the PubMed search (returns PMIDs only)
handle = Entrez.esearch(db="pubmed", term=search_query, retmax=1000)
record = Entrez.read(handle)
handle.close()

# Fetch summaries for all PMIDs in one request rather than one request per PMID
handle = Entrez.esummary(db="pubmed", id=",".join(record["IdList"]))
summaries = Entrez.read(handle)
handle.close()

articles = []
for article_record in summaries:
    article = {
        "Title": article_record["Title"],
        "Authors": ", ".join(article_record["AuthorList"]),
        "Journal": article_record["FullJournalName"],
        "DOI": article_record.get("DOI", ""),
        "PMID": article_record["Id"],
        ###### PubMed esummary records do not include grant numbers (ArticleIds only holds
        ###### identifiers such as the DOI); grants are in the GrantList element of the
        ###### full record returned by Entrez.efetch(db="pubmed", retmode="xml").
    }
    articles.append(article)
```
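Since PubMed's esummary records do not carry grant numbers, here is a minimal standard-library sketch of pulling them from the full XML that efetch returns, where they sit under `MedlineCitation/Article/GrantList/Grant/GrantID`. The function names are hypothetical, and the batch of PMIDs is whatever your search produced.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fetch_pubmed_xml(pmids):
    """Download full PubMed records (PubmedArticleSet XML) for a batch of PMIDs."""
    data = urllib.parse.urlencode({
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }).encode("ascii")
    with urllib.request.urlopen(EFETCH, data=data, timeout=60) as resp:
        return resp.read()  # bytes; ElementTree reads the encoding declaration itself

def grants_from_pubmed_xml(xml_data):
    """Return {PMID: [GrantID, ...]} from a PubmedArticleSet document."""
    root = ET.fromstring(xml_data)
    result = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        grants = article.findall("MedlineCitation/Article/GrantList/Grant")
        # Some Grant elements list only an agency, so skip entries without a GrantID
        result[pmid] = [g.findtext("GrantID") for g in grants if g.findtext("GrantID")]
    return result
```

Usage would be `grants_from_pubmed_xml(fetch_pubmed_xml(pmids))`.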

A complete workflow that will make very quick work of this query is implemented here: https://github.com/LauferVA/EntrezMetadataTools/blob/main/libs/query_api.py. This workflow will download, annotate and reformat the metadata associated with all 10M seq records in SRA in just over 2 hours on an ordinary laptop, so 120k should not be an issue even without an API key. In the link provided, you could modify line 148 to point at pubmed rather than sra, then modify line 149 to reflect your query terms.
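At ~120k PMIDs, the main practical concerns are batching and pacing the requests. NCBI allows about 3 requests per second without an API key (10 with one), so 120k IDs in batches of 200 is roughly 600 requests, and the pauses alone add up to a few minutes. A generic sketch; `fetch_batch` stands for any function you write that takes a list of PMIDs, and the batch size of 200 is an assumption, not a documented limit.

```python
import time

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def run_batched(ids, fetch_batch, batch_size=200, requests_per_second=3):
    """Apply fetch_batch to each batch of IDs, pausing between E-utility requests."""
    delay = 1.0 / requests_per_second
    results = []
    for batch in chunks(list(ids), batch_size):
        results.append(fetch_batch(batch))
        time.sleep(delay)
    return results
```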

Other solutions could be imagined, too (e.g. an HTML parser that looks for the exact string associated with the redirection). Happy querying.

VL