Question

Elink to get PMID associated with a BioProject or SRA entry

0

3.5 years ago

sovrappensiero ▴ 90

Is there any way, using the command-line EUtils, to identify the publication(s) associated with a particular BioProject, or with a run entered in the SRA database?

For example, something like this (but this doesn't work):

esearch -db bioproject -query "PRJEB31886" | elink -target pubmed

As far as I know, I have to highlight the title of this BioProject and search Google, Pubmed, etc., for a paper with the exact title match. This is cumbersome and hurts my bioinformatically-inclined brain. Looking for a streamlined, command line-friendly way to retrieve a PMID associated with a BioProject, if it exists.

Thanks!

eutils ncbi bioproject pmid pubmed • 2.2k views

ADD COMMENT • link updated 3.5 years ago by Istvan Albert 100k • written 3.5 years ago by sovrappensiero ▴ 90

0

This specific BioProject does not seem to be linked to any PubMed article. Do you know if the author has published a paper?

ADD REPLY • link 3.5 years ago by vkkodali_ncbi ★ 3.7k

0

vkkodali: Is it the author's responsibility to link a publication to the data, or can NCBI do this automatically by text mining PubMed if the article includes the accession number?

ADD REPLY • link 3.5 years ago by GenoMax 141k

1

If an identifier from BioProject, BioSample, SRA, GEO, etc. is mentioned in the publication, it gets picked up automatically and the inter-database connections are made. That said, an author or user can (and is highly encouraged to) write to the NCBI Helpdesk to let them know that a publication is now out and that a connection between that publication and the data needs to be made.

ADD REPLY • link 3.5 years ago by vkkodali_ncbi ★ 3.7k

0

I also learned that NCBI discourages authors from putting their BioProject accession in their manuscript (but they can include the SRA entries). Should I cite BioProject accession numbers in my manuscript?

ADD REPLY • link 3.5 years ago by sovrappensiero ▴ 90

0

NCBI often boggles the mind. What if one produces 500 SRA experiments? Should they now list each number separately in the paper?

I can perhaps understand the sentiment: they are trying to discourage people from linking to the BioProject alone as the main entry point.

ADD REPLY • link 3.5 years ago by Istvan Albert 100k

0

The BioProject ID this thread is talking about has 4 samples and 12 experiments. There can be more than one publication associated with a BioProject ID.

ADD REPLY • link 3.5 years ago by GenoMax 141k

0

Can someone else do this on behalf of the authors? There is no link between that BioProject PRJEB31886 and their paper. However, the BioProject ID is mentioned in the paper. EDIT: I just realized that there is no XML output available for that BioProject (it says the ID 31886 is not public; I used Istvan's posted solution which works with his example BioProject accession but not mine). So I guess it's some other issue, unrelated to NCBI linking the BioProject to its associated publication.

ADD REPLY • link 3.5 years ago by sovrappensiero ▴ 90

1

This particular project was submitted via ENA (European Nucleotide Archive), perhaps before the paper was accepted. This might be a reason it is not properly cross-linked in PubMed.

Going through ENA does show the paper (you have to click Show under the Publication tab):

https://www.ebi.ac.uk/ena/browser/view/PRJEB31886

Perhaps there is an automatable query for that.

ADD REPLY • link 3.5 years ago by Istvan Albert 100k

0

I believe it is this paper, with the same title as the project. I found it by searching the BioProject title in Google. But this is too cumbersome for hundreds (or even tens...) of these. EDIT: It's definitely this paper. They mention the BioProject ID in the paper itself (and they give GenBank accessions, though not SRA accessions).

ADD REPLY • link 3.5 years ago by sovrappensiero ▴ 90

score 3 · Accepted Answer · 2020-10-27

3

3.5 years ago

Istvan Albert 100k

The publication is embedded in the record when it is fetched as XML:

efetch -db bioproject -id PRJNA257197 -format xml > out.xml

Extract the PubMed ID:

cat out.xml | xtract -pattern ProjectDescr -element Publication@id

prints

25214632

which you can then fetch one by one (or as a list with epost):

efetch -db pubmed -id 25214632
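
Since the question asks about doing this for many projects, the two steps above could be wrapped in a small loop. The following is only a sketch under some assumptions: EDirect (`efetch`, `xtract`) is installed and on `PATH`; the `id` attribute of `Publication` holds a PMID, as in the example above; and `pmids_for_projects` is a made-up helper name, not part of EDirect. It prints `NA` when a record cannot be fetched or carries no publication.

```shell
#!/usr/bin/env bash
# Sketch: map BioProject accessions (one per line on stdin) to PubMed IDs.
# Assumes NCBI EDirect (efetch, xtract) is installed and on PATH.
# pmids_for_projects is a hypothetical helper name, not an EDirect command.

pmids_for_projects() {
    local acc pmids
    while IFS= read -r acc; do
        # Skip blank lines in the input list.
        [ -n "$acc" ] || continue
        # Read efetch's stdin from /dev/null so it cannot swallow the
        # remaining accessions that the loop is reading.
        pmids=$(efetch -db bioproject -id "$acc" -format xml < /dev/null 2>/dev/null |
                xtract -pattern ProjectDescr -element Publication@id |
                tr '\t\n' ',,' | sed 's/,*$//')
        # One tab-separated line per project; several IDs are comma-joined,
        # and NA marks projects with no retrievable publication ID.
        printf '%s\t%s\n' "$acc" "${pmids:-NA}"
    done
}
```

For example, `printf 'PRJNA257197\nPRJEB31886\n' | pmids_for_projects` would be expected to report 25214632 for the first accession; going by this thread, the second would come back as `NA`, at least at the time of writing.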

ADD COMMENT • link 3.5 years ago by Istvan Albert 100k

0

Thanks. This is something I can work into an automated script. It's very helpful to know that the publication is embedded in the XML output. Thank you for your answer!

ADD REPLY • link 3.5 years ago by sovrappensiero ▴ 90