Elink to get PMID associated with a BioProject or SRA entry
13 months ago

Is there a way to use the command-line E-utilities (EDirect) to identify the publication(s) associated with a particular BioProject, or with a run deposited in the SRA database?

For example, something like this (but this doesn't work):

esearch -db bioproject -query "PRJEB31886" | elink -target pubmed


As far as I know, I have to highlight the title of this BioProject and search Google, Pubmed, etc., for a paper with the exact title match. This is cumbersome and hurts my bioinformatically-inclined brain. Looking for a streamlined, command line-friendly way to retrieve a PMID associated with a BioProject, if it exists.

Thanks!


This specific BioProject does not seem to be linked to any PubMed article. Do you know if the author has published a paper?

vkkodali: Is it the author's responsibility to link a publication to the data, or can NCBI do this automatically by text mining PubMed if the article includes the accession number?

If an identifier from BioProject, BioSample, SRA, GEO, etc. is mentioned in the publication, it gets picked up automatically and the inter-database connections are made. That said, an author or user can (and is highly encouraged to) write to the NCBI Helpdesk to let them know that a publication is now out and that a connection between that publication and the data needs to be made.
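
When such a connection has been recorded, the `elink` pipeline from the question should return it. A minimal sketch, assuming EDirect is installed and on the PATH; the function name `linked_pmids` is made up for illustration and is not part of EDirect:

```shell
#!/bin/sh
# linked_pmids is a hypothetical helper name, not an EDirect command.
# Prints the PubMed IDs that Entrez has linked to a BioProject accession,
# one per line. Prints nothing when no link has been recorded, which is
# what happens for a project whose paper was never cross-linked.
linked_pmids() {
    esearch -db bioproject -query "$1" |
        elink -target pubmed |
        efetch -format uid
}
```

Usage would look like `linked_pmids PRJNA257197`. An empty result only says that no link exists in Entrez, not that no paper exists.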

I also learned that NCBI discourages authors from putting their BioProject accession in their manuscript (but they can include the SRA entries). Should I cite BioProject accession numbers in my manuscript?

NCBI often boggles the mind. What if one produces 500 SRA experiments? Should they now list each number separately in the paper?

I can perhaps understand the sentiment: they are trying to discourage people from linking to the BioProject alone as the main entry point.

The BioProject ID this thread is talking about has 4 samples and 12 experiments. There can be more than one publication associated with a BioProject ID.

Can someone else do this on behalf of the authors? There is no link between that BioProject PRJEB31886 and their paper. However, the BioProject ID is mentioned in the paper. EDIT: I just realized that there is no XML output available for that BioProject (it says the ID 31886 is not public; I used Istvan's posted solution which works with his example BioProject accession but not mine). So I guess it's some other issue, unrelated to NCBI linking the BioProject to its associated publication.

The data for this particular project were submitted via ENA (European Nucleotide Archive), perhaps before the paper was accepted. That might be why the project is not properly cross-linked in PubMed.

Going through ENA does show the paper (you have to click Show under the Publication tab):

https://www.ebi.ac.uk/ena/browser/view/PRJEB31886

Perhaps there is an automatable query for that.
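
One possible route, which nobody in this thread has confirmed for this project, is a full-text search for the accession through the Europe PMC REST search service. The sketch below assumes that the paper mentions the accession in text Europe PMC has indexed, and that `curl` is available; the function name `europepmc_pmids` is hypothetical:

```shell
#!/bin/sh
# europepmc_pmids is a hypothetical helper name.
# Searches Europe PMC for an accession as a quoted phrase and prints the
# PMIDs found in the JSON response, one per line.
# Assumption: every hit merely *mentions* the accession, so the list can
# include papers that reuse the data, not only the original publication.
europepmc_pmids() {
    curl -s -G "https://www.ebi.ac.uk/europepmc/webservices/rest/search" \
        --data-urlencode "query=\"$1\"" \
        --data-urlencode "format=json" \
        --data-urlencode "resultType=lite" |
        grep -Eo '"pmid"[[:space:]]*:[[:space:]]*"[0-9]+"' |  # keep only pmid fields
        grep -Eo '[0-9]+' |                                   # strip the JSON key
        sort -u
}
```

Usage would look like `europepmc_pmids PRJEB31886`. The `grep` extraction is a shortcut that avoids a JSON parser; a tool such as `jq` would be more robust if it is installed.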

I believe it is this paper, which has the same title as the project. I found it by searching for the BioProject title in Google, but that is too cumbersome for hundreds (or even tens...) of these. EDIT: It's definitely this paper. They mention the BioProject ID in the paper itself (and they list GenBank accessions, though not SRA accessions).

13 months ago

The publication is embedded in the BioProject record when it is fetched as XML:

efetch -db bioproject -id PRJNA257197 -format xml > out.xml


Extract the PubMed ID:

cat out.xml | xtract -pattern ProjectDescr -element Publication@id


This prints:

25214632


You can then fetch the records one by one (or as a list with epost):

efetch -db pubmed -id 25214632
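
For many projects, the two steps above can be wrapped in a loop. This is only a sketch under a few assumptions: EDirect is installed, accessions arrive one per line on standard input, and the function name `pmids_for_bioprojects` and the `NA` placeholder are made up for illustration. The `Publication@id` value is printed as-is, without checking that it is always a PubMed ID.

```shell
#!/bin/sh
# pmids_for_bioprojects is a hypothetical helper name, not part of EDirect.
# Reads BioProject accessions (one per line) on stdin and prints
# "accession<TAB>id" for every publication embedded in the project XML.
# Accessions without an embedded publication print "accession<TAB>NA".
pmids_for_bioprojects() {
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue    # skip blank lines
        # </dev/null keeps efetch from consuming the accession list on stdin
        pmids=$(efetch -db bioproject -id "$acc" -format xml </dev/null |
            xtract -pattern ProjectDescr -element Publication@id)
        if [ -z "$pmids" ]; then
            printf '%s\tNA\n' "$acc"
            continue
        fi
        # xtract separates multiple values with tabs; emit one row per id
        printf '%s\n' "$pmids" | tr '\t' '\n' | while IFS= read -r pmid; do
            if [ -n "$pmid" ]; then
                printf '%s\t%s\n' "$acc" "$pmid"
            fi
        done
    done
}
```

Usage would look like `pmids_for_bioprojects < accessions.txt > pmids.tsv`, where `accessions.txt` is a hypothetical input file. Rows marked `NA` are the ones that still need a manual search.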

Thanks. This is something I can work into an automated script. It's very helpful to know that the publication is embedded in the XML output. Thank you for your answer!
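
For the SRA half of the original question, such a script would first need to map a run to its BioProject. A minimal sketch, assuming EDirect is installed and that the SRA `runinfo` CSV carries a `BioProject` header column; both function names are hypothetical:

```shell
#!/bin/sh
# bioproject_column is a hypothetical helper name.
# Reads SRA "runinfo" CSV on stdin and prints the unique values of its
# BioProject column. The column is located by header name, not position.
# Assumption: no quoted field containing a comma precedes that column,
# since this splits naively on commas.
bioproject_column() {
    awk -F',' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "BioProject") col = i; next }
        col && $col != "" && $col != "BioProject" { print $col }
    ' | sort -u
}

# bioproject_of_run is a hypothetical helper name.
# Looks up one SRA run accession and prints its BioProject accession.
bioproject_of_run() {
    esearch -db sra -query "$1" | efetch -format runinfo | bioproject_column
}
```

The accession printed by `bioproject_of_run` can then be passed to the `efetch -db bioproject ... -format xml` and `xtract` steps from the answer above.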