Question: Elink to get PMID associated with a BioProject or SRA entry
27 days ago by
sovrappensiero10 wrote:

Is there any way through command line EUtils to identify publication(s) associated with a particular BioProject, or from a run entered in SRA database?

For example, something like this (but this doesn't work):

esearch -db bioproject -query "PRJEB31886" | elink -target pubmed

As far as I know, I have to highlight the title of this BioProject and search Google, Pubmed, etc., for a paper with the exact title match. This is cumbersome and hurts my bioinformatically-inclined brain. Looking for a streamlined, command line-friendly way to retrieve a PMID associated with a BioProject, if it exists.

Thanks!

Tags: pubmed, pmid, eutils, bioproject, ncbi
ADD COMMENTlink modified 27 days ago by Istvan Albert ♦♦ 85k • written 27 days ago by sovrappensiero10

This specific BioProject does not seem to be linked to any PubMed article. Do you know if the author has published a paper?

ADD REPLYlink written 26 days ago by vkkodali2.2k

vkkodali: Is it the author's responsibility to link a publication to the data, or can NCBI do this automatically by text mining from PubMed if the article includes the accession number?

ADD REPLYlink modified 26 days ago • written 26 days ago by genomax92k
1

If an identifier from BioProject, BioSample, SRA, GEO, etc. is mentioned in the publication, it gets picked up automatically and the inter-database connections are made. That said, an author or user can (and is highly encouraged to) write to the NCBI Helpdesk to notify them that a publication is now out and that a connection between that publication and the data needs to be made.

ADD REPLYlink written 26 days ago by vkkodali2.2k

I also learned that NCBI discourages authors from putting their BioProject accession in their manuscript (but they can include the SRA entries). See: "Should I cite BioProject accession numbers in my manuscript?"

ADD REPLYlink written 26 days ago by sovrappensiero10

NCBI often boggles the mind. What if one produces 500 SRA experiments? Should they now list each number separately in the paper?

I can perhaps understand the sentiment: they are trying to discourage people from linking to the BioProject alone as the main entry point.

ADD REPLYlink written 26 days ago by Istvan Albert ♦♦ 85k

The bioproject ID this thread is talking about has 4 samples and 12 experiments. There can be more than one publication associated with a bioproject ID.

ADD REPLYlink written 26 days ago by genomax92k

Can someone else do this on behalf of the authors? There is no link between that BioProject PRJEB31886 and their paper. However, the BioProject ID is mentioned in the paper. EDIT: I just realized that there is no XML output available for that BioProject (it says the ID 31886 is not public; I used Istvan's posted solution which works with his example BioProject accession but not mine). So I guess it's some other issue, unrelated to NCBI linking the BioProject to its associated publication.

ADD REPLYlink modified 26 days ago • written 26 days ago by sovrappensiero10
1

This particular paper was submitted via ENA (European Nucleotide Archive) and perhaps before the paper was accepted. This might be a reason it is not properly crosslinked in PubMed.

Going through ENA does show the paper (you have to click Show under the Publication tab):

https://www.ebi.ac.uk/ena/browser/view/PRJEB31886

Perhaps there is an automatable query for that.

ADD REPLYlink written 26 days ago by Istvan Albert ♦♦ 85k

I believe it is this paper, with the same title as the project. I found it by searching the BioProject title in Google. But this is too cumbersome for hundreds (or even tens...) of these. EDIT: It's definitely this paper. They have mentioned the BioProject ID in the paper itself (and they indicate Genbank accessions; though not SRA accessions).

ADD REPLYlink modified 26 days ago • written 26 days ago by sovrappensiero10
27 days ago by
Istvan Albert ♦♦ 85k
University Park, USA
Istvan Albert ♦♦ 85k wrote:

the publication is embedded in the information when fetched as XML:

efetch -db bioproject -id PRJNA257197 -format xml > out.xml

extract the pubmed id

cat out.xml | xtract -pattern ProjectDescr -element Publication@id

prints

25214632

which you can then fetch one by one (or as a list with epost):

efetch -db pubmed -id 25214632
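
For anyone who wants to run the steps above over many accessions, here is a minimal sketch. It assumes EDirect (efetch, xtract) is installed and on the PATH. The function name bioproject_pmids and the "NA" placeholder are made up for illustration and are not part of EDirect. It reuses the same efetch and xtract calls shown above and prints one line per accession.

```shell
# Sketch only: print the publication id(s) recorded in the BioProject XML
# for each accession passed as an argument.
# Assumes EDirect's efetch and xtract are available on PATH.
# "bioproject_pmids" and the "NA" marker are hypothetical names.
bioproject_pmids() {
    local acc pmids
    for acc in "$@"; do
        # Same two steps as above; stdin is redirected so efetch
        # does not consume input when called inside a loop.
        pmids=$(efetch -db bioproject -id "$acc" -format xml < /dev/null |
            xtract -pattern ProjectDescr -element Publication@id)
        if [ -n "$pmids" ]; then
            printf '%s\t%s\n' "$acc" "$pmids"
        else
            # No Publication element found (or the record could not be fetched)
            printf '%s\tNA\n' "$acc"
        fi
    done
}
```

Usage would look like `bioproject_pmids PRJNA257197 PRJEB31886`. Accessions that come back as NA (as PRJEB31886 apparently would, going by this thread) still need a manual check, for example on the ENA page, and it is worth spot-checking that the ids returned really are PMIDs.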
ADD COMMENTlink modified 27 days ago • written 27 days ago by Istvan Albert ♦♦ 85k

Thanks. This is something I can work into an automated script. It's very helpful to realize that the publication is embedded in the XML output. Thank you for your answer!

ADD REPLYlink written 26 days ago by sovrappensiero10