Question

Extract PMIDs from a gene or protein ID

0

9.0 years ago

shelly.deforte ▴ 190

Given a UniProt ID, I am trying to automatically extract the related PubMed IDs (PMIDs) from PubMed. I can map the UniProt ID to something NCBI can understand. For instance, UniProt ID O14733 maps to GI:6831583, and you can then launch a search from http://www.ncbi.nlm.nih.gov/protein/O14733 to see the associated PubMed articles at the URL http://www.ncbi.nlm.nih.gov/pubmed?linkname=protein_pubmed_weighted&from_uid=6831583.

I have never used NCBI's E-utilities, so fetching these articles automatically may need only a very simple modification, but I can't figure it out. My best guess was http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&LinkName=protein_pubmed_weighted&from_uid=6831583, but this returns nothing.

Basically, given an ID such as 6831583, I want to return a list of PMIDs. I would rather do this in Python if possible. Any suggestions?

pubmed • 3.8k views

ADD COMMENT • link updated 14 months ago by Ram 43k • written 9.0 years ago by shelly.deforte ▴ 190

1

Similar posts:

ADD REPLY • link updated 14 months ago by Ram 43k • written 9.0 years ago by Ashutosh Pandey 12k

3

9.0 years ago

shelly.deforte ▴ 190

For completeness, here's what I worked out with David W's help. The URL is constructed like this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=protein&db=pubmed&id=215274019&linkname=protein_pubmed_weighted

This is how I was able to retrieve the records in Biopython:

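from Bio import Entrez

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

protein_id = "215274019"

handle = Entrez.elink(db="pubmed", dbfrom="protein", id=protein_id, linkname="protein_pubmed_weighted")
record = Entrez.read(handle)
handle.close()

# Each entry in the first LinkSetDb holds one linked PMID
for link in record[0]["LinkSetDb"][0]["Link"]:
    print(link["Id"])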
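
If you would rather not depend on Biopython, the same ELink URL can probably be queried with the standard library alone. This is only a rough sketch: the function names are made up for illustration, it uses https instead of the http shown in the URL above, and it assumes the XML response nests each PMID as an `<Id>` inside a `<Link>` element.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(protein_id, linkname="protein_pubmed_weighted"):
    """Build the ELink URL for protein -> PubMed links (hypothetical helper)."""
    params = {
        "dbfrom": "protein",
        "db": "pubmed",
        "id": protein_id,
        "linkname": linkname,
    }
    return EUTILS_ELINK + "?" + urllib.parse.urlencode(params)


def parse_pmids(xml_data):
    """Collect the <Id> of every <Link> element in an ELink XML response."""
    root = ET.fromstring(xml_data)
    # Assumption: only linked records appear as <Link><Id>...</Id></Link>
    return [link.findtext("Id") for link in root.iter("Link")]


def fetch_pmids(protein_id):
    """Query NCBI (needs network access) and return the linked PMIDs."""
    with urllib.request.urlopen(build_elink_url(protein_id)) as response:
        return parse_pmids(response.read())
```

Calling `fetch_pmids("215274019")` should then give the same list of PMIDs as the Biopython loop, as strings.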

ADD COMMENT • link updated 14 months ago by Ram 43k • written 9.0 years ago by shelly.deforte ▴ 190

0

Hey, I am trying to do basically the same thing. I have a question: do you know the differences between the different linknames, i.e., protein_pubmed_weighted, protein_pubmed, and protein_pubmed_refseq?

Best,
Nils

ADD REPLY • link updated 14 months ago by Ram 43k • written 7.7 years ago by nils.rudqvist ▴ 20

1

9.0 years ago

asking.for.help ▴ 20

Depending on the scope of your project, you might want to download NCBI's look-up table between Entrez Gene IDs and PubMed IDs directly and integrate it into your workflow.

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz

I like this table because it makes filtering easy (and computationally fast). For example, you can exclude papers that cover hundreds to thousands of different genes (and so usually do not reveal gene-specific biology), or find genes that are only mentioned together with your genes of interest.
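
As a rough illustration of the first kind of filter, here is a sketch that drops papers linked to too many genes. It assumes the file is tab-separated with a `#`-prefixed header and the columns tax_id, GeneID, PubMed_ID in that order; the function names and the default threshold of 100 are only illustrative choices.

```python
import gzip
from collections import defaultdict


def read_gene2pubmed(lines):
    """Yield (gene_id, pmid) pairs from tab-separated gene2pubmed lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header and blank lines
        # Assumed column order: tax_id, GeneID, PubMed_ID
        _tax_id, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
        yield gene_id, pmid


def gene_specific_pmids(pairs, max_genes_per_paper=100):
    """Map each gene to its PMIDs, dropping papers linked to too many genes."""
    genes_per_pmid = defaultdict(set)
    for gene_id, pmid in pairs:
        genes_per_pmid[pmid].add(gene_id)

    pmids_per_gene = defaultdict(set)
    for pmid, genes in genes_per_pmid.items():
        if len(genes) <= max_genes_per_paper:
            for gene_id in genes:
                pmids_per_gene[gene_id].add(pmid)
    return dict(pmids_per_gene)


def load_gene2pubmed(path="gene2pubmed.gz"):
    """Read a locally downloaded copy of the file (path is a placeholder)."""
    with gzip.open(path, "rt") as handle:
        return list(read_gene2pubmed(handle))
```

Something like `gene_specific_pmids(load_gene2pubmed())` would then give a dictionary from gene ID to the set of PMIDs that survive the filter.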

ADD COMMENT • link updated 14 months ago by Ram 43k • written 9.0 years ago by asking.for.help ▴ 20

0

This looks like a great resource, though I don't think I can easily map my UniProt IDs to genes, and I think it might change the coverage of the papers if I did. Still, I'm definitely going to bookmark this folder, thanks!

ADD REPLY • link 9.0 years ago by shelly.deforte ▴ 190

Ram · Accepted Answer · 2015-05-11

2

9.0 years ago

David W 4.9k

If you use the ELink E-utility with dbfrom="protein", db="pubmed", you'll get a list of PMIDs associated with that protein.

You can then use esummary or efetch on those pmids.
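
For the esummary step, something along these lines might work using only the standard library. It is a sketch, not a tested client: the helper names are hypothetical, and the parser assumes the default (version 1.0) ESummary XML, where each `<DocSum>` carries an `<Id>` and an `<Item Name="Title">` child.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(pmids):
    """Build an ESummary URL for a list of PMIDs (hypothetical helper)."""
    params = {"db": "pubmed", "id": ",".join(pmids)}
    return EUTILS_ESUMMARY + "?" + urllib.parse.urlencode(params)


def parse_titles(xml_data):
    """Return {pmid: title} from an ESummary XML response."""
    root = ET.fromstring(xml_data)
    titles = {}
    for docsum in root.iter("DocSum"):
        # Assumption: the title is a direct child <Item Name="Title">
        titles[docsum.findtext("Id")] = docsum.findtext("Item[@Name='Title']")
    return titles


def fetch_titles(pmids):
    """Query NCBI (needs network access) and return titles keyed by PMID."""
    with urllib.request.urlopen(build_esummary_url(pmids)) as response:
        return parse_titles(response.read())
```

For long PMID lists it is probably better to send the IDs in batches instead of one very long URL.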

ADD COMMENT • link updated 14 months ago by Ram 43k • written 9.0 years ago by David W 4.9k

0

Thanks, that's what I needed!

ADD REPLY • link updated 14 months ago by Ram 43k • written 9.0 years ago by shelly.deforte ▴ 190