How to extract protein sequences from a published paper
2.6 years ago
Nelo ▴ 20

Hi

Is there any way to extract the multiple protein sequences reported in a published paper using its PMID, DOI or supplementary files?

Thanks

DOI supplementary extract protein • 1.1k views

It's unlikely you will be able to go directly from a paper DOI to a genetic sequence. If the paper lists the databases they uploaded the data to, with accession numbers etc, then it might be possible, but we'd need more information about what the paper says exactly.
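When a paper does list protein accession numbers, they can be fetched directly by accession. The following is only a minimal sketch, assuming NCBI EDirect is installed and that the accessions have been copied by hand into a text file, one per line. The function name fetch_by_accession and the file names are made up for illustration.

```shell
# Fetch protein FASTA records for accessions listed one per line in a text file.
# fetch_by_accession is a hypothetical helper name; efetch comes from NCBI EDirect.
fetch_by_accession() {
    acc_file="$1"   # text file of accessions copied from the paper (assumed)
    out_file="$2"   # where the FASTA records are written
    # Drop blank lines, then join the accessions into a comma-separated list
    ids=$(grep -v '^[[:space:]]*$' "$acc_file" | paste -sd, -)
    efetch -db protein -id "$ids" -format fasta > "$out_file"
}
```

It could be called as fetch_by_accession accessions.txt proteins.fa, where accessions.txt holds lines such as NP_001284453.1.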


Yes, some papers mention accession numbers, but others give no protein accessions at all, only the number of proteins they found in genome-wide studies of a specific plant species. That's why I am looking for a program that can download the sequences using the title, PMID or DOI.


Caveat: This is likely not going to work for most papers. But if you have the right PMID then you could do the following.

$ esearch -db pubmed -query 22753475 | elink -target nuccore | elink -target protein | efetch -format fasta | grep ">" | head -10
>NP_001292578.1 uncharacterized protein LOC103503105 [Cucumis melo]
>NP_001284396.1 uncharacterized LOC103502119 [Cucumis melo]
>NP_001284656.1 Transcription factor HY5-like [Cucumis melo]
>NP_001284432.1 ABSCISIC ACID-INSENSITIVE 5-like protein 2-like [Cucumis melo]
>NP_001284448.1 Sodium/hydrogen exchanger 2-like [Cucumis melo]
>NP_001284444.1 TMV resistance protein N-like [Cucumis melo]
>NP_001284453.1 ethylene receptor 1 [Cucumis melo]
>NP_001284384.1 alpha-farnesene synthase [Cucumis melo]
>NP_001284474.1 profilin [Cucumis melo]
>NP_001284461.1 translationally-controlled tumor protein homolog [Cucumis melo]
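
The trailing grep ">" | head -10 in the command above only trims the output for display: grep keeps the FASTA header lines and head shows the first ten of them. A small offline sketch of the same idea, using a made-up two-record file (demo.fa and its sequences are invented for illustration):

```shell
# Build a tiny two-record FASTA file (made-up sequences, for illustration only)
cat > demo.fa <<'EOF'
>seq1 example protein A
MKTAYIAKQR
>seq2 example protein B
MSDNELLKKV
EOF

# Keep only header lines (those starting with ">") and show at most the first 10
grep '^>' demo.fa | head -10

# Count how many sequences the file holds
grep -c '^>' demo.fa
```

Anchoring the pattern with ^ avoids matching a ">" that appears elsewhere in a line, and grep -c gives a quick count of the records in a downloaded file.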

First of all, thank you so much for replying again.

So the number '22753475' is the PMID, I guess, but what is the last part, 'grep ">" | head -10', for? Is it limiting the number of results? You got exactly 10 results here.

Also, it has been 10 minutes since I ran this command and it is still running.


22753475 is the PMID. I added the part from grep onwards only to demonstrate that this works. You will need to take that part out to save the sequences. Simply redirect to a file: esearch .. blah > seq.fa.
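
Put together, the pipeline from this thread without the grep and head part, redirected to a file, might look like the sketch below. It assumes NCBI EDirect is installed and on the PATH. The function name pmid_to_protein_fasta is hypothetical, and the final count is an addition for convenience, not part of the original command.

```shell
# Hypothetical wrapper around the esearch/elink/efetch pipeline from this thread.
pmid_to_protein_fasta() {
    pmid="$1"       # PubMed ID of the paper, e.g. 22753475
    out_file="$2"   # FASTA file to write, e.g. seq.fa
    esearch -db pubmed -query "$pmid" |
        elink -target nuccore |
        elink -target protein |
        efetch -format fasta > "$out_file"
    # Report how many sequences were saved (counts the ">" header lines)
    grep -c '^>' "$out_file"
}
```

Calling pmid_to_protein_fasta 22753475 seq.fa would write the sequences to seq.fa. As noted in the caveat earlier, this only returns anything when the PubMed record is linked to sequence records at NCBI.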
