Question: How to convert protein IDs to nucleotide accession IDs with E-utilities?
sankadinesh20 wrote:

I have an Excel sheet with protein IDs. I want to convert them to nucleotide IDs. How can I do this using E-utilities or any other method? Please reply. Thanks and regards, Dinesh

sequencing gene • 162 views
modified 6 weeks ago • written 6 weeks ago by sankadinesh20

Dear sir, thank you. Could you please suggest how to convert a list of protein IDs (10,000 IDs) rather than just one? Thanks once again. Regards, Dinesh S L

written 6 weeks ago by sankadinesh20

Provide examples of a few.

written 6 weeks ago by genomax89k

Dear Sir, I figured it out. This one works (the command, followed by the input IDs):

 epost -db protein -input /Users/apple/Desktop/test.rtf -format acc | elink -target nuccore | efetch -format acc

AAA21838
AAA22008
AAA22014
AAA26329
AAA26332
AAA87251
AAA93020
AAA96262
AAB36876
AAB41123

The result was

AH000925.2
L34879.1
L04499.1
U53363.1
L47979.1
AH000924.2
L41344.1
U49859.1
J05111.1

But for one ID (AAA21838), there is no nucleotide ID. With thousands of IDs, I won't be able to tell which protein IDs did not yield nucleotide IDs. Thanks and regards, Dinesh S L

modified 6 weeks ago by genomax89k • written 6 weeks ago by sankadinesh20

But for one ID (AAA21838), there is no nucleotide ID.

That is not correct.

$ esearch -db protein -query "AAA21838" | elink -target nuccore | efetch -format acc
L23514.1

Use a variation like this so you will know which IDs did not return a value. The first column holds your source IDs.

for i in `cat acc_file`; do acc=$(esearch -db protein -query "${i}" | elink -target nuccore | efetch -format acc); printf '%s\t%s\n' "${i}" "${acc}"; done

AAA21838    L23514.1
AAA22008    J05111.1
AAA22014    L04499.1
AAA26329    AH000924.2
AAA26332    AH000925.2
AAA87251    L34879.1
AAA93020    U49859.1
AAA96262    L41344.1
modified 6 weeks ago • written 6 weeks ago by genomax89k
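
As a rough sketch of how a two-column table like the one above could be checked for misses: assuming the loop output was saved to a tab-separated file in which every source ID starts its own line, the IDs with an empty second column are the ones that returned nothing. The function name `missing_ids` and the file name `results.tsv` are hypothetical, chosen only for illustration.

```shell
# missing_ids RESULTS_TSV
# Hypothetical helper: print every source ID (column 1) whose
# nucleotide column (column 2) is empty or absent.
# Assumes tab-separated input with one source ID per line.
missing_ids() {
  awk -F '\t' '$2 == "" { print $1 }' "$1"
}

# Usage (file name is an assumption):
#   missing_ids results.tsv > ids_without_nucleotide.txt
```

If the list is long, the resulting file of misses can be rerun on its own or inspected by hand.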

Dear sir, thanks for spending your valuable time. First I tried this:

Dinesh$ for i in 'cat /Users/apple/Desktop/testcopy.rtf' ; do printf ${i}"\t" ; esearch -db protein -query ${i} | elink -target nuccore | efetch -format acc;  done

catEntrez Direct does not support positional arguments.
Please remember to quote parameter values containing
whitespace or shell metacharacters.
Db value not found in link input
Db value not found in fetch input

If I run it like this,

Dinesh$ for i in cat /Users/apple/Desktop/testcopy.rtf ; do printf ${i}"\t" ; esearch -db protein -query ${i} | elink -target nuccore | efetch -format acc;  done

it gives a long list of seemingly random IDs, followed by this message:

CVRY01000005.1
CVRY01000006.1
NT_086364.3
NT_086333.1
/Users/apple/Desktop/testcopy.rtf   Retrying elink, step 2: Empty result - nothing to do
Retrying elink, step 2: Empty result - nothing to do
Retrying elink, step 2: Empty result - nothing to do
ERROR in link output: Empty result - nothing to do
WebEnv: NCID_1_38538961_130.14.22.76_9001_1596752963_1453260018_0MetA0_S_MegaStore
URL: dbfrom=protein&db=nuccore&query_key=1&WebEnv=NCID_1_38538961_130.14.22.76_9001_1596752963_1453260018_0MetA0_S_MegaStore&cmd=neighbor_history&linkname=protein_nuccore
Result: 
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD elink 20101123//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20101123/elink.dtd">
<eLinkResult>
<LinkSet>
    <ERROR>Empty result - nothing to do</ERROR>
</LinkSet>
</eLinkResult>


ERROR in fetch input: Empty result - nothing to do

Here is a link to the input file: https://www.dropbox.com/s/nuguyjiupnabd3j/testcopy.rtf?dl=0

Thanks once again and Regards, Dinesh SL

modified 6 weeks ago by genomax89k • written 6 weeks ago by sankadinesh20

Make sure there is one ID per line. You need to use backticks (`), not plain single quotes ('), around the cat file command. There should be no extraneous formatting characters (I see that you are using an RTF file; use plain text instead). If you create the file on a Windows machine and then move it over to Unix, pass it through the dos2unix program.
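
The input checks described here can be sketched in plain shell. This is only an illustration under stated assumptions: the function names `looks_like_rtf` and `clean_ids` are hypothetical, the RTF check relies on RTF files starting with the signature `{\rtf`, and the cleanup assumes IDs contain no internal whitespace. It does not convert RTF to text; an RTF file should be re-saved as plain text first.

```shell
# looks_like_rtf FILE
# Hypothetical helper: succeed if FILE begins with the RTF signature "{\rtf".
looks_like_rtf() {
  [ "$(head -c 5 "$1")" = '{\rtf' ]
}

# clean_ids IN OUT
# Hypothetical helper: strip Windows carriage returns, drop blank lines,
# and write exactly one whitespace-free ID per line to OUT.
clean_ids() {
  tr -d '\r' < "$1" |
    awk '{ for (i = 1; i <= NF; i++) print $i }' > "$2"
}

# Usage (file names are assumptions):
#   if looks_like_rtf ids.txt; then echo "re-save as plain text first"; fi
#   clean_ids ids.txt acc_file
```

Removing the carriage returns with `tr` covers the same case as dos2unix when that program is not installed.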

modified 6 weeks ago • written 6 weeks ago by genomax89k
genomax89k wrote:

Since you didn't provide any examples, I used a random one.

$ esearch -db protein -query "NP_611925.2" | elink -target nuccore | efetch -format acc
NT_033778.4
NM_138081.4

If you have a number of them, you could use a loop to go through your list with the command above, or use the epost method.
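
A minimal sketch of the loop approach, assuming Entrez Direct is installed and the IDs sit one per line in a plain-text file. The function name `map_protein_to_nuccore` and the `NOT_FOUND` placeholder are hypothetical choices made for this example. Because one protein can link to more than one nucleotide record (as with NP_611925.2 above), the accessions are joined with commas so each protein stays on a single line.

```shell
# map_protein_to_nuccore FILE
# Hypothetical helper: for each protein accession in FILE (one per line), print
#   protein_acc<TAB>nucleotide_acc[,nucleotide_acc...]
# and print the placeholder NOT_FOUND when nothing comes back.
map_protein_to_nuccore() {
  # Read the list on file descriptor 3 and give esearch an empty stdin,
  # as a precaution so the EDirect commands cannot consume the ID list.
  while IFS= read -r prot_id <&3 || [ -n "$prot_id" ]; do
    [ -n "$prot_id" ] || continue   # skip blank lines
    nuc_acc=$(esearch -db protein -query "$prot_id" < /dev/null |
                elink -target nuccore |
                efetch -format acc |
                paste -s -d ',' -)
    printf '%s\t%s\n' "$prot_id" "${nuc_acc:-NOT_FOUND}"
  done 3< "$1"
}

# Usage (file names are assumptions):
#   map_protein_to_nuccore acc_file > results.tsv
```

This makes one round of requests per ID, so it will be slow for thousands of IDs; the epost method is faster but does not show which input produced which output.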

written 6 weeks ago by genomax89k