Error fetching RefSeq sequences using taxonomic IDs
1
0
6.7 years ago
Paul ▴ 80

I have been trying to extract the reference sequences for a list of taxonomic IDs such as:

Taxon ID
1438843
1421962
1324283
1422107

For example, the reference sequence for 1438843 is NC_000962.3, and I need to download that particular reference sequence for taxonomic ID 1438843.

When I try to fetch the RefSeq sequence for the first taxonomic ID using the following E-utilities command:

esearch -db genome -query "txid1438843 [Organism]" | elink -target nuccore | efilter -query "refseq"| efetch -format fasta

it fails with errors like:

ERROR in filt input: callMLink: Query failed on MegaLink server

ERROR in fetch input: callMLink: Query failed on MegaLink server

Could anyone suggest a way to fetch the RefSeq sequences for the above-mentioned taxonomic IDs?

eutility NCBI Reference sequence • 3.7k views
ADD COMMENT
0

Hello, good evening.

How can I extract the protein IDs for the given gene IDs?

Gene IDs: AB845604 AB845605 AB845606 AB845607 AB845608 AB845609 AB845610

Thanks in advance.

ADD REPLY
0

It is not good practice to ask unrelated questions in existing threads.

You can do the following with EntrezDirect:

$ esearch -db gene -query "AB845604" | elink -target protein | efetch -format acc
NP_001289835.2
BAO18621.1

Use a for loop to go through your list.
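
The loop suggested above could look something like the following minimal sketch. It assumes EDirect (esearch, elink, efetch) is installed and on the PATH; fetch_protein_accs is a hypothetical helper name, not an EDirect command, and the tab-separated output layout is just one possible choice.

```shell
# Hypothetical helper (not part of EDirect): for each ID given as an argument,
# print "<query ID><TAB><protein accession>", one line per linked protein.
fetch_protein_accs() {
    for id in "$@"; do
        esearch -db gene -query "$id" |
            elink -target protein |
            efetch -format acc |
            awk -v id="$id" 'NF { print id "\t" $0 }'  # skip empty lines, prefix the query ID
    done
}

# Example call with the IDs from the question:
# fetch_protein_accs AB845604 AB845605 AB845606 AB845607 AB845608 AB845609 AB845610
```

Keeping the query ID in the first column makes it easy to see which input produced which protein accession, and which inputs returned nothing.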

ADD REPLY
0

Oops, my mistake. I am not very familiar with this, but I will keep it in mind next time. Actually, I ran into the same issue as in this thread:

" ERROR in filt input: callMLink: Query failed on MegaLink server "

while I was trying out some commands to convert to the protein ID.

ADD REPLY
3
6.7 years ago
Sej Modha 5.3k

I was able to download the sequences by removing the space between txid1438843 and [Organism]:

esearch -db genome -query "txid1438843[Organism]" | elink -target nuccore | efilter -query "refseq" | efetch -format fasta
ADD COMMENT
0

But there is only one reference sequence (NC_000962.3) for the query "txid1438843[Organism]":

https://www.ncbi.nlm.nih.gov/genome/?term=txid1438843+%5BOrganism%5D

ADD REPLY
0

Thanks, it's not showing the error anymore. But it returns multiple sequences, whereas I need only one RefSeq sequence (NC_000962.3).

ADD REPLY
2

I am going to hazard a guess that, since you are using a taxID (for Mycobacterium tuberculosis), every M. tuberculosis genome in the RefSeq database (currently 5248) is going to be pulled up. You probably need an additional filter on your query.
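
One way to check this guess before downloading anything is to ask for the hit count instead of the sequences. The sketch below assumes EDirect is installed and uses its xtract tool to read the Count field from the pipeline's summary; count_refseq_hits is a hypothetical helper name.

```shell
# Hypothetical helper: print how many RefSeq nuccore records are linked to a
# taxon ID, without fetching any sequence data.
count_refseq_hits() {
    esearch -db genome -query "txid${1}[Organism]" |
        elink -target nuccore |
        efilter -query "refseq" |
        xtract -pattern ENTREZ_DIRECT -element Count
}

# Example call for the taxon ID discussed in this thread:
# count_refseq_hits 1438843
```

A count well above 1 would suggest the query needs a further filter before efetch is run.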

ADD REPLY
2

That's right, you will need another filter that fetches the representative assembly. The following command returns the FASTA sequence for NC_000962.3:

esearch -db genome -query "txid1438843[Organism]" | elink -target assembly | efilter -query "representative[PROP]" | elink -target nuccore -name assembly_nuccore_refseq | efetch -format fasta
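
To apply the same command to the whole list of taxon IDs from the question, it could be wrapped in a loop. This is only a sketch: it assumes EDirect is installed and that the pipeline above behaves for the other IDs as it does for 1438843. The helper name fetch_representative_refseq and the refseq_<taxid>.fasta output naming are hypothetical choices.

```shell
# Hypothetical helper: for each taxon ID given as an argument, fetch the RefSeq
# sequence(s) of the representative assembly into refseq_<taxid>.fasta.
fetch_representative_refseq() {
    for taxid in "$@"; do
        esearch -db genome -query "txid${taxid}[Organism]" |
            elink -target assembly |
            efilter -query "representative[PROP]" |
            elink -target nuccore -name assembly_nuccore_refseq |
            efetch -format fasta > "refseq_${taxid}.fasta"
    done
}

# Example call with the taxon IDs from the question:
# fetch_representative_refseq 1438843 1421962 1324283 1422107
```

Writing one file per taxon ID keeps the results separate, so an empty file points directly at the ID whose query returned nothing.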
ADD REPLY
1

Nice! Looks like you know your Entrez utilities by heart.

Slightly unrelated question: is there a chart showing what can/should logically be connected with what across the various Entrez utilities? I find the in-line help severely lacking, as it provides little beyond bare syntax.

ADD REPLY
0

I don't think a document like that exists. @Joseph Hughes had asked a similar question: NCBI database schema. I tend to work out such complicated queries in the GUI on the NCBI web pages first and then try to recreate those links using eutils, with a bit of help from https://www.ncbi.nlm.nih.gov/books/NBK179288/.

ADD REPLY
0

Thanks :) it worked like a charm

ADD REPLY
