Question: Error fetching the RefSeq sequence using a taxonomy ID
Asked 2.9 years ago by Paul (India):

I have been trying to extract the reference sequences for a list of taxonomy IDs such as:

Taxon ID
1438843
1421962
1324283
1422107

For example, the reference sequence for 1438843 is NC_000962.3, and that is the sequence I need to download for taxonomy ID 1438843.

When I try to fetch the RefSeq sequence for the first taxonomy ID using the following E-utilities command:

esearch -db genome -query "txid1438843 [Organism]" | elink -target nuccore | efilter -query "refseq"| efetch -format fasta

It shows an error like:

ERROR in filt input: callMLink: Query failed on MegaLink server

ERROR in fetch input: callMLink: Query failed on MegaLink server

Could anyone suggest a way to fetch the RefSeq sequences for the taxonomy IDs listed above?

Answer written 2.9 years ago by Sej Modha (Glasgow, UK):

I was able to download the sequences by removing the space between txid1438843 and [Organism]:

esearch -db genome -query "txid1438843[Organism]" | elink -target nuccore | efilter -query "refseq" | efetch -format fasta

But there is only one reference sequence (NC_000962.3) for the taxonomy ID query "txid1438843[Organism]":

https://www.ncbi.nlm.nih.gov/genome/?term=txid1438843+%5BOrganism%5D

(reply written 2.9 years ago by Paul)

Thanks, it no longer shows an error. However, it returns multiple sequences, whereas I need only one RefSeq sequence (NC_000962.3).

(reply written 2.9 years ago by Paul)

I am going to hazard a guess: since you are using a taxID (for Mycobacterium tuberculosis), every M. tuberculosis genome in the RefSeq database (currently 5248) is going to be pulled up. You probably need an additional filter on your query.

(reply written 2.9 years ago by genomax)

That's right, you will need another filter that fetches the representative assembly. The following command returns the FASTA sequence for NC_000962.3:

esearch -db genome -query "txid1438843[Organism]" | elink -target assembly | efilter -query "representative[PROP]" | elink -target nuccore -name assembly_nuccore_refseq | efetch -format fasta
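
To apply the same pipeline to the whole list of taxonomy IDs from the question, something like the following sketch might work. It assumes NCBI EDirect is on the PATH and that the IDs sit one per line in a file passed as the first argument; the file name `taxids.txt`, the output names `refseq_<taxid>.fasta`, and the helper names `build_query` and `fetch_representative` are all made up for illustration. Only 1438843 was checked in this thread, so the other IDs may or may not have a representative assembly.

```shell
#!/bin/sh
# Build the Entrez query term for one taxonomy ID.
# No space between the ID and the field tag, as in the corrected command above.
build_query() {
    printf 'txid%s[Organism]' "$1"
}

# Fetch the RefSeq FASTA of the representative assembly for one taxonomy ID.
# Needs EDirect and network access.
fetch_representative() {
    esearch -db genome -query "$(build_query "$1")" |
        elink -target assembly |
        efilter -query "representative[PROP]" |
        elink -target nuccore -name assembly_nuccore_refseq |
        efetch -format fasta
}

# Only run the loop when an ID file is given, e.g. ./fetch_refseq.sh taxids.txt
if [ -n "${1:-}" ]; then
    while IFS= read -r taxid; do
        # Skip blank lines in the input file.
        [ -n "$taxid" ] || continue
        # </dev/null keeps the pipeline from swallowing the rest of the ID list.
        fetch_representative "$taxid" </dev/null >"refseq_${taxid}.fasta"
    done <"$1"
fi
```

Each ID gets its own output file, so an empty file is a quick hint that the filter found nothing for that taxon.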
(reply written 2.9 years ago by Sej Modha)

Nice! Looks like you know your entrez utilities by heart.

Slightly unrelated question. Is there a chart representation of what can/should be logically connected with what for various entrez utilities? I find the in-line help severely lacking except for providing bare syntax.

(reply written 2.9 years ago by genomax)

I don't think a document like that exists. @Joseph Hughes had asked a similar question: NCBI database schema. I tend to tackle such complicated queries on the NCBI web pages (in the GUI) first, and then try to recreate those links using eutils with a bit of help from https://www.ncbi.nlm.nih.gov/books/NBK179288/.
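
Short of a chart, one partial workaround may be to ask EDirect itself which links a database offers. The sketch below wraps `einfo -db <database> -links` in a small helper; the helper name `list_links` is hypothetical, and the call needs EDirect plus network access. Names such as `assembly_nuccore_refseq` should appear in the listing for the assembly database, which is where the `-name` value for `elink` comes from.

```shell
#!/bin/sh
# List the link names available from one Entrez database.
# Usage: list_links assembly
list_links() {
    if [ -z "${1:-}" ]; then
        echo "usage: list_links <database>" >&2
        return 1
    fi
    einfo -db "$1" -links
}
```

Running `list_links genome` and then `list_links assembly` gives a rough picture of which hops are possible between the databases used in this thread.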

(reply written 2.9 years ago by Sej Modha)

Thanks :) It worked like a charm.

(reply written 2.9 years ago by Paul)