Question: RefSeq ID to TaxID?
bird77 wrote, 2.2 years ago:

I have a number of RefSeq IDs (for example NZ_LFWC01000004.1) and I would like to get the taxonomy IDs (TaxIDs) of the corresponding species.

Is there a way to automate that in Bash, Python or R?

genome • 2.0k views
modified 2.2 years ago by Bastien Hervé • written 2.2 years ago by bird77

If you have a very long list of RefSeq IDs, you might prefer a local lookup against the "accession2taxid" files; see: Biostars
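
A minimal sketch of such a local lookup, assuming a tab-separated accession2taxid file with a header line and the columns accession, accession.version, taxid, gi (check this against the file you actually download). The function name `lookup_taxids` and the example file name are hypothetical.

```python
import gzip


def lookup_taxids(path, wanted):
    """Stream an accession2taxid file and return {accession.version: taxid}.

    Assumes tab-separated columns: accession, accession.version, taxid, gi,
    with one header line. Plain-text and .gz files are both accepted.
    """
    wanted = set(wanted)
    found = {}
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        next(fh, None)  # skip the header line
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # ignore malformed lines
            if fields[1] in wanted:
                found[fields[1]] = fields[2]
                if len(found) == len(wanted):
                    break  # every requested ID resolved, stop reading
    return found


if __name__ == "__main__":
    # Hypothetical file name; point this at the file you downloaded.
    ids = ["NZ_LFWC01000004.1"]
    for acc, taxid in lookup_taxids("nucl_wgs.accession2taxid.gz", ids).items():
        print(acc, taxid, sep="\t")
```

Streaming the file line by line keeps memory use small, at the cost of one full pass over the file in the worst case.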

written 2.2 years ago by erwan.scaon

Wonderful, that is exactly what I was looking for. Thank you so much.

written 2.2 years ago by bird77

Python and the Entrez module?

written 2.2 years ago by Bastien Hervé
Sej Modha (Glasgow, UK) wrote, 2.2 years ago:

Using the NCBI Entrez Direct (EDirect) command-line utilities:

esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -pattern TaxId -element TaxId
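
In this pipeline, xtract needs -pattern to know which XML element marks one record, and -element to know which field to print for it. A rough Python equivalent of that extraction step is sketched below. It assumes the esummary XML contains DocumentSummary elements with AccessionVersion and TaxId children; the sample XML and its values are invented for illustration, and `taxids_from_docsums` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def taxids_from_docsums(xml_text):
    """Return (AccessionVersion, TaxId) pairs, one per DocumentSummary.

    Assumes each DocumentSummary element has AccessionVersion and TaxId
    children; a missing child yields an empty string.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for docsum in root.iter("DocumentSummary"):
        acc = docsum.findtext("AccessionVersion", default="")
        taxid = docsum.findtext("TaxId", default="")
        pairs.append((acc, taxid))
    return pairs


# Invented sample, not real NCBI output: accessions and TaxIds are placeholders.
SAMPLE_XML = """<DocumentSummarySet>
  <DocumentSummary>
    <AccessionVersion>EXAMPLE_0001.1</AccessionVersion>
    <TaxId>111</TaxId>
  </DocumentSummary>
  <DocumentSummary>
    <AccessionVersion>EXAMPLE_0002.1</AccessionVersion>
    <TaxId>222</TaxId>
  </DocumentSummary>
</DocumentSummarySet>"""

if __name__ == "__main__":
    for acc, taxid in taxids_from_docsums(SAMPLE_XML):
        print(acc, taxid, sep="\t")
```

Printing the accession next to the TaxID makes it easier to match results back to the input list when many IDs are queried.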
modified 2.2 years ago • written 2.2 years ago by Sej Modha

Any idea about this error?

$  esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -element TaxId

ERROR: No -pattern in command-line arguments

xtract seems to cause the error.

written 2.2 years ago by bird77

Try this:

esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -pattern TaxId -element TaxId
written 2.2 years ago by Sej Modha

ah, wonderful, thank you very much. :-)

written 2.2 years ago by bird77
Bastien Hervé (Limoges, CBRS, France) wrote, 2.2 years ago:

In Python (with Biopython):

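from Bio import Entrez
from Bio import SeqIO

Entrez.email = "myemailaddress"  # contact address sent to NCBI with each request
key_list = ['NZ_LFWC01000004.1']  # Add all your IDs

for key in key_list:
    # Fetch the GenBank record for this accession
    handle = Entrez.efetch(db='nucleotide', id=key, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    # features[0] is normally the 'source' feature; its db_xref list can hold
    # several cross-references, so keep only the 'taxon' entry
    for xref in record.features[0].qualifiers.get('db_xref', []):
        db, _, value = xref.partition(':')
        if db == 'taxon':
            print(value)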
modified 2.2 years ago • written 2.2 years ago by Bastien Hervé

I guess one has to be careful with the db_xref tag: it can often contain identifiers linking to other databases, such as UniProtKB.

modified 2.2 years ago • written 2.2 years ago by Sej Modha

Yes, right. I added a condition to keep only the 'taxon' entry, i.e. NCBI's taxonomic identifier.

written 2.2 years ago by Bastien Hervé