Question

RefSeq ID to TaxID?

0

6.1 years ago

bird77 ▴ 80

I have a number of RefSeq IDs (for example, NZ_LFWC01000004.1) and I would like to get the taxonomy IDs (TaxIDs) of the corresponding species.

Is there a way to automate that in Bash, Python or R?

genome • 5.5k views

updated 6.1 years ago by Bastien Hervé 5.3k • written 6.1 years ago by bird77 ▴ 80

1

If you have a very long list of RefSeq IDs, you might want to search NCBI's "accession2taxid" files locally instead of querying the server for each ID; see: Biostars

6.1 years ago by erwan.scaon ▴ 940
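
A minimal Python sketch of the local lookup suggested above, assuming the accession2taxid file is a tab-separated table with the columns accession, accession.version, taxid and gi, preceded by a header line. The function names and the file path are hypothetical, chosen only for illustration; which of NCBI's accession2taxid files covers a given accession should be checked against the README that ships with them.

```python
import gzip
from typing import Dict, Iterable, Set


def lookup_taxids(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Map each wanted accession.version to its TaxID.

    `lines` yields tab-separated rows, assumed to be laid out as:
    accession, accession.version, taxid, gi.
    """
    found: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        if fields[1] in wanted:
            found[fields[1]] = fields[2]
            if len(found) == len(wanted):
                break  # every requested ID has been resolved
    return found


def lookup_taxids_in_file(path: str, wanted: Set[str]) -> Dict[str, str]:
    """Stream a gzipped accession2taxid file without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        return lookup_taxids(handle, wanted)


# Hypothetical usage; the path is a placeholder for a downloaded file:
# taxids = lookup_taxids_in_file("accession2taxid.gz", {"NZ_LFWC01000004.1"})
```

Streaming the file line by line keeps memory use flat, which matters because these tables are very large; only the set of wanted IDs and the matches are held in memory.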

0

Wonderful, that is exactly what I was looking for. Thank you so much.

6.1 years ago by bird77 ▴ 80

0

Python and Biopython's Entrez module?

6.1 years ago by Bastien Hervé 5.3k

2

6.1 years ago

Bastien Hervé 5.3k

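In Python:

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "myemailaddress"  # NCBI asks for a contact address; set it once
key_list = ['NZ_LFWC01000004.1']  # Add all your IDs

for key in key_list:
    # Fetch the GenBank flat file for this accession
    handle = Entrez.efetch(db='nucleotide', id=key, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    # The first feature is normally the 'source' feature; scan all of its
    # db_xref entries and print only the NCBI taxonomy one
    for xref in record.features[0].qualifiers.get('db_xref', []):
        if xref.startswith('taxon:'):
            print(xref.split(':')[1])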

1

I guess one has to be careful with the db_xref tag: it can often contain identifiers linking to other databases, such as UniProtKB.

6.1 years ago by Sej Modha 5.3k

1

Yes, right. I added a condition so that only the 'taxon' entry (NCBI's taxonomic identifier) is kept.

6.1 years ago by Bastien Hervé 5.3k
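
The filtering discussed in these comments can be isolated in a small helper. This is only a sketch: `taxid_from_xrefs` is a hypothetical name, and it assumes each db_xref value has the form "database:identifier", as in the answer's code. It checks every entry rather than just the first, so a cross-reference to another database listed ahead of the taxon does not hide it.

```python
from typing import Iterable, Optional


def taxid_from_xrefs(xrefs: Iterable[str]) -> Optional[str]:
    """Return the identifier of the first 'taxon:<id>' entry, or None."""
    for xref in xrefs:
        # Split on the first colon only: "taxon:12345" -> ("taxon", "12345")
        database, _, identifier = xref.partition(":")
        if database == "taxon" and identifier:
            return identifier
    return None  # no NCBI taxonomy cross-reference present
```

With a Biopython record, it could be called as `taxid_from_xrefs(record.features[0].qualifiers.get('db_xref', []))`, which also avoids a KeyError when the qualifier is absent.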

score 4 · Accepted Answer · 2018-03-16

4

6.1 years ago

Sej Modha 5.3k

Using NCBI's Entrez Direct (EDirect) Unix command-line utilities:

esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -pattern TaxId -element TaxId

0

Any idea about this error?

$  esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -element TaxId

ERROR: No -pattern in command-line arguments

xtract seems to cause the error.

6.1 years ago by bird77 ▴ 80

0

Try this:

esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -pattern TaxId -element TaxId
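
For anyone who would rather post-process the esummary output in Python than with xtract, here is a rough sketch using only the standard library. It assumes the XML printed by esummary contains one TaxId element per document summary; `taxids_from_esummary_xml` is a hypothetical helper name, not part of EDirect or Biopython.

```python
import xml.etree.ElementTree as ET
from typing import List


def taxids_from_esummary_xml(xml_text: str) -> List[str]:
    """Collect the text of every <TaxId> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the exact nesting depth does not matter
    return [el.text.strip() for el in root.iter("TaxId") if el.text]
```

The XML could be captured by running the `esearch ... | esummary` part of the pipeline and redirecting its output to a file, then passing the file's contents to this function.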