Question: RefSeq ID to TaxID?
bird77 wrote, 21 months ago:

I have a number of RefSeq IDs (for example, NZ_LFWC01000004.1) and I would like to get the taxonomy IDs (TaxIDs) of the corresponding species.

Is there a way to automate that in Bash, Python or R?


If you have a very long list of RefSeq IDs, you might want to do a local search against the "accession2taxid" files instead; see: Biostars

written 20 months ago by erwan.scaon

Wonderful, that is exactly what I was looking for. Thank you so much.

written 20 months ago by bird77
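
For the local route mentioned above, a minimal Python sketch could look like the following. It assumes an NCBI accession2taxid file has already been downloaded and that it is tab-separated with a header line containing the columns accession.version and taxid. The function name load_taxids and the file name in the usage note are placeholders of mine, not something taken from the linked post. Which accession2taxid file covers a given RefSeq accession should be checked against NCBI's README.

```python
import csv
import gzip


def load_taxids(accession2taxid_path, wanted):
    """Scan an accession2taxid file once and return {accession.version: taxid}.

    Assumption: the file is tab-separated and starts with a header line that
    names the columns 'accession.version' and 'taxid'.
    """
    wanted = set(wanted)
    found = {}
    if not wanted:
        return found
    # Handle both the gzipped download and an already decompressed copy.
    opener = gzip.open if str(accession2taxid_path).endswith(".gz") else open
    with opener(accession2taxid_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            accession = row["accession.version"]
            if accession in wanted:
                found[accession] = int(row["taxid"])
                # Stop early once every requested accession has been seen.
                if len(found) == len(wanted):
                    break
    return found
```

Called as load_taxids("nucl_wgs.accession2taxid.gz", my_ids) (the file name is only an example), it returns a dictionary; accessions that are not in the file are simply absent from the result. The file is streamed line by line, so it does not have to fit into memory.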

Python and the Entrez module (Biopython)?

written 21 months ago by Bastien Hervé
Sej Modha (Glasgow, UK) wrote, 21 months ago:

Using the NCBI Unix E-utilities (EDirect):

esearch -db nucleotide -query "NZ_LFWC01000004.1" | esummary | xtract -pattern TaxId -element TaxId

modified 21 months ago • written 21 months ago by Sej Modha

Any idea about this error?

$  esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -element TaxId

ERROR: No -pattern in command-line arguments

xtract seems to cause the error.

written 21 months ago by bird77

Try this:

esearch -db nucleotide -query "NZ_LFWC01000004.1" | esummary | xtract -pattern TaxId -element TaxId

written 21 months ago by Sej Modha

Ah, wonderful, thank you very much. :-)

written 21 months ago by bird77
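
The same esummary lookup can be sketched in Python for a longer list of accessions, sent in batches. This is only a sketch under a few assumptions: Biopython is installed, the parsed nucleotide esummary records expose AccessionVersion and TaxId fields, and the helper names and the batch size of 100 are my own choices rather than anything prescribed by NCBI.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def taxids_from_summaries(summaries):
    """Map AccessionVersion -> TaxId from parsed esummary records.

    Assumption: each record behaves like a dict with the keys
    'AccessionVersion' and 'TaxId'.
    """
    return {str(doc["AccessionVersion"]): int(doc["TaxId"]) for doc in summaries}


def fetch_taxids(accessions, email, batch_size=100):
    """Query NCBI esummary in batches and return {accession.version: taxid}."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    result = {}
    for batch in chunked(list(accessions), batch_size):
        handle = Entrez.esummary(db="nucleotide", id=",".join(batch))
        try:
            summaries = Entrez.read(handle)
        finally:
            handle.close()
        result.update(taxids_from_summaries(summaries))
    return result
```

Usage would be something like fetch_taxids(["NZ_LFWC01000004.1"], "you@example.org"). For very long lists, the local accession2taxid lookup is probably the kinder option for NCBI's servers.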
Bastien Hervé (Limoges, CBRS, France) wrote, 21 months ago:

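In Python (with Biopython):

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "myemailaddress"  # NCBI asks for a contact address
key_list = ['NZ_LFWC01000004.1']  # add all your IDs

for key in key_list:
    # Fetch the GenBank record for this accession
    handle = Entrez.efetch(db='nucleotide', id=key, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    # The first feature is normally the "source" feature. Check every db_xref
    # entry, since it can hold cross-references other than the NCBI taxon.
    for xref in record.features[0].qualifiers.get('db_xref', []):
        if xref.startswith('taxon:'):
            print(key, xref.split(':')[1])
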
modified 21 months ago • written 21 months ago by Bastien Hervé

I guess one has to be careful with the db_xref qualifier: it can also contain identifiers linking to other databases, such as UniProtKB.

modified 21 months ago • written 21 months ago by Sej Modha

Yes, right. I added a condition so that only the 'taxon' entry, i.e. the NCBI taxonomy identifier, is kept.

written 21 months ago by Bastien Hervé
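
To make the db_xref point concrete, here is a small sketch of a filter that looks at every db_xref entry rather than only the first one. The function names are hypothetical, and it assumes the record is a Biopython SeqRecord whose "source" feature carries the taxon cross-reference; the example values in the comments are purely illustrative.

```python
def taxid_from_db_xrefs(db_xrefs):
    """Return the NCBI taxonomy ID from a list of db_xref strings, or None.

    Entries look like 'database:identifier', e.g. 'taxon:562'; anything that
    is not a 'taxon' cross-reference is skipped.
    """
    for xref in db_xrefs:
        database, _, identifier = xref.partition(":")
        if database == "taxon" and identifier.isdigit():
            return int(identifier)
    return None


def taxid_from_record(record):
    """Search the 'source' feature(s) of a SeqRecord-like object for a taxon ID."""
    for feature in record.features:
        if feature.type == "source":
            taxid = taxid_from_db_xrefs(feature.qualifiers.get("db_xref", []))
            if taxid is not None:
                return taxid
    return None
```

With this, a record whose db_xref list starts with a cross-reference to another database still yields the taxon entry, and a record without one gives None instead of raising an error.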