TaxID mapping file
6 weeks ago
Hi guys,

Does anyone know how I can get a TaxID mapping file for the NR or UniProt database?

Background: I use Diamond for my de novo transcriptome annotation. My next goal is to use the resulting hits TSV file in blobtools for contamination detection. For that, hits.tsv needs my query transcript IDs together with the TaxID of the corresponding subject. Diamond doesn't give me that information, but I can use the blobtools taxify option to assign TaxIDs to my subject hits. According to the blobtools documentation, this requires a TaxID mapping file for the database I used for annotation, i.e. a file that lists each subject sequence ID alongside its TaxID.

I am not sure how to get that file so please help. :)

The nodesDB file should already have been installed if you used the "Install" script for blobtools.

You can find the NCBI taxonomy database files on the NCBI FTP site. Take a look at the README there for the contents.

Thank you, I'll look at these files/documents.

If I understood correctly, I might need the prot.accession2taxid.gz file? According to the documentation, column 2 is Accession.version and column 3 is TaxID. I should download that file from NCBI, unpack it, and then do:

# -s and -t are zero-based column indices in the mapping file (see blobtools taxify --help):
# accession.version is the 2nd column (index 1), TaxID is the 3rd column (index 2)
blobtools taxify \
 -f diamond.out \
 -m prot.accession2taxid.taxids \
 -s 1 \
 -t 2

Does that make sense?

Did anyone try this?
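
Since prot.accession2taxid covers a very large number of protein accessions, it may help to cut it down to the subject IDs that actually occur in the Diamond output before running taxify. Here is a minimal sketch of that idea, assuming diamond.out is in the default tabular format (sseqid in column 2) and that the subject IDs are plain accession.version strings. The output file name and the toy data are made up for illustration:

```shell
# Toy stand-ins for the real files (hypothetical contents, for illustration only)
printf 'tx1\tWP_000001.1\t90.0\ntx2\tXP_000002.2\t85.5\n' > diamond.out
printf 'accession\taccession.version\ttaxid\tgi\nWP_000001\tWP_000001.1\t562\t0\nNP_000003\tNP_000003.1\t9606\t0\nXP_000002\tXP_000002.2\t7227\t0\n' > prot.accession2taxid

# First file (NR==FNR): remember every subject ID from column 2 of diamond.out.
# Second file: print only mapping rows whose accession.version (column 2) was seen.
awk -F'\t' 'NR==FNR { wanted[$2]; next } ($2 in wanted)' \
    diamond.out prot.accession2taxid > prot.accession2taxid.subset

cat prot.accession2taxid.subset
```

The subset keeps the original column layout, so the same column settings apply when it is passed to taxify with -m.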

I also saw this post about getting taxonomy info in the Diamond output. Still, it seems the taxonomy has to be incorporated at the makedb step, and according to the Diamond documentation I might get more than one TaxID per hit, which I am not sure will work with blobtools.
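
For the Diamond route, the staxids output field can hold several TaxIDs separated by semicolons. If a single TaxID per hit is wanted, one option is to post-process the output. The sketch below assumes the database was built with taxonomy information (Diamond's makedb has --taxonmap and --taxonnodes options for this) and that the search used an output format of `6 qseqid staxids bitscore`. Keeping only the first listed TaxID is an assumption made here for simplicity, not something Diamond or blobtools prescribes, and the file names and toy input are made up:

```shell
# Toy Diamond output: qseqid, staxids, bitscore (hypothetical values)
printf 'tx1\t562;83333\t250.0\ntx2\t9606\t120.5\ntx3\t\t80.0\n' > diamond.taxids.out

# Keep the first TaxID of each semicolon-separated list; drop rows without a TaxID.
awk -F'\t' -v OFS='\t' '$2 != "" { split($2, ids, ";"); print $1, ids[1], $3 }' \
    diamond.taxids.out > hits.tsv

cat hits.tsv
```

The result has one query ID, one TaxID and one score per line. Whether the first TaxID is the right one to keep depends on the data, so it is worth checking a few multi-TaxID hits by hand.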


