Sort bacteria names into taxonomic tree
8.8 years ago

I am trying to find an online tool that would arrange bacterial species (e.g. Escherichia coli, Bacillus subtilis) into a large tree that separates them by phylum, order, or family. My data contains over 2500 different bacterial species. Is there a program that arranges these into a taxonomic tree using only the names in the fasta headers?

Andrew

8.8 years ago

This is really odd; it's such a specific request, and I wrote a program that specifically does that. I think.

What the program does is read a fasta file in which all of the sequences have a gi number in their header, like >gi12456|ecoli|blah blah. Then, using the gi-to-taxid translation table (gi_taxid_nucl.dmp from ftp://ftp.ncbi.nih.gov/pub/taxonomy/ ), it sorts all the sequences into taxonomic order. For example, all the archaea would be together and all the bacteria would be together; within the bacteria, all the alphaproteobacteria would be together, and so forth down to species level. It does not output them to different files, though, just one file. Is that what you are looking for?
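To make the idea concrete, here is a minimal Python sketch of sorting by taxonomy. It is not the BBMap implementation, just one way the approach described above could work: look up each sequence's taxid from its gi number, build the root-to-leaf path through the tree, and sort on that path so members of the same clade end up next to each other. The two small tables, the function names, and the header pattern are all assumptions made for illustration; the tree is heavily simplified and stands in for the real NCBI files.

```python
import re

# Hypothetical toy tables standing in for the real NCBI files.
# gi -> taxid (the role played by gi_taxid_nucl.dmp)
GI_TO_TAXID = {101: 562, 102: 1423, 103: 2287}
# taxid -> parent taxid (the role played by nodes.dmp); the root is its own parent.
# Simplified on purpose: intermediate ranks are left out.
PARENT = {1: 1, 2: 1, 2157: 1, 1224: 2, 1239: 2,
          562: 1224, 1423: 1239, 2287: 2157}

# Accepts headers starting with "gi12456|..." or "gi|12456|..." (assumed formats).
GI_PATTERN = re.compile(r"^gi\|?(\d+)")


def gi_of(header):
    """Extract the gi number from a fasta header (without the leading '>')."""
    match = GI_PATTERN.match(header)
    if match is None:
        raise ValueError("no gi number in header: " + header)
    return int(match.group(1))


def lineage(taxid, parent):
    """Return the path of taxids from the root down to the given taxid."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return tuple(reversed(path))


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_by_taxonomy(records, gi_to_taxid, parent):
    """Sort records so that sequences from the same clade are adjacent."""
    def key(record):
        return lineage(gi_to_taxid[gi_of(record[0])], parent)
    return sorted(records, key=key)
```

Because the sort key is the full path from the root, two sequences that share a long prefix of that path (same phylum, same family, and so on) are placed together, and everything is written to a single ordered list, which matches the single-file behaviour described above.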

The usage is like this:

gitable.sh gi_taxid_nucl.dmp.gz gitable.int1d.gz
java -ea -Xmx2g -cp /path/to/bbmap/current/ tax.TaxTree names.dmp nodes.dmp tree.taxtree
sortbytaxa.sh in=file.fasta out=sorted.fasta gi=gitable.int1d.gz tree=tree.taxtree

I forgot to write a shell script for TaxTree but I'll do that soon. names.dmp and nodes.dmp both come from the same NCBI website inside the zip file taxdmp.zip. That website seems to be down right now though.
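For anyone who wants to inspect names.dmp and nodes.dmp directly, the sketch below shows one way to read them in Python. Both files use `<tab>|<tab>` as the field separator and end each line with `<tab>|`; nodes.dmp starts with the taxid, its parent taxid, and the rank, and names.dmp starts with the taxid, the name, a unique-name field, and the name class. The function names are hypothetical and this is not what TaxTree does internally. Mapping species names to taxids this way is only an assumption about how the name-only case from the question could be approached.

```python
def split_dmp(line):
    """Split one line of an NCBI taxonomy .dmp file into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the trailing row terminator
    return line.split("\t|\t")


def parse_nodes(lines):
    """Return (parent, rank) dicts keyed by taxid from nodes.dmp lines."""
    parent, rank = {}, {}
    for line in lines:
        fields = split_dmp(line)
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    return parent, rank


def parse_names(lines):
    """Return a scientific-name -> taxid dict from names.dmp lines."""
    name_to_taxid = {}
    for line in lines:
        fields = split_dmp(line)
        # Keep only the canonical name; synonyms and common names are skipped.
        if fields[3] == "scientific name":
            name_to_taxid[fields[1]] = int(fields[0])
    return name_to_taxid
```

With a name-to-taxid dictionary like this, a header containing "Escherichia coli" could in principle be resolved to a taxid without a gi number, although real headers often need extra cleanup (strain names, abbreviations) before they match.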

These are all bundled with BBMap.
