I am using the trnL chloroplast gene to identify plants from herbivore dung, and am currently trying to assign taxonomy to trnL sequences from my Illumina output. Here is the QIIME script and options I would like to run:
assign_taxonomy.py -i rep_set_numbered.fa -r sequence.fasta -t id_to_taxonomy.txt -e 0.01 -m blast
I have the input file from our data pipeline, and the reference file from NCBI GenBank (205,703 sequences). However, I do not have a tab-delimted taxonomy text file. Normally I would generate one from Excel, but because the FASTA file is so large (over 500 MB), it cannot be fully viewed in Excel, and therefore cannot be reliably edited.
My question is, is there a command line method for generating my own tab-delimited taxonomy file from my reference FASTA file, and if so, how would I do that? If not, what are my other options for handling this required option on the "assign_taxonomy.py" QIIME script?
Do the headers in your fasta file include taxonomy information? Do they have to be in some specific format in the taxonomy file? Please post examples..