Question: Fasta Taxonomy Annotation
5 weeks ago
zach wrote:

Hi everyone!

I am looking to taxonomically annotate a fasta sequence file and receive a fasta output with annotation. The original pacbio_otu.fasta has the id lines:

> consensus=Uniq2;size=24;seqs=2

To annotate pacbio_otu.fasta, the taxonomy database rdp_16s_v16_sp.fa has the id lines:

> EF599163_S000871589;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae

If possible, I would like to have taxonomy annotation (from rdp_16s_v16_sp.fa) on my pacbio_otu.fasta file to build my own taxonomy database in fasta format with the id lines like:

> consensus=Uniq2;size=24;seqs=2;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
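For illustration, here is a minimal Python sketch of how such id lines could be taken apart and rebuilt. The function names (`split_label`, `parse_tax`, `add_tax`) are hypothetical and not part of usearch or RDP. It assumes the taxonomy always follows a `;tax=` marker and that taxon names contain no commas.

```python
def split_label(header):
    """Split an id line into (label, taxonomy string or None) at ';tax='."""
    header = header.lstrip(">").strip()
    label, sep, tax = header.partition(";tax=")
    return label, (tax.rstrip(";") if sep else None)


def parse_tax(tax):
    """Turn 'd:Bacteria,p:"Proteobacteria"' into {'d': 'Bacteria', 'p': 'Proteobacteria'}.

    Assumption: taxon names never contain a comma.
    """
    ranks = {}
    for field in tax.split(","):
        rank, _, name = field.partition(":")
        ranks[rank.strip()] = name.strip().strip('"')
    return ranks


def add_tax(label, tax):
    """Append a taxonomy string to a label that has no annotation yet."""
    return f"{label.rstrip(';')};tax={tax}"


rdp_header = ('> EF599163_S000871589;tax=d:Bacteria,p:"Proteobacteria",'
              'c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae')
rdp_label, rdp_tax = split_label(rdp_header)
new_header = add_tax("consensus=Uniq2;size=24;seqs=2", rdp_tax)
```

Which RDP entry belongs to which OTU still has to come from a classification or search step; the sketch only handles the string manipulation.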

Eventually, with this taxonomy database in fasta format, I would like to run usearch 'sintax' with other fasta data against it.

For my situation, are there any ways or scripts to produce my own taxonomy database in fasta format?

Many thanks, Zach

taxonomy annotation fasta
written 5 weeks ago by zach

Hi Zach,

A fasta record consists of one header line, which starts with the > sign, followed by the sequence (DNA, RNA, or protein).

Therefore the file with two header lines that you're requesting is not in fasta format, because you have: 1st line - header, 2nd line - taxonomy, and 3rd line - sequence. Even if you create that format, usearch will probably complain and throw errors saying that your data is not in fasta format.

You have two options here: (1) stick with the file annotated like


Or (2) keep the fasta file untouched and store the taxonomy in a two-column text file that matches headers to taxonomy.
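If the two-column file from option (2) exists, it can also be merged back into the headers later. Below is a rough Python sketch, not a tested pipeline. It assumes a tab-separated table of `header<TAB>taxonomy`, with headers written exactly as in the fasta file but without the leading >. The function names and the second record (`Uniq3`) are made up for the demonstration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line.strip() and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def read_table(handle):
    """Read 'header<TAB>taxonomy' lines into a dict (assumed layout)."""
    table = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        header, tax = line.split("\t", 1)
        table[header.strip()] = tax.strip()
    return table


def annotate(fasta_handle, table, out_handle):
    """Write records found in the table with ';tax=' appended; return skipped headers."""
    skipped = []
    for header, seq in read_fasta(fasta_handle):
        tax = table.get(header)
        if tax is None:
            skipped.append(header)  # no taxonomy known, leave it out
            continue
        out_handle.write(f">{header.rstrip(';')};tax={tax}\n{seq}\n")
    return skipped


# Small in-memory demonstration; Uniq3 is a hypothetical record without taxonomy.
fasta = io.StringIO(
    ">consensus=Uniq2;size=24;seqs=2\nGTTACCTTGTTACGACTT\nCACCCCAATC\n"
    ">consensus=Uniq3;size=10;seqs=1\nACGT\n"
)
table = read_table(io.StringIO(
    'consensus=Uniq2;size=24;seqs=2\td:Bacteria,p:"Proteobacteria"\n'
))
out = io.StringIO()
skipped = annotate(fasta, table, out)
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles. Records without a taxonomy entry are skipped rather than written unannotated, which seems the safer choice for a reference database.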

I hope this helps,


written 5 weeks ago by antonioggsousa

Do you know what this EF599163_S000871589 means or where it comes from?

My guess is that you should have a file from usearch matching EF599163_S000871589 with Uniq2, but I'm not sure. I haven't used usearch in a long time.


written 5 weeks ago by antonioggsousa

Hi Antonio,

Thanks for your response. I have previously done usearch-sintax with other fasta files on rdp_16s_v16_sp.fa as a database, without any problems.

What I want to do is annotate a PacBio fasta file of mine (pacbio_otu.fasta) to get a new taxonomy-annotated fasta file with lines like this:

> consensus=Uniq2;size=24;seqs=2;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG

I do not have an annotated fasta file like the above, and am looking to have that.

'EF599163_S000871589' should represent a particular OTU. The RDP taxonomy database (rdp_16s_v16_sp.fa) was obtained from


written 5 weeks ago by zach
4 days ago
h.mon wrote:

For my situation, are there any ways or scripts to produce my own taxonomy database in fasta format?

You can use any taxonomic classification pipeline to "annotate" your own fasta file (I would use DADA2+phyloseq for this, but many other combinations of tools would work). However, your annotated fasta will be no better than the original RDP database you used to classify your sequences, and it can even introduce wrong classifications. So my advice is to just use the RDP fasta, or any other curated database you deem appropriate.

written 4 days ago by h.mon
