Question: Fasta Taxonomy Annotation
zach wrote, 5 weeks ago:

Hi everyone!

I am looking to taxonomically annotate a fasta sequence file and receive a fasta output with annotation. The original pacbio_otu.fasta has the id lines:

>consensus=Uniq2;size=24;seqs=2
GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG

To annotate pacbio_otu.fasta, the taxonomy database rdp_16s_v16_sp.fa has the id lines:

>EF599163_S000871589;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGAAACGACACTAACAATCCTTC

If possible, I would like to have taxonomy annotation (from rdp_16s_v16_sp.fa) on my pacbio_otu.fasta file to build my own taxonomy database in fasta format with the id lines like:

>consensus=Uniq2;size=24;seqs=2;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG

Eventually, with this taxonomy database in fasta format, I would like to run usearch 'sintax' with other fasta data against it.

For my situation, are there any ways or scripts to produce my own taxonomy database in fasta format?

Many thanks, Zach


Hi Zach,

A fasta record consists of one header line, which starts with the > sign, followed by the sequence (DNA, RNA, or protein), such as:

>OTU_1
ATCGATGCTAGCTACGATCGATCAGCTAGCTGATCGATCGATGCATCGATC

Therefore, the two-header file that you're requesting is not in fasta format, because it would have: 1st line, header; 2nd line, taxonomy; 3rd line, sequence. Even if you create that unusual format, usearch will probably complain and throw errors saying that your data is not in fasta format.

You have two options here: (1) stick with the file annotated like

>EF599163_S000871589;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGAAACGACACTAACAATCCTTC

Or (2) keep the fasta file untouched and store the taxonomy in a separate two-column text file that matches each header to its taxonomy.
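If the annotated headers are needed later, a table like the one in option (2) can be merged back into the fasta headers. The snippet below is only a minimal sketch under stated assumptions: the table is tab-separated with the header label in the first column and the taxonomy string in the second, and the function names `read_taxonomy_table` and `annotate_fasta` are made up for this illustration (they are not part of usearch or any library).

```python
def read_taxonomy_table(lines):
    """Parse 'label<TAB>taxonomy' lines into a dict (assumed layout)."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        label, tax = line.split("\t", 1)
        table[label.strip()] = tax.strip()
    return table


def annotate_fasta(fasta_lines, taxonomy):
    """Yield fasta lines, appending ';tax=...' to headers found in `taxonomy`.

    Headers without an entry in the table are passed through unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            label = line[1:].strip()
            tax = taxonomy.get(label)
            if tax:
                line = f">{label.rstrip(';')};tax={tax}"
        yield line


# Small in-memory demo using the header from the question.
table_lines = ['consensus=Uniq2;size=24;seqs=2\td:Bacteria,p:"Proteobacteria"\n']
fasta_lines = [">consensus=Uniq2;size=24;seqs=2\n", "GTTACCTTGTTACGACTTCACC\n"]

taxonomy = read_taxonomy_table(table_lines)
for out_line in annotate_fasta(fasta_lines, taxonomy):
    print(out_line)
```

With real files, the two lists would be replaced by open file handles, for example `open("pacbio_otu.fasta")` and a handle for the taxonomy table (whose file name is up to you).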

I hope this helps,

António

written 5 weeks ago by antonioggsousa

Do you know what this EF599163_S000871589 means or where it comes from?

My guess is that you should have a file from usearch matching EF599163_S000871589 with Uniq2, but I'm not sure. I haven't used usearch in a long time.

António
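If such a match file exists, the taxonomy of each matched reference could be looked up per query. The sketch below assumes a two-column, tab-separated hits table (query label, then reference label); that layout is an assumption for illustration and is not confirmed anywhere in this thread. The function names `reference_taxonomy` and `query_taxonomy` are likewise hypothetical.

```python
def reference_taxonomy(fasta_lines):
    """Map each reference label (text before ';tax=') to its taxonomy string."""
    ref = {}
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            label, sep, tax = header.partition(";tax=")
            if sep:  # only keep headers that carry a tax= annotation
                ref[label] = tax.rstrip(";")
    return ref


def query_taxonomy(hit_lines, ref_tax):
    """Map each query label to the taxonomy of its first listed reference hit."""
    result = {}
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        query, target = fields[0], fields[1]
        # The hit may report the full reference header; keep only the label part.
        target_label = target.partition(";tax=")[0]
        if target_label in ref_tax and query not in result:
            result[query] = ref_tax[target_label]
    return result


# In-memory demo with the labels from the question (the hit itself is made up).
ref_lines = [
    '>EF599163_S000871589;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria\n',
    "GTTTGATCCTGGCTCAGATTGAACGC\n",
]
hit_lines = ["consensus=Uniq2;size=24;seqs=2\tEF599163_S000871589\n"]

ref_tax = reference_taxonomy(ref_lines)
for query, tax in query_taxonomy(hit_lines, ref_tax).items():
    print(f"{query}\t{tax}")
```

Taking the first listed hit is a deliberate simplification; a taxonomy copied from a single best hit carries no confidence estimate.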

written 5 weeks ago by antonioggsousa

Hi Antonio,

Thanks for your response. I have previously done usearch-sintax with other fasta files on rdp_16s_v16_sp.fa as a database, without any problems.

What I want to do is annotate a PacBio fasta file of mine (pacbio_otu.fasta) to get a new taxonomy-annotated fasta file with lines like this:

>consensus=Uniq2;size=24;seqs=2;tax=d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:"Vibrionales",f:Vibrionaceae
GTTACCTTGTTACGACTTCACCCCAATCATCTATCCCACCTTAGGCGGCTGGCTCCAAAAGGTTACCTCACCGACTTCGG

I do not have an annotated fasta file like the above, and am looking to have that.

'EF599163_S000871589' should represent a particular OTU. The RDP taxonomy database (rdp_16s_v16_sp.fa) was obtained from https://drive5.com/usearch/manual/sintax_downloads.html

Cheers!

written 5 weeks ago by zach
h.mon (Brazil) wrote, 4 days ago:

For my situation, are there any ways or scripts to produce my own taxonomy database in fasta format?

Although you can use any taxonomic classification pipeline to "annotate" your own fasta file (I would use DADA2+phyloseq for this, but many combinations of tools would do the job), your annotated fasta will be no better than the original RDP database you used to classify your sequences, and it may even introduce wrong classifications. So my advice is to just use the RDP fasta, or any other curated database you deem appropriate.
