Greengenes: what is the "99" in the OTU description?
4.5 years ago
sacha ★ 2.1k

Hi,

In the Greengenes database, I don't understand the meaning of the "99" in the OTU descriptions. For example, in the stats file, I have the following lines:

Total number of OTUs:
$ grep -c "^>" *.fasta
61_otus.fasta:22
64_otus.fasta:33
67_otus.fasta:53
70_otus.fasta:125
73_otus.fasta:267
76_otus.fasta:554
79_otus.fasta:1165
82_otus.fasta:2496
85_otus.fasta:5088
88_otus.fasta:10544
91_otus.fasta:22090
94_otus.fasta:46256
97_otus.fasta:99322
99_otus.fasta:203452


The same number appears in the name of the file

  gg_13_5_otus_99_annotated.tree.gz


So, what does the number 99 mean? Thanks!

4.5 years ago

My guess would be that it means 99% sequence identity. As far as I know, 97% and 99% are two fairly common sequence identity cutoffs for OTU clustering based on 16S rRNA.
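
If that reading is right, the number in front of `_otus.fasta` is the clustering threshold, and a higher threshold should give more, finer-grained OTUs, which matches the counts in the question. Here is a rough sketch that prints the threshold next to the OTU count for each file. The function name `otu_counts` is just something made up for illustration, and it assumes the files follow the `NN_otus.fasta` naming shown above:

```shell
# Hypothetical helper: list the identity threshold encoded in each
# rep-set file name next to the number of sequences in that file.
# Assumes files are named like 97_otus.fasta, 99_otus.fasta, ...
otu_counts() {
    dir="${1:-.}"                        # directory to scan, default: current
    for f in "$dir"/*_otus.fasta; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        name=$(basename "$f")
        pct="${name%_otus.fasta}"        # "99_otus.fasta" -> "99"
        n=$(grep -c '^>' "$f")           # one FASTA header per sequence
        printf '%s%% identity\t%s OTUs\n' "$pct" "$n"
    done
}
```

Running `otu_counts .` in the directory that holds the FASTA files should give one line per threshold, with the counts matching the `grep -c` output in the question.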

4.5 years ago
timodonnell ▴ 80

I also ran into this question, and it looks like the answer by Lars above is essentially correct. In particular, from this thread on the QIIME forum:

To clarify, the 97_otus.fasta rep_set was created by clustering all the sequences in the Greengenes database into 97% identity clusters, and then a representative sequence was chosen from each of those clusters to be used to create the 97_tree and 97_taxonomy. Therefore each OTU id in the 97_otus.fasta file is that of the representative sequence.