MEGAN5 fungal classification doesn't reach lower taxa
7.3 years ago
dapregi

We are doing a fungal metagenomics analysis with MEGAN5, and many of the tree's branches end in very generic (high-level) nodes. To test the software, we ran the following Candida albicans sequence taken from GenBank:

> Candida_albicans gene=18S_rRNA_part+ITS1+5,8S_rRNA+ITS2+28S_rRNA_part length=536
tccgtaggtg aacctgcgga aggatcatta ctgatttgct taattgcacc acatgtgttt
ttctttgaaa caaacttgct ttggcggtgg gcccagcctg ccgccagagg tctaaactta
caaccaattt tttatcaact tgtcacacca gattattact aatagtcaaa actttcaaca
acggatctct tggttctcgc atcgatgaag aacgcagcga aatgcgatac gtaatatgaa
ttgcagatat tcgtgaatca tcgaatcttt gaacgcacat tgcgccctct ggtattccgg
agggcatgcc tgtttgagcg tcgtttctcc ctcaaaccgc tgggtttggt gttgagcaat
acgacttggg tttgcttgaa agacggtagt ggtaaggcgg gatcgctttg acaatggctt
aggtctaacc aaaaacattg cttgcggcgg taacgtccac cacgtatatc ttcaaacttt
gacctcaaat caggtaggac tacccgctga acttaagcat atcaataagc ggagga

The following image shows the results of the analysis and its parameters:

The parameters were optimized to reach species level as often as possible. Most BLAST hits classify the sequence as Candida albicans, and the remaining few classify it as Candida sp.

However, MEGAN assigns the sequence only to a higher taxon (Saccharomycetes). Is that what I should expect?

Is there any other parameter I can adjust so that more reads are classified at lower taxonomic levels? Should I expect another program to reach species level with this data?


I just tried to reproduce this and got the same result, but one of my BLAST hits further down the list is actually to "Saccharomycetes sp.". Are you sure you don't have that hit in your results?
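A single weak hit like that can matter, because the final assignment is computed over every hit that passes the filters. Whether it passes depends on its score relative to the best hit. The sketch below models a top-percent filter, loosely based on MEGAN's "Top Percent" setting; the exact rule (keep hits whose bit score is within the given percentage of the best score) and all bit scores are assumptions made for illustration.

```python
from typing import List, Tuple

# Each hit is (taxon label, bit score). The scores below are invented examples.
Hit = Tuple[str, float]


def top_percent_filter(hits: List[Hit], top_percent: float = 10.0) -> List[Hit]:
    """Keep hits whose bit score is within `top_percent` % of the best score.

    Simplified model of a MEGAN-style "Top Percent" filter (assumption).
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [(taxon, score) for taxon, score in hits if score >= cutoff]


hits = [
    ("Candida albicans", 990.0),
    ("Candida albicans", 985.0),
    ("Candida sp.", 970.0),
    ("Saccharomycetes sp.", 700.0),  # hypothetical weak hit further down the list
]

print([taxon for taxon, _ in top_percent_filter(hits, 10.0)])
print([taxon for taxon, _ in top_percent_filter(hits, 40.0)])
```

With a 10% window the weak "Saccharomycetes sp." hit is dropped; with a 40% window it is kept and would then pull the assignment up the tree.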


This might have to do with how the LCA (lowest common ancestor) algorithm decides at which level there is sufficient support for a classification. For example, if a sequence matches two different leaf nodes equally well, MEGAN may assign it to the node above them, because it cannot decide between the two.
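To see why a single off-target hit can push the assignment up to Saccharomycetes, here is a minimal LCA sketch. It walks the root-to-leaf lineages of all retained hits in parallel and stops at the first rank where they disagree. The lineages are hypothetical and heavily simplified (real NCBI lineages have more ranks), and MEGAN's own LCA adds further rules such as minimum support that are not modeled here.

```python
from typing import Dict, List

# Hypothetical, simplified root-to-leaf lineages; not real NCBI taxonomy paths.
LINEAGES: Dict[str, List[str]] = {
    "Candida albicans": ["Fungi", "Ascomycota", "Saccharomycetes",
                         "Saccharomycetales", "Candida", "Candida albicans"],
    "Candida sp.": ["Fungi", "Ascomycota", "Saccharomycetes",
                    "Saccharomycetales", "Candida", "Candida sp."],
    "Saccharomycetes sp.": ["Fungi", "Ascomycota", "Saccharomycetes",
                            "Saccharomycetes sp."],
}


def lowest_common_ancestor(taxa: List[str]) -> str:
    """Return the deepest node shared by the lineages of all given taxa."""
    paths = [LINEAGES[t] for t in taxa]
    lca = "root"
    # zip() walks all lineages rank by rank and stops at the shortest one.
    for nodes in zip(*paths):
        if all(node == nodes[0] for node in nodes):
            lca = nodes[0]
        else:
            break  # lineages diverge here; the previous node is the LCA
    return lca


print(lowest_common_ancestor(["Candida albicans"]))
print(lowest_common_ancestor(["Candida albicans", "Candida sp."]))
print(lowest_common_ancestor(["Candida albicans", "Candida sp.", "Saccharomycetes sp."]))
```

Hits to C. albicans and Candida sp. alone resolve to the genus Candida in this toy model, while adding the "Saccharomycetes sp." hit moves the LCA up to Saccharomycetes, matching what was observed.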

MEGAN itself understands neither sequences nor taxonomies; it simply reads labels off the taxonomy tree and matches them to the labels that the aligner generated.
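As a toy illustration of this label matching, the function below assigns a hit description to the longest tree label it starts with (followed by a space or the end of the string). This is not MEGAN's actual parsing logic, which may also rely on accession-to-taxon mapping files; the label set and descriptions are hypothetical.

```python
from typing import Optional, Set

# Hypothetical node labels as they might appear in a taxonomy tree.
TREE_LABELS: Set[str] = {
    "Candida", "Candida albicans", "Candida sp.",
    "Saccharomycetes", "Saccharomycetes sp.",
}


def match_label(description: str, labels: Set[str] = TREE_LABELS) -> Optional[str]:
    """Return the longest tree label that starts the description, or None."""
    best: Optional[str] = None
    for label in labels:
        # Require a word boundary so "Candida" does not match "Candidatus".
        if description == label or description.startswith(label + " "):
            if best is None or len(label) > len(best):
                best = label
    return best


print(match_label("Candida albicans strain SC5314 18S rRNA gene"))
print(match_label("Candida sp. isolate 12 internal transcribed spacer"))
print(match_label("uncultured fungus clone 7"))
```

A description that matches no label, like the last one, cannot be placed on the tree at all, which is why the wording of the reference labels directly affects how deep a read is assigned.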