We are running a fungal metagenomics analysis with MEGAN5, and many of the tree's branches end in very generic nodes. To test the software, we ran it on the following Candida albicans sequence extracted from GenBank:
> Candida_albicans gene=18S_rRNA_part+ITS1+5,8S_rRNA+ITS2+28S_rRNA_part length=536
tccgtaggtg aacctgcgga aggatcatta ctgatttgct taattgcacc acatgtgttt
ttctttgaaa caaacttgct ttggcggtgg gcccagcctg ccgccagagg tctaaactta
caaccaattt tttatcaact tgtcacacca gattattact aatagtcaaa actttcaaca
acggatctct tggttctcgc atcgatgaag aacgcagcga aatgcgatac gtaatatgaa
ttgcagatat tcgtgaatca tcgaatcttt gaacgcacat tgcgccctct ggtattccgg
agggcatgcc tgtttgagcg tcgtttctcc ctcaaaccgc tgggtttggt gttgagcaat
acgacttggg tttgcttgaa agacggtagt ggtaaggcgg gatcgctttg acaatggctt
aggtctaacc aaaaacattg cttgcggcgg taacgtccac cacgtatatc ttcaaacttt
gacctcaaat caggtaggac tacccgctga acttaagcat atcaataagc ggagga
The following image shows the results of the analysis and the parameters used:
The parameters were tuned to reach species level wherever possible. BLAST classifies the sequence as Candida albicans in most of the hits, although some hits are labeled only as Candida sp.
However, the tree stops at a higher taxon (Saccharomycetes). Is that what I should expect?
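As far as I understand, MEGAN places each read at the lowest common ancestor (LCA) of the taxa of its retained hits. Here is a minimal sketch of that idea, to show how a few less specific hits might pull the placement upward. The lineages and the `naive_lca` helper are hypothetical toy data, not copied from the NCBI taxonomy or from MEGAN's code, and the sketch ignores MEGAN's hit filters (Min Score, Top Percent, Min Support):

```python
from typing import Dict, List

# Hypothetical, simplified lineages (root -> leaf). Toy data for illustration,
# NOT the real NCBI taxonomy paths of these entries.
TOY_LINEAGES: Dict[str, List[str]] = {
    "Candida albicans": [
        "Fungi", "Ascomycota", "Saccharomycetes",
        "Saccharomycetales", "Candida", "Candida albicans",
    ],
    "Candida sp. (unplaced)": [
        "Fungi", "Ascomycota", "Saccharomycetes",
        "unclassified Saccharomycetes", "Candida sp.",
    ],
}


def naive_lca(hit_taxa: List[str], lineages: Dict[str, List[str]]) -> str:
    """Return the deepest taxon shared by the lineages of all hits."""
    paths = [lineages[taxon] for taxon in hit_taxa]
    lca = "root"
    # Walk down the lineages in parallel and stop at the first disagreement.
    for ranks in zip(*paths):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


only_albicans = ["Candida albicans"] * 5
mixed_hits = ["Candida albicans"] * 5 + ["Candida sp. (unplaced)"]

print(naive_lca(only_albicans, TOY_LINEAGES))
print(naive_lca(mixed_hits, TOY_LINEAGES))
```

In this toy setup, a single hit whose lineage branches off early is enough to move the read from the species node up to the class node, which looks like what I am seeing.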
Is there any other parameter I can adjust to push the read classification closer to species level? Should I expect another program to reach species level with this data?