I am trying to taxonomically classify two fungal genomes. I have read about a couple of ways to do this:
1. Align the genomes to reference genomes using programs such as MUMmer or MUSCLE, then build a maximum-likelihood tree with RAxML.
2. Identify common orthologs in the genomes and the reference genomes with BUSCO or CEGMA, then align them and build a tree.
In your opinion, which is better? Is there another way to taxonomically identify an organism from its genome?
Thanks in advance, Morgan
Can you provide more information? From your question I can't really deduce what you want to achieve.
Do you have no idea at all which fungi these genomes come from? And there are apparently reference genomes available?
Even with the brief info at hand, I don't think approach 1 will get you very far. Depending on the divergence between your genomes and the references, a nucleotide-level alignment will give too little resolution, I'm afraid.
Based on the 18S rRNA, we know it's closely related to Penicillium. However, these genomes are from environmental samples, so there is uncertainty about how closely related the entire genome is to the reference. From what I understand, you compare the genome to several reference genomes you believe are closely related, then build a tree from the alignment to see how similar or dissimilar it is to the references. I have seen this done both with the entire genome and with the orthologous genes found within the genomes; my question is which way is more accurate. Does this make more sense?
My first thought would be to go for the orthologous genes, then. It's a more straightforward approach than the whole-genome option. On the other hand, if the species are very closely related, the proteins might be too conserved (i.e. too similar) and might not provide enough resolution. In that case you can indeed use the complete genome.
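To make the ortholog route a bit more concrete, here is a minimal Python sketch of the step between "align each ortholog" and "build the tree". It assumes you already have one protein alignment per orthologous gene (for example from single-copy genes reported by BUSCO, aligned with a tool of your choice). It concatenates the genes shared by all taxa into one supermatrix and reports pairwise identity, which gives a rough feel for whether the proteins are too conserved to separate your samples from the references. The function names, gene IDs, taxon labels, and toy sequences are all made up for illustration, and this is not part of any of the tools mentioned above.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene ID -> {taxon: aligned sequence}.
    Only genes present in every taxon are kept (a simplifying assumption;
    real pipelines often tolerate some missing data).
    """
    taxa = sorted(set().union(*(aln.keys() for aln in gene_alignments.values())))
    shared = [gene for gene, aln in sorted(gene_alignments.items())
              if set(aln) == set(taxa)]
    return {t: "".join(gene_alignments[g][t] for g in shared) for t in taxa}


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical residues, ignoring columns with a gap in either sequence."""
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else float("nan")


# Toy data: hypothetical ortholog IDs and sequences, for illustration only.
genes = {
    "og1": {"sample": "MKT-A", "ref1": "MKTLA", "ref2": "MRTLA"},
    "og2": {"sample": "GG", "ref1": "GG"},  # missing in ref2, so it is skipped
    "og3": {"sample": "PLV", "ref1": "PLV", "ref2": "PIV"},
}

supermatrix = concatenate_alignments(genes)
for taxon in ("ref1", "ref2"):
    identity = pairwise_identity(supermatrix["sample"], supermatrix[taxon])
    print(f"sample vs {taxon}: {identity:.3f}")
```

If the identities between your samples and the closest references come out at or near 1.0 across the whole concatenated alignment, that is a hint the protein set will not resolve them, and nucleotide-level or whole-genome comparison may be the better option. The concatenated sequences would still need to be written out in a format your tree-building program accepts.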