I am working on metagenome assemblies. For ORF prediction, I run Prodigal on all contigs without any prior binning or classification step using tools such as EukRep, Tiara, Whokaryote, or others, because in my experience these tools do not work very well on my data. Therefore, I cannot directly use GeneMark-ES, AUGUSTUS, or other model-based eukaryotic gene prediction tools, as I do not have any bins. On the other hand, I do not want to use MetaEuk, since it is a reference-based prediction tool.
My goal is to obtain coding fragments (exons) for functional profiling, not to reconstruct full eukaryotic gene models. I am fully aware that Prodigal is tailored to prokaryotic gene architecture and is not splice-aware. However, without transcriptomic evidence, other tools cannot account for spliced mRNAs from genomic data either.
So my questions are:
Is using Prodigal as a heuristic ORF-level detector acceptable when the aim is only to quantify functional exonic regions and find domains, rather than to recover full-length genes? If I keep using Prodigal, which I think does not work so badly for functional eukaryotic regions, what limitations, tests, or statements would you suggest to make the methodology scientifically sound?
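One possible test is to compare the Prodigal calls against a naive, model-free baseline. The sketch below is only an illustration of what "ORF-level detection" means in its simplest form: a stop-to-stop scan over all six frames. It is not what Prodigal does internally. The function names, the `min_len` threshold of 90 nt, and the use of the standard genetic code are assumptions made for this example.

```python
from typing import Iterator, Tuple

# Assumption: standard genetic code (stop codons TAA, TAG, TGA).
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; non-ACGT characters are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_to_stop_orfs(
    seq: str, min_len: int = 90
) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (strand, frame, start, end) for every stop-free stretch >= min_len nt.

    Coordinates are 0-based, end-exclusive, and relative to the scanned strand
    (for "-" they refer to the reverse-complemented sequence).
    No start codon is required, so partial ORFs at contig edges are kept.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if i - start >= min_len:
                        yield (strand, frame, start, i)
                    start = i + 3
            # Open stretch that runs off the end of the contig.
            end = frame + ((len(s) - frame) // 3) * 3
            if end - start >= min_len:
                yield (strand, frame, start, end)
```

A scan like this over-calls heavily, because any stop-free stretch in any frame is reported. That is the point of the comparison: it gives a rough upper bound on candidate coding fragments, against which the fraction recovered by Prodigal could be reported.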
Thank you!