Prodigal as an ORF finder for eukaryotic genes in metagenomes
1 day ago

I am working on metagenome assemblies. For ORF prediction, I run Prodigal on all contigs, without any prior binning or classification with tools such as EukRep, Tiara, or Whokaryote, because in my experience these tools do not work very well on my data. Therefore, I cannot directly use GeneMark-ES, AUGUSTUS, or any other model-based eukaryotic gene prediction tool, as I do not have any bins. On the other hand, I do not want to use MetaEuk, since it is a reference-based prediction tool.

My goal is to obtain coding fragments (exons) for functional profiling, not to reconstruct full eukaryotic gene models. I am fully aware that Prodigal is tailored to prokaryotic gene architecture and is not splice-aware. However, as I understand it, tools that lack transcriptomic evidence cannot account for spliced mRNAs from genomic data either.

So my questions are:

Is this use of Prodigal as a heuristic ORF-level detector acceptable when the aim is only to quantify functional exonic regions and find domains, rather than to recover full-length genes? If I still insist on using Prodigal, which I think does not work so badly for functional eukaryotic regions, what limitations, tests, or statements would you suggest to make the methodology scientifically sound?

Thank you !

prodigal exon metagenome metaeuk gene • 352 views
11 hours ago
LChart 5.1k

My goal is to obtain coding fragments (exons) for functional profiling, not to reconstruct full eukaryotic gene models.

It sounds to me like you want less stringency than Prodigal: getorf (from EMBOSS) will report comprehensive putative ORFs, which you can then filter by any number of criteria. It should be noted that Prodigal really does try to construct full gene models, looking for start/stop codons across all reading frames and then filtering on a number of heuristics. As you mentioned, noncoding sequence may be flagged by these heuristics, so Prodigal may under-identify eukaryotic gene fragments. However, I can't think of a great alternative that isn't reference-based (that is, one that prioritizes ORFs by similarity to known sequences).
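To make the "comprehensive ORFs first, filter later" idea concrete, here is a minimal Python sketch of a six-frame, stop-to-stop ORF scan. It is written in the spirit of a permissive ORF finder and is not a reimplementation of getorf or Prodigal. The function names (`stop_to_stop_orfs`, `revcomp`), the `Orf` record, and the 90 nt default cutoff are assumptions chosen for illustration, and it assumes the standard genetic code.

```python
from typing import Iterator, NamedTuple

# Stop codons of the standard genetic code (an assumption; other codes differ).
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    frame: int   # 0, 1 or 2, relative to the scanned strand
    start: int   # 0-based start on the scanned strand
    end: int     # exclusive end on the scanned strand
    seq: str     # nucleotide sequence, without the terminating stop codon


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan(seq: str, strand: str, min_len: int) -> Iterator[Orf]:
    """Yield stop-free, in-frame segments of at least min_len nt on one strand."""
    for frame in range(3):
        seg_start = frame
        # Last position that still starts a complete codon in this frame.
        last = frame + ((len(seq) - frame) // 3) * 3
        for i in range(frame, last, 3):
            if seq[i:i + 3] in STOPS:
                if i - seg_start >= min_len:
                    yield Orf(strand, frame, seg_start, i, seq[seg_start:i])
                seg_start = i + 3
        # Open segment running to the contig edge (common for fragments).
        if last - seg_start >= min_len:
            yield Orf(strand, frame, seg_start, last, seq[seg_start:last])


def stop_to_stop_orfs(seq: str, min_len: int = 90) -> Iterator[Orf]:
    """Scan all six frames; no start codon is required."""
    seq = seq.upper()
    yield from _scan(seq, "+", min_len)
    yield from _scan(revcomp(seq), "-", min_len)
```

Because no start codon is required, segments that run into a contig edge or an intron boundary are still reported, which is the point when the target is exon fragments rather than gene models. The price is many spurious candidates, so downstream filtering (for example by length or domain hits) carries most of the weight.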

4 hours ago
Mensur Dlakic ★ 30k

Is this use of Prodigal as a heuristic ORF-level detector acceptable when the aim is only to quantify functional exonic regions and find domains, rather than to recover full-length genes?

No, this is not what Prodigal is meant for. Eukaryotic genomes have lower coding density than prokaryotic ones, so many superfluous gene candidates will be found during Prodigal training, which will further mess up the whole prediction. Even if this works in some way when you squint really hard and pretend that using the wrong tool is OK, it will come back to bite you at some point down the road.
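The coding-density point can be checked on your own contigs. The sketch below computes, per contig, the fraction of bases covered by predicted CDS features. It assumes tab-separated, nine-column GFF input with `CDS` in the type column (Prodigal can write GFF with `-f gff`) and a dictionary of contig lengths that you supply. The function name `coding_density` and its arguments are hypothetical, and this is a rough diagnostic, not a validation of the predictions.

```python
from collections import defaultdict
from typing import Dict, Iterable


def coding_density(gff_lines: Iterable[str],
                   contig_lengths: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each contig covered by CDS features (overlaps counted once)."""
    intervals = defaultdict(list)
    for line in gff_lines:
        # Skip blank lines and GFF comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] != "CDS":
            continue
        # GFF coordinates are 1-based and inclusive.
        intervals[fields[0]].append((int(fields[3]), int(fields[4])))

    density = {}
    for contig, length in contig_lengths.items():
        covered = 0
        cur_start = cur_end = None
        # Merge overlapping intervals so shared bases are not double counted.
        for start, end in sorted(intervals.get(contig, [])):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start + 1
        density[contig] = covered / length if length else 0.0
    return density
```

Contigs whose density is far below what you see on contigs of clearly prokaryotic origin are candidates for being eukaryotic or otherwise rich in noncoding sequence, and predictions on them deserve extra suspicion.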

If I still insist on using Prodigal, which I think does not work so badly for functional eukaryotic regions, what limitations, tests, or statements would you suggest to make the methodology scientifically sound?

I can't think of anything that would make this approach sound. The proper way is to do the binning first, classify the bins using Tiara or something similar, and then focus on predicting the genes properly for the eukaryotic bins.

MetaBAT2 will bin almost any dataset in under 2 minutes. Other binners listed below are slower and a bit more difficult to install. You may want to install a Meta-binner (last link) which should take care of installing everything else.
