I have a dataset with lots of single-celled algae, diatoms, and bacteria. I can't use my normal pipeline ([MULTIPLE BINNING PROGRAMS] -> DAS Tool -> GTDB-Tk) because the latter two are built specifically for prokaryotes.
How can I determine which metagenome-assembled genomes (MAGs) are eukaryotic? The options I know about are the following:
- EukRep - This works at the contig level and is ML-based. I've run "positive controls" through EukRep in the past and it didn't identify all of the contigs as eukaryotic, so I don't have complete faith in this method. There seemed to be a high false-negative rate.
- BBSuite's sketch.sh - This worked pretty well in the past but, AFAIK, it depends heavily on a database that isn't actively updated and is strongly affected by the completeness of the genome (other methods could be as well).
- Mash/Sourmash - Same as above, though I've had more errors with these.
- Barrnap - Identify 16S/18S rRNA genes, then see which is more prominent. Not sure how I feel about this.
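To make the Barrnap idea concrete, here is a rough sketch of what I have in mind. It assumes Barrnap was run on a MAG with both the bacterial and eukaryotic `--kingdom` settings, that the GFF3 outputs were concatenated, and that hits carry `Name=16S_rRNA` or `Name=18S_rRNA` in the attributes column. The function names are hypothetical, and the simple majority rule is my own assumption, not an established method; it doesn't resolve overlapping hits from the two models at the same locus.

```python
from collections import Counter


def count_rrna_hits(gff_text):
    """Count rRNA features by their Name attribute in GFF3 text."""
    counts = Counter()
    for line in gff_text.splitlines():
        # Skip blank lines and GFF3 header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        name = attrs.get("Name")
        if name:
            counts[name] += 1
    return counts


def guess_domain(counts):
    """Naive call: whichever small-subunit rRNA type has more hits wins."""
    n_16s = counts.get("16S_rRNA", 0)
    n_18s = counts.get("18S_rRNA", 0)
    if n_18s > n_16s:
        return "eukaryote"
    if n_16s > n_18s:
        return "prokaryote"
    return "undetermined"  # tie, or no SSU rRNA found in the MAG


# Made-up example records, for illustration only.
example_gff = (
    "##gff-version 3\n"
    "contig_1\tbarrnap\trRNA\t100\t1900\t0\t+\t.\tName=18S_rRNA;product=18S ribosomal RNA\n"
    "contig_7\tbarrnap\trRNA\t50\t3400\t0\t-\t.\tName=28S_rRNA;product=28S ribosomal RNA\n"
)
print(guess_domain(count_rrna_hits(example_gff)))
```

My worry is that many MAGs won't contain any rRNA genes at all, since those regions often fail to assemble or bin, so this would return "undetermined" a lot.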
Is there something akin to GTDB-Tk that can handle eukaryotic MAGs?
To reiterate, I want to bin my MAGs first and then classify each one as eukaryotic or prokaryotic (preferably not at the contig level). What are the recommended methods to do this?
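For context, the fallback I can think of is lifting contig-level calls (e.g., from EukRep) up to the MAG level with a length-weighted vote. A minimal sketch, assuming I already have a per-contig table of bin ID, contig length, and a eukaryotic yes/no call; the function name and the 0.5 cutoff are hypothetical and would need tuning:

```python
from collections import defaultdict


def classify_bins(contigs, min_fraction=0.5):
    """Label each bin by the fraction of its base pairs called eukaryotic.

    contigs: iterable of (bin_id, length_bp, is_eukaryotic) tuples.
    Returns {bin_id: (label, eukaryotic_fraction)}.
    """
    euk_bp = defaultdict(int)
    total_bp = defaultdict(int)
    for bin_id, length_bp, is_euk in contigs:
        total_bp[bin_id] += length_bp
        if is_euk:
            euk_bp[bin_id] += length_bp

    result = {}
    for bin_id, total in total_bp.items():
        fraction = euk_bp[bin_id] / total if total else 0.0
        # min_fraction is an arbitrary cutoff chosen for illustration.
        label = "eukaryote" if fraction >= min_fraction else "prokaryote"
        result[bin_id] = (label, fraction)
    return result


# Made-up example: bin1 is mostly eukaryotic by length, bin2 is not.
example_contigs = [
    ("bin1", 1000, True),
    ("bin1", 3000, True),
    ("bin1", 1000, False),
    ("bin2", 5000, False),
]
print(classify_bins(example_contigs))
```

This still inherits whatever false-negative rate the contig-level classifier has, which is why I'd prefer a method that works on the whole MAG.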