metagenome binning using MaxBin2 for each dataset
2 days ago
serene.s • 0

Hi! The MaxBin2 documentation says it is meant for binning a co-assembly of multiple metagenomic datasets. However, I assembled each sample individually with MEGAHIT and used the final contigs from each sample as input for MaxBin2. Is it wrong to use contigs from a single-sample assembly as input for MaxBin2, or is binning only valid when I give it co-assembly results?

Metagenome Binning MaxBin • 146 views

While the authors claim that you get better results with co-assembly, you can still use MaxBin2 for single sample assemblies.

Other binning tools you should look at are MetaBAT2 and SemiBin. In my experience, these tools outperform MaxBin2.

Alternatively, you could try the Bin_refinement module from metaWRAP:

The metaWRAP::Bin_refinement module utilizes a hybrid approach to take in two or three bin sets that were obtained with different software (or the same software with different parameters) and produces a consolidated, improved bin set. First, binning_refiner is used to create hybridized bins from every possible combination of sets. If there were three bin sets: A, B, and C, then the following hybrid sets will be produced with binning_refiner: AB, BC, AC, and ABC. CheckM is then run to evaluate the completion and contamination of the bins in each of the 7 bin sets (3 originals, 4 hybridized). The bin sets are then iteratively compared to each other, and each pair is consolidated into an improved bin set. To do this, the same bin is identified within the two bin sets based on a minimum of 80% overlap in genome length, and the better bin is determined based on which bin has the higher score. The scoring function is S=Completion-5*Contamination. After all bin sets are incorporated into the consolidated bin collection, a de-replication function removes any duplicate contigs. If a contig is present in more than one bin, it is removed from all but the best bin (based on scoring function). CheckM is then run on the final bin set and a final report file is generated showing the completion, contamination, and other statistics generated by CheckM for each bin. Completion and contamination rank plots are also generated to evaluate the success of the Bin_refinement module, and compare its output to the quality of the original bins. (source)
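To make the consolidation logic in that description more concrete, here is a rough Python sketch of the scoring, pairwise consolidation, and de-replication steps. This is not metaWRAP's actual code. The `Bin` class, the function names, and the toy numbers are made up for illustration, and two details are my assumptions because the description does not spell them out: overlap is measured as shared contig length divided by the shorter bin's length, and bins without a match in the other set are kept unchanged.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical container; not a metaWRAP or CheckM data structure."""
    name: str
    completion: float     # percent, as CheckM would report it
    contamination: float  # percent, as CheckM would report it
    contigs: dict         # contig ID -> length in bp


def score(b):
    # Scoring function from the description: S = Completion - 5 * Contamination
    return b.completion - 5 * b.contamination


def overlap_fraction(a, b):
    # ASSUMPTION: shared contig length relative to the shorter of the two bins.
    shared = sum(n for cid, n in a.contigs.items() if cid in b.contigs)
    shorter = min(sum(a.contigs.values()), sum(b.contigs.values()))
    return shared / shorter if shorter else 0.0


def consolidate(set_a, set_b, min_overlap=0.8):
    """Keep the higher-scoring bin of each matched pair (>= 80% overlap)."""
    result, matched_b = [], set()
    for a in set_a:
        best = a
        for b in set_b:
            if overlap_fraction(a, b) >= min_overlap:
                matched_b.add(b.name)
                if score(b) > score(best):
                    best = b
        result.append(best)
    # ASSUMPTION: bins with no counterpart in set_a are carried over as-is.
    result.extend(b for b in set_b if b.name not in matched_b)
    return result


def dereplicate(bins):
    """Leave each contig only in the highest-scoring bin that contains it."""
    owner = {}
    for b in sorted(bins, key=score, reverse=True):
        for cid in b.contigs:
            owner.setdefault(cid, b.name)
    # Completion/contamination are copied unchanged here; in the real
    # workflow CheckM is rerun on the final bins to recompute them.
    return [
        Bin(b.name, b.completion, b.contamination,
            {cid: n for cid, n in b.contigs.items() if owner[cid] == b.name})
        for b in bins
    ]


# Toy data, invented purely to exercise the functions above.
a1 = Bin("A.1", 90.0, 2.0, {"c1": 1000, "c2": 500})
b1 = Bin("B.1", 95.0, 4.0, {"c1": 1000, "c2": 500, "c3": 100})
b2 = Bin("B.2", 60.0, 1.0, {"c2": 500, "c4": 2000})

merged = consolidate([a1], [b1, b2])
final = dereplicate(merged)
for b in final:
    print(b.name, score(b), sorted(b.contigs))
```

In the toy data, B.1 has the higher completion, but its contamination is penalized five-fold, so A.1 wins the matched pair. The shared contig c2 then stays only in A.1, because A.1 scores higher than B.2.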