Question

Metagenome binning of samples with individual assembly.

15 months ago

jylee ▴ 10

Hello,

I have been working with human fecal metagenome samples. Each sample was assembled individually with metaSPAdes, so sample1.fastq produced sample1.contigs.fasta, sample2.fastq produced sample2.contigs.fasta, and so on (there are more samples than these two). I am now planning to perform binning on these contigs.

I learned that if the samples are co-assembled, I can simply map the reads from every sample to the single contigs.fasta file and use the resulting BAM files (sample1.BAM, sample2.BAM) with binning programs that use differential abundance, such as MetaBAT2 or CONCOCT. However, my samples were not co-assembled, and I am unsure about the appropriate way to bin the individually assembled contigs.

Initially, I planned to concatenate all assembly files (cat *.contigs.fasta > combined.contigs.fasta) and map all reads to that combined file before binning. However, I was concerned about possible "contig redundancy" problems with the concatenation approach, as noted in this reply.

As a result, I have considered two other methods, but I am not sure if they are correct:

Method 1.

1. Map each sample's reads only to its own assembly (sample1.fastq to sample1.contigs.fasta / sample2.fastq to sample2.contigs.fasta).
2. Bin each assembly individually (sample1.contigs.fasta using sample1.coverage.txt / sample2.contigs.fasta using sample2.coverage.txt).
3. Merge the resulting bins and remove redundant ones with a dereplication tool (e.g. dRep).

However, I am worried that the binning tool may not be able to use differential abundance, because only one sample's coverage table is provided in step 2 of Method 1.

Method 2.

1. Map all reads against all assemblies (sample1.fastq to both sample1.contigs.fasta and sample2.contigs.fasta / sample2.fastq to both sample1.contigs.fasta and sample2.contigs.fasta, like the all-against-all mode of the Anvi'o metagenomic workflow).
2. Bin each assembly individually with the coverage tables derived from all samples (sample1.contigs.fasta using sample1-sample1.coverage.txt and sample2-sample1.coverage.txt / sample2.contigs.fasta using sample1-sample2.coverage.txt and sample2-sample2.coverage.txt).
3. Merge the resulting bins and remove redundant ones with a dereplication tool.
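
To make the difference between the two methods concrete, here is a minimal Python sketch that only lists which read files would be mapped to which assembly under each method. The function name `mapping_plan` is hypothetical, and the file names simply follow the sample1/sample2 naming used above; no mapper or binner is actually called.

```python
def mapping_plan(samples, all_against_all=False):
    """Return {assembly_file: [read_files mapped to it]}.

    all_against_all=False -> Method 1 (each sample's reads go to its own assembly)
    all_against_all=True  -> Method 2 (every sample's reads go to every assembly)
    """
    plan = {}
    for assembly_sample in samples:
        assembly = f"{assembly_sample}.contigs.fasta"
        if all_against_all:
            reads = [f"{s}.fastq" for s in samples]
        else:
            reads = [f"{assembly_sample}.fastq"]
        plan[assembly] = reads
    return plan


if __name__ == "__main__":
    samples = ["sample1", "sample2"]
    for label, flag in (("Method 1", False), ("Method 2", True)):
        print(label)
        for assembly, reads in mapping_plan(samples, flag).items():
            print(f"  {assembly} <- {', '.join(reads)}")
```

With N samples, Method 1 needs N mapping runs and gives each assembly a single coverage column, while Method 2 needs N x N runs and gives each assembly N coverage columns.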

Which of these two methods is correct? Or should I consider another method? I would greatly appreciate any comments.

Thank you very much for taking the time to read this long and messy post.

Binning Metagenomics • 1.2k views

updated 15 months ago by Mensur Dlakic ★ 27k • written 15 months ago by jylee ▴ 10

score 3 · Accepted Answer · 2023-02-12

If you are not co-assembling, then the assemblies should be treated individually downstream, which means your Method 1. You can concatenate the assemblies and pretend they were co-assembled, but that wouldn't be right. Homologous regions shared between your assemblies would either give an impression of higher abundance (when they are near-identical), or could appear as two closely related subspecies (say, when their identity is < 95%).

The question is whether there is justification for co-assembly. If the samples were from the same individual and were taken on the same day, co-assembly would be fine. If not, I don't think there is any justification for it. You probably know this better than I do: even a 2-3 day difference in sampling can result in different gut microbiome abundances. A couple of rich meals or strenuous exercise in the intervening 2-3 days can shift the microbiome profile quite a bit for the same person. This is even more pronounced between different individuals.

By the way, binning can be done without abundance, and in my experience there is not much difference. Most of the signal comes from nucleotide frequencies. It is kind of like making a rich cake with all the ingredients available, but being short one tablespoon of sugar. Most people wouldn't be able to tell the difference.
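
As a rough illustration of what "nucleotide frequencies" means here, the following Python sketch computes a tetranucleotide (4-mer) frequency profile for one contig. It is a simplified assumption of the kind of composition signal binners rely on, not the implementation of any particular tool: real binners typically also merge reverse-complement k-mers and normalize differently. The function name `tetranucleotide_freqs` is hypothetical.

```python
from collections import Counter
from itertools import product


def tetranucleotide_freqs(seq, k=4):
    """Return {kmer: relative frequency} over all 4**k A/C/G/T k-mers.

    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k in the contig.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


if __name__ == "__main__":
    profile = tetranucleotide_freqs("ACGTACGTTTGACCA")
    top = sorted(profile.items(), key=lambda kv: -kv[1])[:3]
    print(top)
```

Contigs from the same genome tend to have similar profiles, which is why composition alone can already separate many bins.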

For MetaBAT2 the abundance file is optional, so I suggest you try it both ways and see if there is any significant difference. For CONCOCT the abundance file may not be optional, but one can always create an abundance file in which every contig gets an identical number, say 10. In that case abundance would not factor into binning.
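
A minimal sketch of the constant-abundance trick, in Python. The tab-separated layout (a header row with "contig" followed by one column per sample) is an assumption about what the binner expects, so check it against the CONCOCT documentation before use. The function name `write_constant_coverage` and the column name `sample1` are hypothetical, and the contig IDs must match the FASTA file actually given to the binner.

```python
def write_constant_coverage(fasta_path, out_path, sample_name="sample1", value=10.0):
    """Write a coverage table giving every contig in the FASTA the same value.

    Returns the number of contigs written.
    """
    n_contigs = 0
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        out.write(f"contig\t{sample_name}\n")  # assumed header layout
        for line in fasta:
            if not line.startswith(">"):
                continue  # skip sequence lines
            fields = line[1:].split()
            if not fields:
                continue  # skip malformed headers with no ID
            # The contig ID is the header text up to the first whitespace.
            out.write(f"{fields[0]}\t{value}\n")
            n_contigs += 1
    return n_contigs
```

For example, `write_constant_coverage("sample1.contigs.fasta", "sample1.constant_coverage.tsv")` would write one row per contig, each with the value 10.0.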