Hello,
I have been working with human fecal metagenome samples. The samples were individually assembled using metaSPAdes, resulting in the following files: sample1.fastq to sample1.contigs.fasta and sample2.fastq to sample2.contigs.fasta (in fact, there are more samples). I am now planning to perform binning on these contigs.
I learned that if the contigs are co-assembled, I can simply map all samples to a contigs.fasta file and use the resulting BAM files (sample1.BAM, sample2.BAM) with binning programs that use differential abundance-based methods, such as metaBAT2 or CONCOCT. However, my contigs are not co-assembled, and I am confused about the appropriate way to perform binning on the individual contigs.
Initially, I planned to concatenate all assembly files (cat *.contigs.fasta > combined.contigs.fasta) and map all reads to that assembly file before binning. However, I was concerned about possible problems with "contig redundancy" when using the concatenate method, as noted in this reply.
As a result, I have considered two other methods, but I am not sure if they are correct:
Method 1.
- Map reads to only their respective assembly files (sample1.fastq to sample1.contigs.fasta / sample2.fastq to sample2.contigs.fasta).
- Bin each assembly individually (sample1.contigs.fasta using sample1.coverage.txt / sample2.contigs.fasta using sample2.coverage.txt).
- Concatenate the binning results using dereplication tools (e.g. dRep).
However, I am worried that the binning tool may not be able to use differential abundance-based methods, because only one sample's coverage table is provided for each assembly in step 2 of Method 1.
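To make this concern concrete, here is a minimal Python sketch of what Method 1 produces for each assembly. The function name method1_plan and the dictionary layout are hypothetical and only for illustration; the file names follow the examples above. It only builds a plan and does not run any mapping or binning tool.

```python
def method1_plan(samples):
    """Method 1: each sample's reads are mapped only to that sample's assembly."""
    plan = {}
    for sample in samples:
        plan[f"{sample}.contigs.fasta"] = {
            # Only the sample's own reads are mapped to its assembly.
            "reads": [f"{sample}.fastq"],
            # So the binner receives a single coverage table per assembly.
            "coverage_tables": [f"{sample}.coverage.txt"],
        }
    return plan


plan = method1_plan(["sample1", "sample2"])
for assembly, info in plan.items():
    print(assembly, info["reads"], info["coverage_tables"])
```

Each assembly ends up with exactly one coverage table, so the binner sees a single abundance value per contig rather than an abundance profile across samples.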
Method 2.
- Map all reads against all assemblies (sample1.fastq to both sample1.contigs.fasta and sample2.contigs.fasta / sample2.fastq to both sample1.contigs.fasta and sample2.contigs.fasta, like the all-against-all mode of the Anvi'o metagenomics workflow).
- Bin each assembly individually with the coverage tables derived from all samples (sample1.contigs.fasta using sample1-sample1.coverage.txt and sample2-sample1.coverage.txt / sample2.contigs.fasta using sample1-sample2.coverage.txt and sample2-sample2.coverage.txt).
- Concatenate the binning results using dereplication tools.
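The all-against-all layout of Method 2 can be sketched in Python as follows. The function name method2_plan and the dictionary layout are hypothetical, and the coverage tables are assumed to be named reads-assembly.coverage.txt, in line with the examples above. As before, this only enumerates the jobs and does not call any mapper or binner.

```python
def method2_plan(samples):
    """Method 2: every sample's reads are mapped to every assembly."""
    plan = {}
    for assembly_sample in samples:
        plan[f"{assembly_sample}.contigs.fasta"] = {
            # Reads from all samples are mapped to this one assembly.
            "reads": [f"{read_sample}.fastq" for read_sample in samples],
            # One coverage table per read set (assumed naming: reads-assembly).
            "coverage_tables": [
                f"{read_sample}-{assembly_sample}.coverage.txt"
                for read_sample in samples
            ],
        }
    return plan


samples = ["sample1", "sample2"]
plan = method2_plan(samples)

# Total number of mapping jobs: one per (reads, assembly) pair.
n_jobs = sum(len(info["reads"]) for info in plan.values())
print(n_jobs)
```

For two samples this gives four mapping jobs. In general the number of jobs grows with the square of the number of samples, which is worth keeping in mind because I have more than two samples.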
Which of these two methods is correct? Or should I consider another method? I would greatly appreciate any comments.
Thank you very much for taking the time to read this long and messy post.
Thank you for your clear explanation! As you advised, I will go ahead with the binning of my assemblies separately.
Thank you again for your help.
It is common to upvote and/or accept the answer if it solved your problem.
I apologize for my mistake. As a newcomer to Biostars, I must have missed it.