Question

A question regarding the differential coverage binning process in a metagenomic workflow

2.2 years ago

Rui ▴ 50

Hello everyone,

I understand that differential coverage binning relies on a set of metagenomic samples taken repeatedly over a period of time from a single location, for example, sampling the fecal microbiome of one individual for 7 days. The differential coverage binning algorithm then groups contigs with similar abundance profiles across the samples into a single bin.

However, what if my samples are not from the same location and each was sampled at only a single time point? Would the differential coverage binning method still work? For example, sampling the fecal microbiome from 5 individuals at the same time. Would the binning algorithm still be able to bin contigs from these 5 metagenomes?

The reason I am asking is that I came across a similar situation. I have 6 soil metagenomes taken from a patch of soil at a single time point. I first assembled them individually (NOT co-assembly), then mapped the reads from every sample back onto every assembly. For example, I mapped sample 1 reads onto the sample 1 assembly, sample 2 reads onto the sample 1 assembly, sample 3 reads onto the sample 1 assembly, and so on up to sample 6 reads onto the sample 1 assembly. I then repeated this process for every assembly, ending up with 36 mapping files. For each assembly, I used its 6 mapping files as input for 3 binning algorithms (CONCOCT, metabat2, and maxbin2) and combined the final set of bins using MetaWRAP.

As you can see, these samples are not from a time series, but rather were taken in close proximity at a single time point. So my question is: is this a valid approach to obtain bins, and are the bins obtained this way usable for downstream analyses? I know some of these binners also use nucleotide frequencies, but would that be sufficient? Sorry for the lengthy question, I just wanted to give as many details as possible.

Thank you!!

assembly binning metagenomics • 1.4k views

updated 2.2 years ago by Mensur Dlakic ★ 27k • written 2.2 years ago by Rui ▴ 50

score 1 · Answer 1 · 2022-01-26

2.2 years ago

Mensur Dlakic ★ 27k

Generally speaking, binning based on tetra-nucleotide frequencies (TNF) doesn't care about DNA origin, organism abundance, or time of sampling. If two DNA pieces have the same TNFs, they will be binned together even if the samples were collected at different times or places. I am probably in the minority with this opinion, but here goes: DNA sequences bin just fine without sequencing coverage (which is a proxy for relative abundance). There are 256 TNF features (or 136, depending on how TNFs are calculated), and coverage is only one feature. It should be fairly intuitive which contributes more to binning.
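The two feature counts mentioned above can be checked with a few lines of Python. This is only an illustrative sketch: it assumes that the smaller number comes from merging each tetranucleotide with its reverse complement, which is one common way of computing TNFs. Which convention a particular binner uses is not stated in this thread.

```python
from itertools import product

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmers(k=4):
    """Return the set of k-mers after merging each k-mer with its reverse complement.

    Each pair is represented by its lexicographically smaller member.
    """
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {min(kmer, reverse_complement(kmer)) for kmer in kmers}


all_tetramers = 4 ** 4                 # every possible tetranucleotide
canonical = canonical_kmers(4)         # strand-independent tetranucleotides
print(all_tetramers, len(canonical))   # 256 136
```

The 16 tetranucleotides that are their own reverse complement (such as ACGT) stay as single features, while the remaining 240 collapse into 120 pairs, giving 136.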

I suggest that you completely skip the coverage determination and mapping. Not sure about CONCOCT, but metabat2 definitely works without a coverage file. If your programs need a coverage file, you can make one in which the first column holds the contig names and the other two columns are uniformly filled with 1s (for coverage) and 0s (for standard deviation). Since every contig would have the same coverage and standard deviation, those features would be ignored.
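A placeholder coverage table of the kind described above could be generated roughly as follows. This is a minimal sketch, not a tool-specific recipe: the function name `dummy_coverage_lines` is made up for illustration, and the three-column tab-separated layout simply follows the description in this answer. Some binners may expect a header row or additional columns, so check the documentation of the program you actually use.

```python
def dummy_coverage_lines(fasta_lines):
    """Yield one 'contig<TAB>1<TAB>0' row per FASTA header.

    The contig name is taken as the header text up to the first whitespace.
    The constant 1 (coverage) and 0 (standard deviation) are placeholders only.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed headers with no name
                yield f"{fields[0]}\t1\t0"


# Toy in-memory FASTA; a real run would iterate over an open assembly file.
example_fasta = [">contig_1 len=8", "ACGTACGT", ">contig_2", "GGCC"]
rows = list(dummy_coverage_lines(example_fasta))
print("\n".join(rows))
```

To write a file, the same generator can be fed an open FASTA handle and its rows written to a text file, one per line.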

I suggest you take the contigs from each sample, amend their headers to reflect where they came from, and then concatenate them all into one super-meta-assembly. Next, bin this super-assembly, and once the bins are set, recover individual samples based on the unique labels in the FASTA headers. I know this works because I have done it for samples from the same location that were collected in different years.
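The header-tagging and concatenation step might look something like the sketch below. The `sample|contig` naming scheme, the `|` separator, and the function names are all assumptions made for illustration; any label works as long as it is unique per sample and the separator never occurs in the sample names. The label is placed at the start of the header so that it survives tools that cut headers at the first whitespace.

```python
def tag_headers(sample, fasta_lines, sep="|"):
    """Yield FASTA lines with the sample label prepended to every header.

    `sep` is a hypothetical separator; choose one absent from all sample names.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f">{sample}{sep}{line[1:]}"
        else:
            yield line


def concatenate_assemblies(assemblies, sep="|"):
    """Merge a {sample: fasta_lines} mapping into one list of tagged FASTA lines."""
    merged = []
    for sample, fasta_lines in assemblies.items():
        merged.extend(tag_headers(sample, fasta_lines, sep))
    return merged


# Toy in-memory assemblies; real input would be read from per-sample FASTA files.
assemblies = {
    "S1": [">contig_1", "ACGTACGT"],
    "S2": [">contig_1", "TTGGCCAA"],
}
merged = concatenate_assemblies(assemblies)
print("\n".join(merged))
```

Note that both toy assemblies contain a `contig_1`; without the sample label the merged file would have duplicate contig names.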

Dear Dr. Dlakic,

Thank you so much for your timely comments, I really appreciate it! My question is: when I concatenate all assemblies and do the binning, if contigs from different samples are binned together, should I exclude those bins from my final bin set? Should I only include bins with contigs from the same sample?

Also, according to this website, Metabat2 takes a depth file, but it seems to be optional. I guess I can just proceed by dropping this parameter, then, by running:

metabat2 -i assembly.fasta -o bins_dir/bin

2.2 years ago by Rui ▴ 50

I don't know what you are trying to do, so I can't say whether you should be excluding something or not. DNA sequences from the same organism, even if they were collected at different times or in different places, will bin together. If your goal is to keep such sequences in one bin, that will be done for you automatically. If your goal is to separate the sequences by sample, you can do so after binning, as long as you made the FASTA headers unique enough to reflect the sample they came from.
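Separating a bin by sample after binning could be sketched as below. This assumes, purely for illustration, that every contig header starts with a sample label followed by a `|` separator (for example `S1|contig_7`); the function name and the separator are hypothetical and should match whatever labelling was applied before binning.

```python
from collections import defaultdict


def split_bin_by_sample(bin_lines, sep="|"):
    """Group the FASTA records of one bin by the sample label in each header.

    Returns {sample: [fasta lines]}; headers are assumed to look like
    '>sample<sep>contig_name' (an illustrative convention, not a standard).
    """
    by_sample = defaultdict(list)
    current = None
    for line in bin_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            current = line[1:].split(sep, 1)[0]
        if current is not None:  # ignore anything before the first header
            by_sample[current].append(line)
    return dict(by_sample)


# Toy bin containing contigs from two samples.
bin_lines = [">S1|contig_7", "ACGT", ">S2|contig_3", "GGCC", ">S1|contig_9", "TTAA"]
per_sample = split_bin_by_sample(bin_lines)
contig_counts = {
    sample: sum(line.startswith(">") for line in lines)
    for sample, lines in per_sample.items()
}
print(contig_counts)
```

The per-sample contig counts also show at a glance whether a bin draws on one sample or several, which is the information needed to decide how to treat mixed bins.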