Hi all,
I have been asked to perform a metagenome assembly and binning on a library generated from a sample that was subjected to random amplification prior to library preparation. Since most metagenome binning software relies on coverage, I was wondering whether the random PCR amplification could cause an uneven coverage profile and therefore negatively affect the binning step.
Thanks
That's quite uncommon. Is it a kit or home-brewed? In any case, you can rely on other signals such as the tetranucleotide distribution or the correlation of coverage across different samples: even if the coverage of each genome varies, the binner can still use the correlation of contig coverages.
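If it helps, here is a minimal Python sketch of what a tetranucleotide frequency vector per contig looks like (standard library only, sequence passed in as a string; this is just an illustration, not what any particular binner does internally):

```python
from itertools import product
from collections import Counter

def tetranucleotide_freqs(seq):
    """Return a normalized 4-mer frequency vector for one contig sequence."""
    seq = seq.upper()
    # count all overlapping 4-mers made of unambiguous bases only
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")
    )
    total = sum(counts.values()) or 1
    # fixed order over all 256 possible tetranucleotides
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    return [counts[k] / total for k in kmers]
```

Contigs from the same genome tend to have similar vectors, so a simple Euclidean or cosine distance between them already gives a rough clustering signal. Real binners such as MetaBAT additionally collapse reverse-complement k-mers (typically into 136 canonical tetramers) and combine this signal with coverage.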
Not 100% sure, but I think it is a kit. I hadn't thought of that and I will definitely try the tetranucleotide distribution. They decided to go for a random PCR amplification because the starting material is extremely difficult to obtain and has a very low amount of biological matter.
Thanks
Do you have a good reference? If so, I wouldn't bother assembling and binning; I'm pretty sure the assembly will look bad.
I am curious why you wouldn't bother assembling. Other than spending some energy to run a computer - which may be running anyway - what is the downside? The poster has already said that "the starting material is extremely difficult to obtain and has a very low amount of biological matter" so they presumably understand that the assembly will not be pristine.
A. I suspect it will fail miserably, and B. I wouldn't trust these assemblies due to coverage biases and the weird things PCR will introduce. I think comparative questions can be answered using a reference (if one exists and is good). It all depends on the environment and what they want to achieve.
To your point (A), you could be right. Still, assembling the reads is by far the cheapest step - both in terms of time and money - compared to all the steps they must have done so far. The point is to get the information, however biased and incomplete it may be. Which brings me to your point (B): without getting the information from the assembly, it is impossible to know if there is anything useful in there.
Here is, to my understanding, what they have done: used some of the sample that is "extremely difficult to obtain and has a very low amount of biological matter", amplified it and sequenced it. I don't see how not assembling it is a better idea than assembling it, notwithstanding the fact that there may be nothing useful there. I'd assemble it and interpret the results with caution.
This was just an exploratory analysis, since we are planning to perform long-read sequencing. I guess the problem was the assembly: after playing with BBNorm (several contigs had an absurdly high coverage, i.e. > 1000x), I noticed a clear increase in the N50 but a lower number of contigs. The binning step with MetaBAT, MaxBin2 and CONCOCT was also better. I got a lower number of bins, but the quality of each bin was far better in terms of size, completeness (> 94%) and redundancy (< 5%). I will try the tetranucleotide distribution to see if there is any further improvement, but at this point I guess the problem was the assembly step.
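For anyone following along, here is a minimal Python sketch of how N50 can be computed from contig lengths; the `contigs.fasta` path is just a placeholder:

```python
def n50(lengths):
    """N50: the contig length at which half of the total assembly size is reached."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0

def contig_lengths(fasta_path):
    """Collect contig lengths from a (possibly multi-line) FASTA file."""
    lengths = []
    current = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths

print(n50(contig_lengths("contigs.fasta")))  # placeholder path
```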
Unfortunately, we do not have a good reference. This is why we are planning long-read sequencing.
Thanks!
MetaBAT actually uses the tetranucleotide distribution. Long-read sequencing requires far more DNA than short reads; people are struggling to get enough DNA even with "normal" tissues.