Hi all,
I have been asked to perform a metagenome assembly and binning on a library generated from a sample that was subjected to random amplification prior to library preparation. Since most metagenome binning software relies on coverage, I was wondering whether the random PCR amplification could cause an uneven coverage profile and therefore negatively affect the binning step.
Thanks
That's quite uncommon. Is it a kit or home-brewed? In any case, you can rely on other signals such as the tetranucleotide distribution or the correlation of coverage across different samples: even if the coverage of each genome varies, the binner can still use the correlation of contig coverages.
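If it helps, here is a minimal Python sketch of what a tetranucleotide frequency vector per contig looks like (standard library only, sequence passed in as a string; this is just an illustration, not what any particular binner does internally):

```python
from itertools import product
from collections import Counter

def tetranucleotide_freqs(seq):
    """Return a normalized 4-mer frequency vector for one contig sequence."""
    seq = seq.upper()
    # count all overlapping 4-mers made of unambiguous bases only
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")
    )
    total = sum(counts.values()) or 1
    # fixed order over all 256 possible tetranucleotides
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    return [counts[k] / total for k in kmers]
```

Contigs from the same genome tend to have similar vectors, so a simple Euclidean or cosine distance between them already gives a rough clustering signal. Real binners such as MetaBAT additionally collapse reverse-complement k-mers (typically into 136 canonical tetramers) and combine this signal with coverage.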
Not 100% sure, but I think it is a kit. I hadn't thought of that and I will definitely try the tetranucleotide distribution. They decided to go for a random PCR amplification because the starting material is extremely difficult to obtain and has a very low amount of biological matter.
Thanks
Do you have a good reference? If so, I wouldn't bother assembling and binning; I'm pretty sure the assembly will look bad.
I am curious why you wouldn't bother assembling. Other than spending some energy to run a computer - which may be running anyway - what is the downside? The poster has already said that "the starting material is extremely difficult to obtain and has a very low amount of biological matter" so they presumably understand that the assembly will not be pristine.
A. I suspect it will fail miserably, and B. I wouldn't trust these assemblies due to coverage biases and the weird things PCR will introduce. I think comparative questions can be answered using a reference (if one exists and is good). It all depends on the environment and what they want to achieve.
To your point (A), you could be right. Still, assembling the reads is by far the cheapest step - both in terms of time and money - compared to all the steps they must have done so far. The point is to get the information, however biased and incomplete it may be. Which brings me to your point (B): without getting the information from the assembly, it is impossible to know if there is anything useful in there.
Here is, to my understanding, what they have done: used some of the sample that is "extremely difficult to obtain and has a very low amount of biological matter", amplified it and sequenced it. I don't see how not assembling it is a better idea than assembling it, notwithstanding the fact that there may be nothing useful there. I'd assemble it and interpret the results with caution.
This was just an exploratory analysis, since we are planning to perform long-read sequencing. I guess the problem was the assembly: after playing with BBNorm (several contigs had an absurdly high coverage, i.e. > 1000x), I noticed a clear increase in the N50 but a lower number of contigs. The binning step with MetaBAT, MaxBin2 and CONCOCT was also better. I got a lower number of bins, but the quality of each bin was far better in terms of size, completeness (> 94%) and redundancy (< 5%). I will try the tetranucleotide distribution to see if there is any further improvement, but at this point I guess the problem was the assembly step.
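For anyone following along, here is a minimal Python sketch of how N50 can be computed from contig lengths; the `contigs.fasta` path is just a placeholder:

```python
def n50(lengths):
    """N50: the contig length at which half of the total assembly size is reached."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0

def contig_lengths(fasta_path):
    """Collect contig lengths from a (possibly multi-line) FASTA file."""
    lengths = []
    current = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths

print(n50(contig_lengths("contigs.fasta")))  # placeholder path
```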
Unfortunately, we do not have a good reference. This is why we are planning long-read sequencing.
Thanks!
MetaBAT actually uses the tetranucleotide distribution. Long-read sequencing requires far more DNA than short reads; people are struggling to get enough DNA even with "normal" tissues.