bbcms vs bbnorm when working with highly uneven coverage
Hi, when working with highly uneven metagenomic datasets (e.g. soil), where coverage is extremely high for a few dominant organisms and relatively low for the rest (the very high coverage creates problems during assembly because sequencing errors add erroneous edges to the graph), do you use bbnorm to normalize read depth, bbcms to filter by depth, or both?

Tags: BBTools • Metagenomics • Megahit • Assembly • BBNorm
GenoMax:
This is a question that would benefit from Brian being around to answer. He says the following in the BBNorm guide:

Normalizes read depth based on kmer counts. Can also error-correct, bin reads by kmer depth, and generate a kmer depth histogram. Normalization is often useful if you have too much data (for example, 600x average coverage when you only want 100x) or uneven coverage (amplified single-cell, RNA-seq, viruses, metagenomes, etc).
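For concreteness, the guide's basic normalization call looks roughly like the sketch below. The file names are placeholders, and target=100 / min=5 are the values used in the guide's example (normalize to ~100x average depth, discard reads whose apparent depth is below 5 as presumed errors), so adjust them to your data:

    bbnorm.sh in=reads.fq out=normalized.fq target=100 min=5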

I would say use bbnorm.sh before bbcms.sh, because the latter filters based on depth (see the sketch below). The bbcms.sh documentation describes it as follows:

Error corrects reads and/or filters by depth, storing kmer counts in a count-min sketch (a Bloom filter variant).
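So a minimal sketch of that order, normalization first and depth filtering second, could run bbcms.sh on the normalized output from above. The flags shown here (mincount, hcf, outb) are the ones I remember from JGI's metagenome assembly workflow, so treat them as assumptions and verify them against the script's own usage message:

    # Depth filtering / error correction after the bbnorm.sh step above.
    # Flag names are assumptions: mincount = minimum kmer count to keep,
    # hcf = fraction of a read's kmers that must reach mincount to pass,
    # outb = destination for reads that fail the depth filter.
    bbcms.sh in=normalized.fq out=filtered.fq outb=lowdepth.fq mincount=2 hcf=0.6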

I have a follow-up question to this. I am assembling contigs from my metagenome reads and then annotating the assembled contigs to determine the functional pool of my sample, rather than going into genome binning. I normalized my reads before error correction and assembly, which I know is the right choice for building MAGs, but if I am just interested in the functional pool, should I wait to normalize until after annotation, when I want to determine the depth of coverage for my functions of interest? Thanks!
