Question: How to set -max_coverage in velvetg
rwn (United Kingdom) wrote 7.6 years ago:


I am assembling bacterial genomes (~6 Mb) using 250 bp paired-end MiSeq data. I have tried a bunch of assemblers (idba_ud, mira, ray, SOAPdenovo, ABySS to name a few...), but am getting reasonably good results using good old velvet (~360 contigs, N50 = 40 kb). But I have a question about how to set the velvetg parameter -max_coverage. Its value has a large effect on the resulting number of contigs and the total number of bases in the assembly (i.e. the assembled genome size). Am I correct in thinking that many of these high-coverage nodes are errors (or at least error-prone, like repeat elements etc.) and should be excluded for a better assembly?

I estimate the coverage distribution (in R using plotrix) from the stats.txt file after a preliminary run: velvetg velvet_big_127 -cov_cutoff auto -exp_cov auto. It is then easy to calculate the weighted mean coverage for -exp_cov and to set a reasonable value for -cov_cutoff, but the distribution often has a long tail, meaning that there is a small number of nodes with very high coverage.
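For readers who want to reproduce the weighted mean outside R, here is a minimal Python sketch. It assumes stats.txt is tab-separated with a header row containing the columns `lgth` and `short1_cov`, as in the Velvet manual's own R example. The function name and the miniature table are hypothetical, and using only `short1_cov` is an assumption that may not suit runs with several read categories.

```python
import csv
import io
import math


def weighted_mean_coverage(stats_text, cov_column="short1_cov"):
    """Length-weighted mean coverage over the nodes of a stats.txt table.

    Assumes a tab-separated table with a header row that includes
    'lgth' and the chosen coverage column (an assumption about the layout).
    """
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    total_length = 0
    weighted_sum = 0.0
    for row in reader:
        length = int(row["lgth"])
        coverage = float(row[cov_column])
        # Skip empty nodes and non-finite coverage values.
        if length <= 0 or not math.isfinite(coverage):
            continue
        total_length += length
        weighted_sum += length * coverage
    if total_length == 0:
        raise ValueError("no usable nodes in table")
    return weighted_sum / total_length


# Hypothetical miniature table, not real assembly data.
SAMPLE_STATS = (
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t1000\t1\t1\t0\t20.0\n"
    "2\t3000\t1\t1\t0\t30.0\n"
    "3\t100\t2\t2\t0\t400.0\n"
)

if __name__ == "__main__":
    print(round(weighted_mean_coverage(SAMPLE_STATS), 2))
```

On a real run one would read the file with `open("velvet_big_127/stats.txt").read()` and pass the text in. Note how the single short, high-coverage node pulls the mean up, which is the long-tail effect described above.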

Generally, what is a good way to determine a sensible value for -max_coverage?

Many thanks! Reuben


velvet genomics • 2.2k views
Torst wrote 7.2 years ago:

You shouldn't usually set -max_coverage at all, unless you know your sample contains "contamination" at a higher coverage than the genome you are actually trying to recover. In that case you would use it as a "low pass filter". Another scenario is a plasmid with a high copy number relative to your chromosome: you could use -max_coverage to filter out the high-coverage plasmid sequence, and then use -cov_cutoff etc. to do the opposite and recover the plasmid. But in general, setting -max_coverage will only remove repeat elements from your assembly. Although this may improve metrics such as the number of contigs and N50, the improvement is artificial: what is left over is the same as what was there before, but without the repeated contigs.
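One way to see this effect before committing to a value is to count how many nodes, and how many bases, a candidate threshold would drop. The sketch below is only an illustration under stated assumptions: the function name and node list are hypothetical, nodes are treated as removed when their coverage is strictly above the threshold, and node lengths are taken to be in k-mers (so a node spans `length + k - 1` bases). The value k = 127 is a guess from the directory name in the question.

```python
def effect_of_max_coverage(nodes, max_coverage, k):
    """Estimate what a given -max_coverage threshold would remove.

    nodes: iterable of (length_in_kmers, coverage) pairs, e.g. taken
           from the 'lgth' and 'short1_cov' columns of stats.txt.
    Returns (number_of_removed_nodes, removed_bases).
    Assumption: a node is dropped when coverage > max_coverage.
    """
    removed = [(length, cov) for length, cov in nodes if cov > max_coverage]
    # Convert k-mer lengths to nucleotides: length + k - 1 per node.
    removed_bases = sum(length + k - 1 for length, _ in removed)
    return len(removed), removed_bases


# Hypothetical nodes: two chromosome-like nodes and one repeat-like node.
EXAMPLE_NODES = [(1000, 20.0), (3000, 30.0), (100, 400.0)]

if __name__ == "__main__":
    for threshold in (50, 100, 500):
        count, bases = effect_of_max_coverage(EXAMPLE_NODES, threshold, k=127)
        print(threshold, count, bases)
```

If the nodes that disappear are short and sit at roughly integer multiples of the expected coverage, they are more likely collapsed repeats than contamination, which is the case where removing them only flatters the statistics.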
