Question: How to set -max_coverage in velvetg
6.0 years ago, rwn (United Kingdom) wrote:

Hello,

I am assembling bacterial genomes (~6 Mb) using 250 bp paired-end MiSeq data. I have tried a bunch of assemblers (idba_ud, mira, ray, SOAPdenovo, ABySS, to name a few), but I am getting reasonably good results with good old velvet (~360 contigs, N50 = 40 kb). I have a question about how to set the velvetg parameter -max_coverage. Its value has a large effect on the resulting number of contigs and the total number of bases in the assembly (i.e. the assembled genome size). Am I correct in thinking that many of these high-coverage nodes are errors (or at least error-prone, like repeat elements etc.) and should be excluded for a better assembly?

I estimate the coverage distribution (in R using plotrix) from the stats.txt file after running a preliminary assembly: velvetg velvet_big_127 -cov_cutoff auto -exp_cov auto. It is then easy to calculate the weighted mean coverage for -exp_cov and to set a reasonable value for -cov_cutoff, but there is often a long tail in the distribution, meaning that there is a small number of nodes with very high coverage.
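
For readers who want to reproduce the weighted mean without R, here is a minimal Python sketch. It assumes stats.txt is tab-delimited with a header row containing the columns lgth and short1_cov (the names used in the Velvet manual's R example); the function name and the sample table are made up for illustration. Zero-length nodes and non-finite coverage values are skipped, which is my own choice rather than anything Velvet prescribes.

```python
import csv
import io
import math


def weighted_mean_coverage(stats_lines, cov_column="short1_cov"):
    """Length-weighted mean of a coverage column from a Velvet stats.txt table."""
    reader = csv.DictReader(stats_lines, delimiter="\t")
    total_len = 0
    total_cov = 0.0
    for row in reader:
        length = int(row["lgth"])
        cov = float(row[cov_column])
        # Skip zero-length nodes and non-finite coverage values (e.g. "Inf").
        if length <= 0 or not math.isfinite(cov):
            continue
        total_len += length
        total_cov += cov * length
    if total_len == 0:
        raise ValueError("no usable nodes found")
    return total_cov / total_len


# Tiny made-up table for illustration only; real use would pass open("stats.txt").
example = io.StringIO(
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t1000\t1\t1\t0\t30.0\n"
    "2\t3000\t1\t1\t0\t34.0\n"
    "3\t50\t2\t2\t0\t400.0\n"
    "4\t0\t0\t0\t0\tInf\n"
)
print(round(weighted_mean_coverage(example), 2))
```

Note how the single short node at 400x pulls the mean above the coverage of the two long nodes, which is the long-tail effect described above. A weighted median would be less sensitive to it.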

Generally, what is a good way to determine a sensible value for -max_coverage?

Many thanks! Reuben

[Image: cov_k131]

5.6 years ago, Torst (Australia) wrote:

You usually shouldn't set -max_coverage at all, unless you know your sample contains "contamination" that is present at a higher ratio than the sample you are actually trying to recover. In that case you would use it as a "low pass filter". Another scenario is a plasmid with a high copy number relative to your chromosome: you could use -max_coverage to filter out the plasmid reads, and then use -cov_cutoff etc. to do the opposite and recover the plasmid. But in general, setting -max_coverage will only remove repeat elements from your assembly. Although this may increase your metrics like N and N50, the improvement is artificial: what is left over is the same as what was there before, just without the repeated contigs.
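
To see what a candidate -max_coverage value would actually throw away before using it, one option is to tally the nodes in stats.txt that sit above each threshold. The sketch below is only a rough estimate under stated assumptions: it assumes a tab-delimited stats.txt with lgth and short1_cov columns, treats a node as excluded when its coverage is strictly greater than the threshold (velvetg's exact comparison may differ), and uses a hypothetical function name and made-up sample data.

```python
import csv
import io
import math


def max_coverage_impact(stats_lines, thresholds, cov_column="short1_cov"):
    """For each threshold, report (node count, total length, length fraction)
    of nodes whose coverage lies above it."""
    nodes = []
    for row in csv.DictReader(stats_lines, delimiter="\t"):
        length = int(row["lgth"])
        cov = float(row[cov_column])
        # Ignore zero-length nodes and non-finite coverage values.
        if length > 0 and math.isfinite(cov):
            nodes.append((length, cov))
    total_len = sum(length for length, _ in nodes)
    report = {}
    for t in thresholds:
        # Assumption: "above" means strictly greater than the threshold.
        above = [length for length, cov in nodes if cov > t]
        frac = sum(above) / total_len if total_len else 0.0
        report[t] = (len(above), sum(above), frac)
    return report


# Made-up example table; real use would pass open("stats.txt").
example = io.StringIO(
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t1000\t1\t1\t0\t30.0\n"
    "2\t3000\t1\t1\t0\t34.0\n"
    "3\t50\t2\t2\t0\t400.0\n"
)
for threshold, (n, length, frac) in max_coverage_impact(example, [100, 32]).items():
    print(threshold, n, length, round(frac, 4))
```

If the nodes above a threshold are few and short, they are most likely collapsed repeats, and removing them changes the metrics more than the biology. If they add up to a plasmid-sized chunk, that is closer to the high-copy plasmid scenario described above.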
