How To Set -Max_Coverage In Velvetg
11.6 years ago
rwn ▴ 610

Hello,

I am assembling bacterial genomes (~6 Mb) using 250 bp paired-end MiSeq data. I have tried a bunch of assemblers (idba_ud, mira, ray, SOAPdenovo, ABySS, to name a few), but I am getting reasonably good results using good old velvet (~360 contigs, N50 = 40 kb). I have a question about how to set the velvetg parameter -max_coverage. Its value has a large effect on the resulting number of contigs and the total number of bases in the assembly (i.e. the assembled genome size). Am I correct in thinking that many of these high-coverage nodes are errors (or at least error-prone, like repeat elements etc.) and should be excluded for a better assembly?

I estimate the coverage distribution (in R using plotrix) from the stats.txt file after running a preliminary velvetg velvet_big_127 -cov_cutoff auto -exp_cov auto. It is then easy to calculate the weighted mean coverage for -exp_cov and to set a reasonable value for -cov_cutoff, but there is often a long tail in the distribution, meaning that there is a small number of nodes with very high coverage.
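For readers who want to reproduce the weighted mean outside R, here is a minimal Python sketch. It assumes stats.txt is tab-separated with a header row containing `lgth` and `short1_cov` (the columns used in the Velvet manual's own plotting example) and that all short reads sit in the first category. The function name and the toy table are made up for illustration.

```python
import csv
import io
import math


def weighted_mean_coverage(stats_text, cov_column="short1_cov"):
    """Return the length-weighted mean of one coverage column.

    stats_text is the full content of a Velvet stats.txt file
    (assumed tab-separated, with a header row containing 'lgth').
    """
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    total_length = 0
    weighted_sum = 0.0
    for row in reader:
        try:
            length = int(row["lgth"])
            cov = float(row[cov_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that do not parse
        if length <= 0 or not math.isfinite(cov):
            continue  # guard against empty nodes and non-finite values
        total_length += length
        weighted_sum += length * cov
    if total_length == 0:
        raise ValueError("no usable nodes found")
    return weighted_sum / total_length


# Tiny made-up table, not real assembly data.
example = (
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t100\t1\t1\t0\t10.0\n"
    "2\t300\t1\t1\t0\t20.0\n"
    "3\t100\t1\t1\t0\t200.0\n"
)
print(weighted_mean_coverage(example))
```

On a real run the text would come from something like `open("velvet_big_127/stats.txt").read()`. Note how the single high-coverage node in the toy table pulls the mean well above the coverage of the other two nodes, which is the long-tail effect described above.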

Generally, what is a good way to determine a sensible value for -max_coverage?

Many thanks! Reuben

[Image: cov_k131]

velvet genomics • 3.0k views
11.1 years ago
Torst ▴ 980

You usually shouldn't set -max_coverage at all, unless you know your sample contains "contamination" at a higher coverage than the true sample you are trying to recover. In that case you would use it as a "low-pass filter". Another scenario is a plasmid with high copy number relative to your chromosome: you could use -max_coverage to filter out the plasmid, and then use -cov_cutoff etc. to do the opposite and recover the plasmid. In general, though, setting -max_coverage will only remove repeat elements from your assembly. Although this may increase your metrics like N and N50, the gain is artificial: what is left over is the same as what was there before, minus the repeated contigs.
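One way to see what a candidate -max_coverage would throw away is to count the nodes, and their summed length, that sit above the threshold in stats.txt before rerunning velvetg. The sketch below is only an approximation: it assumes the same tab-separated layout with `lgth` and `short1_cov` columns, and it assumes the filter acts on that single coverage column, which may not match velvetg exactly when several read categories are in use. The function name and toy table are hypothetical.

```python
import csv
import io


def bases_above(stats_text, max_coverage, cov_column="short1_cov"):
    """Count nodes and summed node length with coverage above max_coverage.

    Returns (nodes_above, length_above, total_length). Lengths are in the
    units reported in stats.txt.
    """
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    nodes_above = 0
    length_above = 0
    total_length = 0
    for row in reader:
        try:
            length = int(row["lgth"])
            cov = float(row[cov_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows that do not parse
        total_length += length
        if cov > max_coverage:
            nodes_above += 1
            length_above += length
    return nodes_above, length_above, total_length


# Tiny made-up table, not real assembly data.
example = (
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t100\t1\t1\t0\t10.0\n"
    "2\t300\t1\t1\t0\t20.0\n"
    "3\t100\t1\t1\t0\t200.0\n"
)
for threshold in (15, 100):
    print(threshold, bases_above(example, threshold))
```

If the nodes above the threshold look like repeats (roughly integer multiples of the expected coverage) rather than a distinct contaminant or plasmid peak, that supports leaving -max_coverage unset.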
