Question: Velvet: What's the relation between kmer coverage and normal coverage?
4.3 years ago by
novice950
United States
novice950 wrote:

I'm trying to convert the kmer coverage reported in the headers of my contigs into standard coverage. Velvet's manual says the relation between kmer coverage Ck and standard coverage C is Ck = C * (L - k + 1) / L, where L is the read length and k is the chosen kmer length.
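A minimal sketch of that formula and its rearrangement for C, assuming L and k are both measured in bases. The function names are illustrative only and are not part of Velvet.

```python
def nucleotide_to_kmer_coverage(c: float, read_length: float, k: int) -> float:
    """Formula as quoted from the manual: Ck = C * (L - k + 1) / L."""
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("read length must be at least k")
    return c * kmers_per_read / read_length


def kmer_to_nucleotide_coverage(ck: float, read_length: float, k: int) -> float:
    """Same formula rearranged for C: C = Ck * L / (L - k + 1)."""
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("read length must be at least k")
    return ck * read_length / kmers_per_read


# With L = 240 and k = 69 the conversion factor is 240 / 172,
# so nucleotide coverage is roughly 1.4 times the kmer coverage.
print(kmer_to_nucleotide_coverage(10.0, 240, 69))
```

Because the factor is a positive constant for fixed L and k, converting each contig and then taking the median gives the same result as converting the median itself.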

However, I used this formula to calculate C from Ck for each contig, with my average read length (240) and my chosen kmer length (69), and then took the median C, i.e. standard coverage, across all the assembled contigs. The result I got, 66, differs from the one Velvet reports in the Log file, 23. Do you know why this might be?

velvet coverage contigs assembly • 3.2k views
modified 4.3 years ago by Antonio R. Franco4.4k • written 4.3 years ago by novice950

It's not normal coverage, it's nucleotide coverage (C). You need to rearrange the formula to find C based on all the other info.

written 4.3 years ago by apelin20470

That's what I did. The problem is that the median C I found differs from the value Velvet reports in the Log file as "Median coverage depth."

written 4.3 years ago by novice950
4.3 years ago by
Spain. Universidad de Córdoba
Antonio R. Franco4.4k wrote:

I am really confused about what you have done.

You need to calculate coverage C from the total number of reads, their length L, and the genome size, not from the contigs.

Then you work out Ck using the formula.

And you need to calculate that before running the assembly with velvetg, since it is a parameter the program requires.
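The calculation described here can be sketched as follows: expected nucleotide coverage is the total number of sequenced bases divided by the genome size, and the formula from the question then gives the expected kmer coverage. The read count and genome size below are made-up numbers for illustration; only L = 240 and k = 69 come from the question, and the function name is hypothetical.

```python
def expected_coverages(num_reads: int, read_length: float,
                       genome_size: int, k: int) -> tuple[float, float]:
    """Return (C, Ck) expected from the raw read set, before assembly."""
    # Nucleotide coverage: total sequenced bases / genome size.
    c = num_reads * read_length / genome_size
    # Kmer coverage from the formula Ck = C * (L - k + 1) / L.
    ck = c * (read_length - k + 1) / read_length
    return c, ck


# Hypothetical data set: 1,000,000 reads of 240 bases, 5 Mb genome, k = 69.
c, ck = expected_coverages(1_000_000, 240, 5_000_000, 69)
print(f"C = {c:.1f}, Ck = {ck:.1f}")  # C = 48.0, Ck = 34.4
```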

modified 3 months ago by RamRS26k • written 4.3 years ago by Antonio R. Franco4.4k

Hi Antonio, I did not mean to confuse you. I'll try to explain again:

Velvet reports the coverage in two files: the Log file (Median Coverage Depth) and the contigs.fa file (in each contig's header, preceded by `_cov_`). Assuming both of these are kmer coverages, I supposed the median of the coverages in the contigs.fa file should be equal to the median coverage in the Log file, but it wasn't.

I then supposed that the median coverage in the Log file could be in terms of nucleotides, so I converted the coverages in the contigs.fa file into nucleotide coverages (by multiplying by `(L / (L - k + 1))`) and found their median. This median was again different from the one reported in the Log file.
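For concreteness, the two medians described above can be computed with a short script along these lines. It is only a sketch: it assumes every contig header carries its coverage right after `_cov_`, the sample headers are made up for illustration, and the function names are hypothetical rather than anything Velvet provides.

```python
import re
from statistics import median

# Coverage value that follows `_cov_` in a FASTA header line.
COV_PATTERN = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def header_coverages(fasta_lines):
    """Yield the coverage found in each header line that contains `_cov_`."""
    for line in fasta_lines:
        if line.startswith(">"):
            match = COV_PATTERN.search(line)
            if match:
                yield float(match.group(1))


def median_coverages(fasta_lines, read_length, k):
    """Return (median kmer coverage, median converted nucleotide coverage)."""
    kmer_covs = list(header_coverages(fasta_lines))
    if not kmer_covs:
        raise ValueError("no `_cov_` values found in headers")
    factor = read_length / (read_length - k + 1)
    # Convert each contig first, then take the median, as described above.
    return median(kmer_covs), median(cov * factor for cov in kmer_covs)


# Made-up headers for illustration; real input would be the lines of contigs.fa,
# e.g. `with open("contigs.fa") as fh: median_coverages(fh, 240, 69)`.
sample = [
    ">NODE_1_length_500_cov_10.0", "ACGT",
    ">NODE_2_length_300_cov_30.0", "ACGT",
    ">NODE_3_length_800_cov_20.0", "ACGT",
]
median_ck, median_c = median_coverages(sample, read_length=240, k=69)
print(median_ck, round(median_c, 2))
```

Both values are plain, unweighted medians over contigs, which is the comparison described in this post.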

This made me confused, as you are, as to what the coverages reported in contigs.fa and the median coverage reported in the Log file actually mean, so I asked the wise online bioinformatics community for enlightenment.

modified 3 months ago by RamRS26k • written 4.3 years ago by novice950