Question: Velvet: What's the relation between kmer coverage and normal coverage?
novice (United States) wrote, 3.7 years ago:

I'm trying to convert the kmer coverage reported in the headers of my contigs into standard (nucleotide) coverage. Velvet's manual says the relation between kmer coverage Ck and standard coverage C is Ck = C * (L - k + 1) / L, where L is the read length and k is the chosen kmer length.
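The manual's relation and its inverse can be written as two small conversion functions. This is a minimal Python sketch; the function names are hypothetical, and the numbers are the average read length (240) and kmer length (69) from this question.

```python
def kmer_to_nucleotide_coverage(ck: float, read_len: float, k: int) -> float:
    """Invert Ck = C * (L - k + 1) / L to recover nucleotide coverage C."""
    kmers_per_read = read_len - k + 1
    if kmers_per_read <= 0:
        raise ValueError("read length must be at least k")
    return ck * read_len / kmers_per_read


def nucleotide_to_kmer_coverage(c: float, read_len: float, k: int) -> float:
    """Ck = C * (L - k + 1) / L, the relation quoted from the Velvet manual."""
    kmers_per_read = read_len - k + 1
    if kmers_per_read <= 0:
        raise ValueError("read length must be at least k")
    return c * kmers_per_read / read_len


# With L = 240 and k = 69, each read contributes 172 kmers,
# so C = Ck * 240 / 172.
factor = kmer_to_nucleotide_coverage(1.0, 240, 69)
print(round(factor, 4))  # 1.3953
```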

I used this formula to calculate C from Ck for each contig, using my average read length (240) and my chosen kmer length (69), and then took the median C over all the assembled contigs. The result, 66, is different from the median coverage velvet reports in the Log file, 23. Do you know why this might be?
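A minimal sketch of the calculation described above, assuming contigs.fa headers of the form `>NODE_<id>_length_<len>_cov_<cov>` (the `_cov_` field being the per-contig kmer coverage discussed in this thread). The function names and the three example headers are hypothetical, not data from the poster's assembly.

```python
import re
import statistics

# Assumed header layout: >NODE_<id>_length_<len>_cov_<kmer coverage>
HEADER_RE = re.compile(r"^>NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def read_contig_stats(lines):
    """Yield (length field, kmer coverage) for every contig header line."""
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            yield int(m.group(2)), float(m.group(3))


def median_nucleotide_coverage(lines, read_len, k):
    """Convert each contig's Ck to C = Ck * L / (L - k + 1), then take the plain median."""
    factor = read_len / (read_len - k + 1)
    covs = [ck * factor for _, ck in read_contig_stats(lines)]
    return statistics.median(covs)  # raises StatisticsError if no headers matched


# Hypothetical contigs.fa content.
example = [
    ">NODE_1_length_500_cov_20.000000\n", "ACGT\n",
    ">NODE_2_length_1500_cov_40.000000\n", "ACGT\n",
    ">NODE_3_length_300_cov_60.000000\n", "ACGT\n",
]
result = median_nucleotide_coverage(example, 240, 69)
print(result)  # median Ck is 40, so C = 40 * 240 / 172
```

On a real file the same function takes an open handle, e.g. `with open("contigs.fa") as fh: print(median_nucleotide_coverage(fh, 240, 69))`.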

Tags: velvet, coverage, contigs, assembly

It's not "normal" coverage; it's nucleotide coverage (C). You need to rearrange the formula to solve for C from the other quantities.
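Rearranging the manual's relation for C and plugging in the values from the question (L = 240, k = 69):

```latex
C_k = C\,\frac{L-k+1}{L}
\quad\Longrightarrow\quad
C = C_k\,\frac{L}{L-k+1}
  = C_k\,\frac{240}{240-69+1}
  = C_k\,\frac{240}{172}
  \approx 1.395\,C_k
```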

(reply by apelin20, 3.7 years ago)

That's what I did. The problem is that the median C I found is different from the value velvet reports in the Log file as "Median coverage depth".

(reply by novice, 3.7 years ago)

Antonio R. Franco (Universidad de Córdoba, Spain) wrote, 3.7 years ago:
I am really confused about what you have done. You need to calculate the coverage C from the total number of reads, their length L and the genome size, not from the contigs. Then you work out Ck with the formula. And you need to calculate that before running the assembly with velvetg, since it is a parameter required by the program.
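A minimal sketch of the approach in this answer: estimate nucleotide coverage as C = N * L / G (N reads of length L on a genome of size G), then convert it to kmer coverage with the manual's formula. The read count and genome size below are made-up placeholders, and the function names are hypothetical.

```python
def expected_nucleotide_coverage(n_reads: int, read_len: float, genome_size: float) -> float:
    """C = N * L / G: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


def expected_kmer_coverage(n_reads: int, read_len: float, genome_size: float, k: int) -> float:
    """Convert the expected nucleotide coverage to kmer coverage: Ck = C * (L - k + 1) / L."""
    c = expected_nucleotide_coverage(n_reads, read_len, genome_size)
    return c * (read_len - k + 1) / read_len


# Hypothetical inputs: 1,000,000 reads of 240 bp on a 5 Mb genome, k = 69.
c = expected_nucleotide_coverage(1_000_000, 240, 5_000_000)
ck = expected_kmer_coverage(1_000_000, 240, 5_000_000, 69)
print(c, round(ck, 2))  # 48.0 34.4
```

As I understand the Velvet manual, a kmer-coverage figure like `ck` is the kind of value velvetg's `-exp_cov` option expects; check the manual for your version before relying on that.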

Hi Antonio, I did not mean to confuse you. I'll try to explain again:

Velvet reports the coverage in two files: the Log file (as "Median coverage depth") and the contigs.fa file (in each contig's header, preceded by _cov_). Assuming both of these are kmer coverages, I expected the median of the coverages in the contigs.fa file to equal the median coverage in the Log file, but it didn't.
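One possible explanation, which this thread does not confirm: velvetg may compute its "Median coverage depth" over all nodes of the graph (including short ones that are never written to contigs.fa) and weight each node by its length, whereas a plain median over contig headers gives every contig one vote. The sketch below uses hypothetical (length, kmer coverage) pairs to show how far those two kinds of median can diverge. The weighted-median definition used here (the coverage at which cumulative length first reaches half the total) is one common convention and an assumption, not Velvet's documented method.

```python
import statistics


def plain_median(contigs):
    """Median kmer coverage, every contig counted once regardless of length."""
    return statistics.median(cov for _, cov in contigs)


def length_weighted_median(contigs):
    """Coverage at which the cumulative length, sorted by coverage, first reaches half the total."""
    ordered = sorted(contigs, key=lambda pair: pair[1])
    total = sum(length for length, _ in ordered)
    running = 0
    for length, cov in ordered:
        running += length
        if 2 * running >= total:
            return cov
    raise ValueError("no contigs given")


# Hypothetical (length, kmer coverage) pairs: two short high-coverage
# contigs and one long low-coverage contig.
contigs = [(100, 60.0), (100, 50.0), (5000, 20.0)]
print(plain_median(contigs), length_weighted_median(contigs))  # 50.0 20.0
```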

I then supposed that the median coverage in the Log file might be in terms of nucleotides, so I converted the coverages in the contigs.fa file into nucleotide coverages (by multiplying by L / (L - k + 1)) and took their median. This median was again different from the one reported in the Log file.
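The conversion above multiplies every contig's coverage by the same positive constant, so it can only rescale the median; it cannot change which contig sits in the middle. With the numbers from the question, 66 / 23 is about 2.87, whereas L / (L - k + 1) = 240 / 172 is about 1.40, so the unit conversion alone does not account for the gap. A small Python check with hypothetical coverage values (the helper name is also hypothetical):

```python
import statistics


def to_nucleotide(ck_values, read_len, k):
    """Apply C = Ck * L / (L - k + 1) to every contig coverage."""
    factor = read_len / (read_len - k + 1)
    return [ck * factor for ck in ck_values]


# Hypothetical kmer coverages, not taken from the poster's assembly.
ck_values = [10.0, 25.0, 30.0, 55.0, 90.0]
factor = 240 / (240 - 69 + 1)
converted = to_nucleotide(ck_values, 240, 69)

# Scaling by a positive constant preserves the ordering, so the converted
# median is just the original median times the same factor.
print(statistics.median(ck_values), statistics.median(converted))
```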

This left me, like you, confused about what the coverages reported in contigs.fa and the median coverage reported in the Log file actually mean, so I asked the wise online bioinformatics community for enlightenment.

(reply by novice, 3.7 years ago)