Question: What Happens When The K-Mer Size Is Larger Than The Trimmed Reads Size In Velvet Assembly?
Rahul Sharma wrote (7.2 years ago):

Hi all,

I am assembling a 120 Mb genome from five libraries with different insert sizes: 300 bp, 1 kb, 8 kb, 20 kb, and singletons. The first two libraries were sequenced on the Illumina Genome Analyzer (read length 76 bp) and the last two on the HiSeq (read length 100 bp). After trimming, the mean read lengths are 55 bp (GA) and 87 bp (HiSeq). I want to run assemblies with Velvet. Would k-mer sizes of 35, 45, 55, 65 and 75 cause any issues, given that my trimmed read lengths vary so much? Is it fine to assemble the GA and HiSeq reads together, or should I assemble them separately and merge the assemblies later? Any advice would be appreciated.


Istvan Albert (University Park, USA) wrote (7.2 years ago):

I don't know first-hand, but I recall people stating that it can't work, because the method cannot build k-mers that are longer than the reads themselves.
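To make the arithmetic concrete: a read of length L contains L - k + 1 overlapping k-mers when k <= L, and none at all when k > L. The sketch below applies that to the mean trimmed lengths from the question (55 bp for GA, 87 bp for HiSeq) and the proposed k values. It is only an illustration; `kmers_per_read` is a hypothetical helper, not part of Velvet, and real read lengths are spread around these means.

```python
def kmers_per_read(read_length: int, k: int) -> int:
    """Number of overlapping k-mers in a read of the given length.

    A read of length L yields L - k + 1 k-mers when k <= L, and none otherwise.
    """
    return max(0, read_length - k + 1)


# Mean trimmed lengths quoted in the question; individual reads will vary.
mean_lengths = {"GA": 55, "HiSeq": 87}
k_values = [35, 45, 55, 65, 75]

for platform, length in mean_lengths.items():
    counts = {k: kmers_per_read(length, k) for k in k_values}
    print(platform, counts)
```

For a mean-length GA read, k = 55 leaves a single k-mer and k = 65 or 75 leaves none, so those reads would contribute nothing to the graph; only GA reads longer than the mean, plus the HiSeq reads, would still be usable at the larger k values.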

Stated for example in a blog post from Homologous:

More general information on choosing k and other parameters can be found in Titus Brown's blog:

In fact, all of the pages tagged "assembly" there are worth consulting:

SES (Vancouver, BC) wrote (6.6 years ago):

I was curious about this because I use Velvet a lot, so I tested it. velveth gives no explicit warning, but there are a couple of ways to tell that no overlaps were found. First, look at the Roadmaps file: if you choose a k-mer size larger than your read lengths, the number of roadmaps found will equal the number of input sequences. Alternatively, just run velvetg and look at the graph it produces. If it finishes rather quickly and ends with something like:

[155.488308] EMPTY GRAPH
Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/20198538 reads

Then you have a clear indication that no overlaps were found for that hash length. Because read lengths vary, I think all the sequences would have to be processed before velveth could warn about this condition. Still, it would probably be helpful to warn about it after the pre-processing stage, or to fall back to a hash length shorter than the reads before building the Roadmaps.
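Until velveth does something like that, a check can be run by hand. The sketch below is a minimal, hypothetical example, not part of Velvet: `fastq_read_lengths` and `usable_fraction` report what fraction of reads are at least k bp long before you run velveth, and `graph_is_empty` looks for the "Final graph has 0 nodes" summary line shown above in velvetg's output. It assumes plain 4-line FASTQ records without line wrapping, and that the summary line is worded exactly as quoted; other Velvet versions may differ.

```python
import re
from typing import Iterable, Iterator


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield sequence lengths from FASTQ text.

    Assumes plain 4-line records (header, sequence, '+', quality) with no
    line wrapping; multi-line FASTQ would need a real parser.
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each record
            yield len(line.strip())


def usable_fraction(lengths: Iterable[int], k: int) -> float:
    """Fraction of reads long enough to contain at least one k-mer (length >= k)."""
    total = usable = 0
    for n in lengths:
        total += 1
        if n >= k:
            usable += 1
    return usable / total if total else 0.0


# Matches the velvetg summary line quoted in this thread, e.g.
# "Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/20198538 reads"
FINAL_GRAPH = re.compile(r"Final graph has (\d+) nodes")


def graph_is_empty(velvetg_log: str) -> bool:
    """True if the velvetg summary line reports a graph with zero nodes."""
    match = FINAL_GRAPH.search(velvetg_log)
    return match is not None and int(match.group(1)) == 0


if __name__ == "__main__":
    # Two hypothetical toy reads of 56 bp and 80 bp.
    toy_fastq = ["@r1", "ACGT" * 14, "+", "I" * 56,
                 "@r2", "ACGT" * 20, "+", "I" * 80]
    for k in (35, 65, 75):
        frac = usable_fraction(fastq_read_lengths(toy_fastq), k)
        print(f"k={k}: {frac:.0%} of reads usable")

    log = "Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/20198538 reads"
    print("empty graph:", graph_is_empty(log))
```

With a real file you would pass an open file handle (e.g. `open("reads.fastq")`, a hypothetical path) to `fastq_read_lengths`. Streaming the lengths rather than loading all reads keeps memory use flat, which matters for libraries of tens of millions of reads.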
