Hello members, I would like to know whether there are any guidelines for choosing the k-mer size for de Bruijn graph-based assembly (second-generation sequencing reads). I have an F. vesca dataset with a total of 12,803,137 reads and an average read length of 353 bp. What is the best k-mer size for assembling this many F. vesca reads? Can I try any k value above 100 in this case? Thanks.
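Before picking k, it helps to know roughly how deep the dataset is. The sketch below computes the total number of sequenced bases and the approximate depth C = N × L / G from the read count and mean length given above. The genome size G of about 240 Mb is an assumption (an approximate F. vesca estimate), so substitute your own figure if you have a better one.

```python
def sequencing_depth(num_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Base coverage C = N * L / G (total sequenced bases divided by genome size)."""
    return num_reads * mean_read_len / genome_size


NUM_READS = 12_803_137     # from the question
MEAN_READ_LEN = 353        # bp, from the question
GENOME_SIZE = 240_000_000  # ASSUMPTION: approximate F. vesca genome size; replace as needed

total_bases = NUM_READS * MEAN_READ_LEN
depth = sequencing_depth(NUM_READS, MEAN_READ_LEN, GENOME_SIZE)
print(f"total bases: {total_bases:,}")
print(f"approximate depth: {depth:.1f}x")
```

Under that genome-size assumption the data come to roughly 4.5 Gb, or about 19x depth, which is the number the k-mer choice has to be weighed against.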
Which assembler are you considering? Has your data been preprocessed somehow?
Dear h.mon, the dataset has not been preprocessed. Does the choice of k-mer size depend on the assembler? If so, I currently have Velvet installed on my computer, so I will use the Velvet assembler.
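For Velvet specifically, k has to be odd and cannot exceed the MAXKMERLENGTH value Velvet was compiled with (31 in a default build), in addition to being no longer than the reads. A minimal sketch that lists the candidate k values under those constraints; the function name and the k_min/step defaults are hypothetical, chosen only for illustration.

```python
from typing import List


def velvet_candidate_ks(read_len: int, max_kmer_length: int = 31,
                        k_min: int = 21, step: int = 2) -> List[int]:
    """Odd k values usable by a Velvet build compiled with the given MAXKMERLENGTH.

    Hypothetical helper: k_min and step are illustrative defaults, not Velvet settings.
    """
    if step % 2 != 0:
        raise ValueError("step must be even so every k stays odd")
    # k can exceed neither the read length nor the compiled maximum.
    upper = min(read_len, max_kmer_length)
    # Velvet needs an odd k, so bump an even starting value up by one.
    start = k_min if k_min % 2 == 1 else k_min + 1
    return list(range(start, upper + 1, step))


print(velvet_candidate_ks(353))
print(velvet_candidate_ks(353, max_kmer_length=127, k_min=99, step=10))
```

With a default build only k ≤ 31 is reachable, so trying k above 100 on these reads would first need Velvet to be recompiled with a larger limit (for example `make 'MAXKMERLENGTH=127'`).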
The k-mer size is obviously limited by your read lengths, i.e., you cannot have a k-mer that's longer than your read length.
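To make the read-length limit concrete: a read of length L contains L − k + 1 overlapping k-mers, and none at all once k > L. A small sketch of that enumeration (the function name is just for illustration):

```python
from typing import List


def kmers(read: str, k: int) -> List[str]:
    """Return every overlapping k-mer of `read`; a read shorter than k yields none."""
    if k <= 0:
        raise ValueError("k must be positive")
    # range() is empty when len(read) - k + 1 <= 0, i.e. when k > len(read).
    return [read[i:i + k] for i in range(len(read) - k + 1)]


print(kmers("ACGTAC", 3))
print(kmers("ACGT", 5))
```

A 353 bp read therefore contributes 253 k-mers at k = 101, versus 323 at k = 31, which is why larger k values use up part of the read.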
The k-mer size is somewhat independent of the assembler and has more to do with your read lengths, I would imagine. People typically start around the 30–40 range, but higher k-mers can span longer repeats and give a more complete assembly (at a computational expense), provided the coverage is high enough.
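The coverage caveat can be quantified with the relation given in the Velvet manual, Ck = C × (L − k + 1) / L: raising k shrinks the effective k-mer coverage. The sketch below tabulates Ck for several k values. The base depth of 18.8x is an assumption carried over from a ~240 Mb genome-size estimate, the mean read length stands in for L even though read lengths vary, and the 10x floor is an assumed rule of thumb, not a hard limit.

```python
def kmer_coverage(base_cov: float, read_len: float, k: int) -> float:
    """k-mer coverage Ck = C * (L - k + 1) / L.

    Returns 0.0 when k is longer than the read, since such a read has no k-mers.
    """
    if k > read_len:
        return 0.0
    return base_cov * (read_len - k + 1) / read_len


BASE_COV = 18.8  # ASSUMPTION: depth estimated for a ~240 Mb genome
READ_LEN = 353   # mean read length from the question
MIN_CK = 10.0    # ASSUMPTION: rule-of-thumb floor, not a hard limit

for k in range(31, 252, 20):
    ck = kmer_coverage(BASE_COV, READ_LEN, k)
    flag = "ok" if ck >= MIN_CK else "low"
    print(f"k={k:3d}  Ck={ck:5.1f}  {flag}")
```

Under these assumptions k around 100 still leaves Ck above 13, while k near 200 drops it to about 8, so very large k values trade repeat resolution against thinner k-mer coverage.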