How To Choose The K Value Of Kmer In Soapdenovo?
11.4 years ago
Dejian ★ 1.3k

I have read the papers about EULER, Velvet and SOAPdenovo, but I am still confused about how to choose the K value. It is common practice to test several K values and choose the best one according to the results. But I think there may be some clues indicating the proper range of K, so that the K values need not be tested blindly. For example, K obviously has to be less than the maximal read length. Is there a way to roughly estimate the proper range of K values from the genome size, sequencing depth, read length or something else? How do you choose the K value? Many thanks.

• 21k views
11.4 years ago
Rm 8.3k

Generally it is between half and two-thirds of the read length. Too small a kmer will lead to many short contigs, whereas a longer kmer will result in few long contigs.


And then you may need to perform several trial runs with different k-mer sizes around that range, and select the best one.
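To make the rule of thumb above concrete, here is a minimal sketch that lists candidate K values between half and two-thirds of the read length for trial runs. The function name `candidate_k_values` is made up for illustration and is not part of SOAPdenovo. It keeps only odd values, on the assumption that you follow the usual de Bruijn convention of an odd k (so that a k-mer cannot be its own reverse complement); the upper limit your assembler build accepts still has to be checked separately.

```python
def candidate_k_values(read_length):
    """Return odd k values between 1/2 and 2/3 of the read length.

    Hypothetical helper: it only encodes the rule of thumb from this
    thread and knows nothing about the limits of a given assembler.
    """
    low = (read_length + 1) // 2   # half the read length, rounded up
    high = (2 * read_length) // 3  # two-thirds of the read length, rounded down
    # Keep odd k only (assumed convention for de Bruijn graph assemblers).
    return [k for k in range(low, high + 1) if k % 2 == 1]


if __name__ == "__main__":
    # For 100 bp reads this gives the odd values from 51 to 65.
    print(candidate_k_values(100))
```

Each value in the list would then be one trial assembly, to be compared afterwards.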


How do I know which K-mer gives the better results? Thanks!


After assembly, calculate some statistics such as contig N50 and N90, scaffold N50 and N90, and total scaffold length. Usually, a better kmer gives larger contig/scaffold N50/N90 values. But the total scaffold length should not deviate too much from the estimated genome size (you should estimate the genome size using an experimental method such as flow cytometry).
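For anyone unsure how these statistics are obtained, here is a small sketch of the usual definition: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches 50% (N50) or 90% (N90) of the total assembly length. The function names and the example lengths are invented for illustration; in practice the lengths would come from the assembler's output FASTA.

```python
def nxx(lengths, percent):
    """Return the Nxx statistic (e.g. percent=50 for N50) of a list of lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point edge cases.
        if running * 100 >= total * percent:
            return length
    return 0  # empty input


def size_ratio(lengths, estimated_genome_size):
    """Total assembled length divided by the estimated genome size."""
    return sum(lengths) / estimated_genome_size


if __name__ == "__main__":
    toy_lengths = [100, 80, 60, 40, 20]   # made-up contig lengths
    print("N50:", nxx(toy_lengths, 50))   # 80
    print("N90:", nxx(toy_lengths, 90))   # 40
    print("size ratio:", size_ratio(toy_lengths, 320))
```

Comparing these numbers across the trial K values, while keeping an eye on the size ratio, is one way to apply the advice above.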


Quite reasonable. Many thanks.


I thought that in a perfect experiment, we'd want a single contig that covers the whole genome. Why is "longer kmer will result in few long contigs" a bad thing?


Imperfect coverage and sequencing errors. Sufficiently many error-free k-mers need to cover each position in a contig, and that gets harder as k grows. Take a look at the kmergenie paper for a longer discussion.
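A back-of-the-envelope sketch of the sequencing-error part of this argument: if each base is wrong independently with probability e (a simplifying assumption, real error profiles are not uniform), a k-mer is error-free with probability (1 - e)^k, which shrinks as k grows. The function name is hypothetical.

```python
def error_free_fraction(error_rate, k):
    """Expected fraction of k-mers containing no sequencing error.

    Assumes independent, uniform per-base errors, which is a
    simplification made here for illustration only.
    """
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    # Compare a few k values at an assumed 1% per-base error rate.
    for k in (21, 31, 63):
        print(k, round(error_free_fraction(0.01, k), 3))
```

So with the same reads, a larger k leaves fewer usable k-mers per position, which is one reason very long k-mers can hurt rather than help.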


Is that really true? Shorter kmers cannot get past repeats of the same or greater length, but on the other hand they help you build a denser graph (basically, two reads are connected in the graph only if they overlap by at least the kmer size). Therefore I guess it depends a lot on the coverage you have: the lower the coverage, the smaller the kmer you have to choose, because otherwise even non-complex regions won't be resolved.
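The coverage dependence can be put in numbers with the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the base coverage, L the read length and Ck the resulting k-mer coverage. The sketch below just evaluates that formula; the function name and the example figures (30x coverage, 100 bp reads) are assumptions for illustration.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L for reads of uniform length."""
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Assumed example: 30x base coverage with 100 bp reads.
    for k in (31, 51, 65):
        print(k, kmer_coverage(30, 100, k))
```

With low base coverage, a large k can push the k-mer coverage down to the point where the graph falls apart, which matches the intuition above.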

11.4 years ago

I suggest having a look at VelvetOptimiser:

VelvetOptimiser is a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.


Thanks for your advice, Frederic. SOAPdenovo shares its basic rationale with Velvet, so your suggestion should be helpful for K value selection in SOAPdenovo. I will check it.

10.3 years ago
Hranjeev ★ 1.5k

You may estimate the best kmer using the kmer frequency table. One of the programs that does this specifically is kmergenie. I've used the tool before and the results were not too promising, but perhaps updates to the software have improved things a bit.

I wish it also gave a best k-mer estimate for error-correcting reads.

Pros: You get a k-mer that you can focus on for assembly (obviously).

Cons: The running of the program itself takes quite a while.
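For readers who have not seen a kmer frequency table, here is a toy sketch of how one is built: count every k-mer in the reads, then tabulate how many distinct k-mers occur once, twice, and so on. This is not kmergenie's implementation (which is considerably more involved); the function name is made up, and for brevity the sketch does not merge a k-mer with its reverse complement as real counters usually do.

```python
from collections import Counter


def kmer_abundance_histogram(reads, k):
    """Map abundance -> number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram of the counts themselves.
    return Counter(counts.values())


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "ACGTAC"]  # made-up reads
    histogram = kmer_abundance_histogram(toy_reads, 4)
    for abundance in sorted(histogram):
        print(abundance, histogram[abundance])
```

On real data, low-abundance k-mers are mostly sequencing errors and the main peak reflects genomic k-mers; tools like kmergenie work from histograms of this kind computed for several k.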


Hi, kmergenie dev here. Indeed, we're continuously improving the software, and it normally works well for our users. I encourage you to try the latest version and email me (kmergenie@cse.psu.edu) if you get unsatisfactory results. Your feedback can help us identify problems we were not aware of. Thanks.


Hi,

I get an error when using kmergenie. I have posted it as a separate question: kmergenie [OSError: [Errno 2] No such file or directory]

Thanks!


Does one combine forward and reverse reads into one file to run kmergenie? Is it possible to use separate files?
