Question

How do I choose among velvet assembled contigs with different k (k-mer) values?

9.6 years ago

whatup ▴ 30

I have some contigs assembled by Velvet using k-mer values ranging from 21 to 31 with a step size of 2 (k-mer values: 21, 23, 25, 27, 29, 31). Oases was also run on these assembled contigs, since they are for a transcriptome assembly.

My question is: how do I go about picking the best set of contigs out of these six sets assembled with different k-mer values? What is the standard criterion for picking the best? Would it be the average contig length?

When I run it with VelvetOptimiser, it doesn't return all the contigs that were assembled with the different k-mer values. It only returns one FASTA file at the end. How does it pick the best one for its final output?

velvetg assembly contig velvet • 5.9k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by whatup ▴ 30

score 5 · Answer 1 · 2014-09-11

If you look at the VelvetOptimiser manual, you'll see that by default, it returns the contigs with the highest N50:

  --k|optFuncKmer=s The optimisation function used for k-mer choice. (default 'n50').
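In case N50 is unfamiliar, here is a minimal sketch of the usual definition: sort the contigs from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name and the contig lengths are made up for illustration; this is not VelvetOptimiser's own code, and tools can differ in details such as the units they measure contig length in.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths for two assemblies (made-up numbers).
assembly_k21 = [500, 400, 300, 200, 100]
assembly_k31 = [900, 300, 100, 100, 100]

print(n50(assembly_k21))
print(n50(assembly_k31))
```

Both toy assemblies have the same total size, but the second has the higher N50 because one long contig carries more than half of the bases.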

VelvetOptimiser offers a number of other functions to choose from:

Advanced!: Changing the optimisation function(s)

Velvet optimiser assembly optimisation function can be built from the following variables.
	LNbp = The total number of Ns in large contigs
	Lbp = The total number of base pairs in large contigs
	Lcon = The number of large contigs
	max = The length of the longest contig
	n50 = The n50
	ncon = The total number of contigs
	tbp = The total number of basepairs in contigs
Examples are:
	'Lbp' = Just the total basepairs in contigs longer than 1kb
	'n50*Lcon' = The n50 times the number of long contigs.
	'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided
		by the total bases in all contigs plus the log of the number of bases
		in long contigs.

You can experiment with these formulas and calculate them by hand. No assembly will be perfect, and no single metric is sufficient: if you optimise for N50 alone, for example, you might get an assembly with overlong contigs that joins sequences that don't belong together (i.e., chimeric contigs).
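To calculate these by hand for each of your six assemblies, something like the following could work. It is only a sketch: the function names are mine, the 1 kb cutoff for "large" contigs is taken from the 'Lbp' example in the manual excerpt, and the numbers may not match what VelvetOptimiser reports internally.

```python
import math


def read_fasta_sequences(path):
    """Read sequences from a FASTA file into a list of strings."""
    sequences, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    sequences.append("".join(chunks))
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def assembly_metrics(sequences, large_cutoff=1000):
    """Compute the variables named in the manual excerpt for one assembly.

    large_cutoff is an assumption: the excerpt's 'Lbp' example describes
    large contigs as those longer than 1kb.
    """
    lengths = sorted((len(s) for s in sequences), reverse=True)
    large = [s for s in sequences if len(s) > large_cutoff]
    tbp = sum(lengths)
    n50, running = 0, 0
    for length in lengths:
        running += length
        if running >= tbp / 2:
            n50 = length
            break
    return {
        "LNbp": sum(s.upper().count("N") for s in large),
        "Lbp": sum(len(s) for s in large),
        "Lcon": len(large),
        "max": lengths[0] if lengths else 0,
        "n50": n50,
        "ncon": len(lengths),
        "tbp": tbp,
    }


def example_score(m):
    """The 'n50*Lcon/tbp+log(Lbp)' example, written as plain Python.

    Raises ValueError if there are no large contigs (log of zero).
    """
    return m["n50"] * m["Lcon"] / m["tbp"] + math.log(m["Lbp"])


# Toy contigs (made-up sequences) standing in for one assembly.
toy = ["A" * 1500, "ACGTN" * 240, "C" * 800]
metrics = assembly_metrics(toy)
print(metrics)
print(example_score(metrics))
```

For real data you would call `read_fasta_sequences` on the contigs file from each k-mer run, compute the metrics for each, and compare the scores side by side.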

Other programs use different methods to choose the best k-mer size. For example, KmerGenie ( http://kmergenie.bx.psu.edu/ ) counts the k-mers in your reads and tells you which k-mer size gives the most distinct genomic k-mers.
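To get a feel for the idea of counting k-mers in reads at several values of k, here is a deliberately naive sketch. It is not KmerGenie's algorithm: as far as I understand, KmerGenie fits a model to the k-mer abundance histogram, whereas this just counts distinct k-mers seen at least twice as a rough stand-in for "genomic" k-mers. The function name, the abundance cutoff, and the reads are all made up, and reverse complements are ignored.

```python
from collections import Counter


def solid_kmer_count(reads, k, min_abundance=2):
    """Count distinct k-mers seen at least min_abundance times.

    Crude proxy only: a fixed cutoff is an assumption of this sketch,
    and reverse-complement k-mers are not merged.
    """
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute no k-mers (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return sum(1 for c in counts.values() if c >= min_abundance)


# Made-up reads; in practice these would come from your FASTQ files.
reads = ["ACGTTGCAAC", "CGTTGCAACG", "GTTGCAACGT"]
for k in (3, 5, 7):
    print(k, solid_kmer_count(reads, k))
```

For real read sets this in-memory counting will be far too slow and memory-hungry, which is one reason to use a dedicated tool instead.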

You can also run all of your assemblies through an assembly metrics tool like QUAST ( http://bioinf.spbau.ru/quast ) and judge from that output which one is best.

Have a look at the Assemblathon 1 and Assemblathon 2 papers, and see how they compare their assemblies.

score 3 · Answer 2 · 2014-09-12

9.6 years ago

Adrian Pelin ★ 2.6k

If these are transcriptome assemblies, just merge the assemblies from all the different k-mer values into one. Look at the Oases manual.
