Question: How do I choose among velvet assembled contigs with different k (k-mer) values?
whatup wrote:

I have some contigs assembled by Velvet using k-mer values ranging from 21 to 31 with a step size of 2 (k-mer values: 21, 23, 25, 27, 29, 31). Oases was also run on these assembled contigs, since they are for a transcriptome assembly.

My question is: how do I go about picking the best set of contigs out of these six assemblies made with different k-mer values? What is the standard criterion for picking the best one? Would it be the average contig length?

When I run VelvetOptimiser instead, it doesn't return the contigs for every k-mer value it tried; it only returns one FASTA file at the end. How does it decide which assembly is best for its final output?

Philipp Bayer wrote:

If you look at the VelvetOptimiser manual, you'll see that by default it keeps the assembly (i.e., the k-mer value) with the highest N50:

  --k|optFuncKmer=s The optimisation function used for k-mer choice. (default 'n50').
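
To make the default criterion concrete, here is a minimal Python sketch (not VelvetOptimiser's own code) that computes the N50 of each assembly and keeps the k-mer value with the highest one. It assumes you have already collected each run's contig lengths, for example from its contigs.fa; the function names and the toy lengths are hypothetical, and VelvetOptimiser may apply its own filtering (such as a minimum contig length) before computing N50.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty assembly


def best_k_by_n50(assemblies):
    """assemblies: dict mapping k -> list of contig lengths (hypothetical input).
    Returns the k whose assembly has the highest N50."""
    return max(assemblies, key=lambda k: n50(assemblies[k]))


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    assemblies = {
        21: [500, 400, 300, 200, 100],
        25: [900, 300, 200, 100],
        31: [600, 600, 100],
    }
    for k, lengths in sorted(assemblies.items()):
        print(k, n50(lengths))
    print("best k:", best_k_by_n50(assemblies))
```

Ties go to whichever k appears first in the dictionary, since max() returns the first maximal element it sees.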

There are a whole bunch of other functions in there to choose from:

Advanced!: Changing the optimisation function(s)

Velvet optimiser assembly optimisation function can be built from the following variables.
	LNbp = The total number of Ns in large contigs
	Lbp = The total number of base pairs in large contigs
	Lcon = The number of large contigs
	max = The length of the longest contig
	n50 = The n50
	ncon = The total number of contigs
	tbp = The total number of basepairs in contigs
Examples are:
	'Lbp' = Just the total basepairs in contigs longer than 1kb
	'n50*Lcon' = The n50 times the number of long contigs.
	'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided
		by the total bases in all contigs plus the log of the number of bases
		in long contigs.
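
If you want to calculate these variables yourself for each of your six assemblies, here is a rough Python sketch. It assumes "large" means longer than 1 kb (as the 'Lbp' example says), reads contigs from a FASTA file, and evaluates the formula with Python's eval() over a restricted namespace. It is an illustration, not VelvetOptimiser's implementation, so its exact definitions (for example, how Ns are counted) may differ; the function names are hypothetical.

```python
import io
import math

# Assumption: "large" contigs are those longer than 1 kb, as the 'Lbp' example says.
LARGE_CONTIG = 1000


def read_fasta(handle):
    """Yield each sequence in a FASTA file handle (headers are discarded)."""
    seq = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
            seq = []
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty assembly


def assembly_stats(seqs):
    """Compute the variables listed in the manual excerpt for one assembly."""
    seqs = list(seqs)
    lengths = [len(s) for s in seqs]
    large = [s for s in seqs if len(s) > LARGE_CONTIG]
    return {
        "LNbp": sum(s.upper().count("N") for s in large),
        "Lbp": sum(len(s) for s in large),
        "Lcon": len(large),
        "max": max(lengths, default=0),
        "n50": n50(lengths),
        "ncon": len(seqs),
        "tbp": sum(lengths),
    }


def score(formula, stats):
    """Evaluate a formula such as 'n50*Lcon' using the stats as variables.

    eval() runs arbitrary code: only pass formulas you wrote yourself.
    log() is the natural logarithm and fails if its argument is 0.
    """
    namespace = {"__builtins__": {}, "log": math.log}
    namespace.update(stats)
    return eval(formula, namespace)


if __name__ == "__main__":
    # Toy assembly with made-up contigs: one large contig, two small ones.
    fasta = io.StringIO(">c1\n" + "A" * 1500 + "\n>c2\n" + "ACGTN" * 100 + "\n>c3\nACGT\n")
    stats = assembly_stats(read_fasta(fasta))
    print(stats)
    print(score("n50*Lcon", stats))
```

For real data you would pass each run's contigs file (e.g. `open(path)`) to read_fasta, score every assembly with the same formula, and keep the k with the highest score.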

You can play around with some of these formulas and calculate them manually. No single metric will pick a perfect assembly: if you optimise for N50 alone, for example, you might end up with overlong contigs that stitch together sequences that don't belong together (i.e., chimeric contigs).

There are also other programs that use different methods to choose the best k-mer size. For example, KmerGenie ( http://kmergenie.bx.psu.edu/ ) counts the k-mers in your reads and tells you at which k you get the most distinct genomic k-mers.
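
As a very rough illustration of that idea, here is a Python sketch that counts, for each candidate k, how many distinct k-mers occur at least `min_count` times in the reads, treating rarer k-mers as likely sequencing errors. This is not KmerGenie's algorithm, which builds k-mer abundance histograms and fits a statistical model to separate erroneous from genomic k-mers; the threshold, the reads, and the function names here are hypothetical. It keeps every count in memory, so it only suits small test sets, and the underlying assumption of fairly even coverage does not hold well for RNA-seq data.

```python
from collections import Counter

# Complement table for DNA; characters other than ACGT are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so both strands of the same sequence are counted together."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def solid_kmer_count(reads, k, min_count=2):
    """Count distinct canonical k-mers seen at least min_count times.

    k-mers seen fewer times are treated as likely sequencing errors
    (a crude stand-in for KmerGenie's statistical model)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers containing ambiguous bases
                continue
            counts[canonical(kmer)] += 1
    return sum(1 for c in counts.values() if c >= min_count)


def pick_k(reads, ks, min_count=2):
    """Return the k with the most solid k-mers, plus the count for every k."""
    scores = {k: solid_kmer_count(reads, k, min_count) for k in ks}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up reads; real input would be the reads from your FASTQ files.
    reads = ["ACGTACGTTGCA", "CGTACGTTGCAA", "ACGTACGTTGCA"]
    best, scores = pick_k(reads, [3, 5, 7])
    print(scores)
    print("best k:", best)
```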

You can also run all of your assemblies through an assembly metrics tool like QUAST ( http://bioinf.spbau.ru/quast ) and judge from that output which one is best.

Have a look at the Assemblathon 1 and Assemblathon 2 papers, and see how they compare their assemblies.

Adrian Pelin wrote:

If these are transcriptome assemblies, just merge the assemblies from all the different k-mer values into one. Look at the Oases manual.
