6.4 years ago by

Australia/Perth/UWA

If you look at the VelvetOptimiser manual, you'll see that by default it picks the k-mer size whose assembly has the highest N50:

--k|optFuncKmer=s The optimisation function used for k-mer choice. (default 'n50').

The manual lists a whole bunch of other variables you can build the function from:

Advanced!: Changing the optimisation function(s)
Velvet optimiser assembly optimisation function can be built from the following variables.
LNbp = The total number of Ns in large contigs
Lbp = The total number of base pairs in large contigs
Lcon = The number of large contigs
max = The length of the longest contig
n50 = The n50
ncon = The total number of contigs
tbp = The total number of basepairs in contigs
Examples are:
'Lbp' = Just the total basepairs in contigs longer than 1kb
'n50*Lcon' = The n50 times the number of long contigs.
'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided
by the total bases in all contigs plus the log of the number of bases
in long contigs.

You can play around with some of these formulas and calculate them manually. No assembly will be perfect, and no single metric tells the whole story: for example, if you optimise for N50 alone, you might get an overlong assembly that has stitched together contigs that don't belong together (i.e., chimeric contigs).
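If you want to calculate these by hand, here is a rough Python sketch of the variables listed above, computed from a list of contig sequences. It is not VelvetOptimiser's own code: the helper names (read_fasta, n50, assembly_stats) are made up for illustration, the 1 kb cutoff for "large" contigs is taken from the 'Lbp' example above, and I'm assuming log means the natural log.

```python
import math

# Assumption: "large" means longer than 1 kb, as in the manual's 'Lbp' example.
LARGE_CONTIG_MIN = 1000


def read_fasta(path):
    """Hypothetical helper: return the sequences in a FASTA file as strings."""
    contigs, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    contigs.append("".join(chunks))
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        contigs.append("".join(chunks))
    return contigs


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(contigs):
    """Compute the variables from the manual excerpt for one assembly."""
    lengths = [len(c) for c in contigs]
    large = [c for c in contigs if len(c) > LARGE_CONTIG_MIN]
    return {
        "LNbp": sum(c.upper().count("N") for c in large),
        "Lbp": sum(len(c) for c in large),
        "Lcon": len(large),
        "max": max(lengths, default=0),
        "n50": n50(lengths),
        "ncon": len(lengths),
        "tbp": sum(lengths),
    }


# Toy assembly: four contigs of 5000, 3000 (10 of them Ns), 1500 and 500 bp.
contigs = ["A" * 5000, "A" * 2990 + "N" * 10, "C" * 1500, "G" * 500]
s = assembly_stats(contigs)

# The three example functions from the manual excerpt.
scores = {
    "Lbp": s["Lbp"],
    "n50*Lcon": s["n50"] * s["Lcon"],
    "n50*Lcon/tbp+log(Lbp)": s["n50"] * s["Lcon"] / s["tbp"] + math.log(s["Lbp"]),
}
for name, value in scores.items():
    print(name, "=", value)
```

To compare real assemblies, you would call assembly_stats(read_fasta(path)) on the contigs file from each k-mer run and see which run scores highest under the function you care about.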

There are also other programs that use different methods to choose the best k-mer. For example, KmerGenie ( http://kmergenie.bx.psu.edu/ ) counts the k-mers in your reads and tells you which k-mer size gives you the most distinct genomic k-mers.

You can also run all of your assemblies through an assembly metrics tool like QUAST ( http://bioinf.spbau.ru/quast ) and judge from that output which one is best.

Have a look at the Assemblathon 1 and Assemblathon 2 papers, and see how they compare their assemblies.