Question: Judging best assembly
16 months ago by
deepti1rao20 wrote:

I used Velvet to assemble genomic data from a plant and plotted a k-mer coverage histogram and a length-weighted coverage histogram, as suggested in the manual. The reads were 150 bp paired-end Illumina reads. I tried various k-mer values and picked 115. What would be a good coverage cutoff to use, considering that I have a small peak at 7? Please find the 3 attachments below. The expected coverage calculated by Velvet is 23. With the default coverage cutoff (half of the expected coverage), I get the following assembly:

N50 = 21497
Max length = 185793
Total = 362 Mb
No. of contigs = 48,614

I wanted to use a lower cutoff to include the k-mers in the smaller peak, so I tried a coverage cutoff of 3 and got the following:

Nodes = 513117
N50 = 20630
Max length = 185793
Total = 384 Mb
No. of contigs = 56,475
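
For readers comparing the coverage values quoted above: Velvet works in k-mer coverage, which its manual relates to nucleotide coverage by Ck = C * (L - k + 1) / L, where L is the read length and k the k-mer size. The sketch below simply inverts that relation for the numbers in this post (L = 150, k = 115). It assumes all reads are full length and error free, so the results are rough guides only; the function name is made up for illustration.

```python
def base_coverage(kmer_cov, read_len, k):
    """Convert Velvet k-mer coverage (Ck) to approximate nucleotide coverage (C).

    Velvet manual relation: Ck = C * (L - k + 1) / L
    Inverted here:          C  = Ck * L / (L - k + 1)
    """
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return kmer_cov * read_len / (read_len - k + 1)


READ_LEN = 150  # read length from the post
K = 115         # k-mer size chosen in the post

# Values taken from the post: expected coverage, default cutoff (half of it),
# the small peak, and the lower cutoff that was tried.
for label, ck in [("expected coverage", 23),
                  ("default cutoff", 11.5),
                  ("small peak", 7),
                  ("lower cutoff", 3)]:
    print(f"{label}: Ck = {ck} -> C ~ {base_coverage(ck, READ_LEN, K):.1f}x")
```

With k this close to the read length, each read contributes only 36 k-mers, so a k-mer coverage of 23 corresponds to roughly 96x nucleotide coverage and the small peak at 7 to roughly 29x.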

The expected genome size is 370-390 Mb. Since the genome is expected to contain about 50-60% repeats, I do not expect the reads to cover my entire genome, which is also evident from the SAM/BAM files obtained by aligning the reads to a closely related genome: about 10 Mb is not covered.

Which of the two assemblies looks better?

kmer coverage histogram

Length weighted kmer coverage histogram

bam file coverage across reference genome of a closely related variety of the same species

modified 16 months ago by Rohit (1.4k) • written 16 months ago by deepti1rao (20)

I would definitely run more than one assembler, preferably with multiple k-mer values, and then compare the assemblies using QUAST.

written 16 months ago by Sej Modha (4.3k)

You can also look at KAT to assess the k-mer spectra of the reads and of the assembly. I am not sure whether your plant has high ploidy. It would also be important to assess BUSCO scores for the different assemblies and, if RNA-seq data are available, their mapping rates.

written 16 months ago by jean.elbers (1.2k)

This is the best option: multiple assemblers and multiple k-mers. Decreasing the coverage cutoff to gain contiguity doesn't help, as it increases the chances of erroneous overlaps.

written 16 months ago by Rohit (1.4k)

Hello deepti1rao,

The link you’ve added points to the page that contains the image, not the image itself. On the site, right-click (or Ctrl-click on a Mac) the image and select Copy Image Address (or an equivalent option). Use that link instead of the one you used to embed the image.

modified 16 months ago • written 16 months ago by RamRS (23k)

Thanks, will do from next time onwards.

written 16 months ago by deepti1rao (20)
16 months ago by
Rohit1.4k wrote:

Contig-ordering tools can help order and orient the contigs with the help of a reference. However, contiguity metrics alone (N50, NG50, L50, LG50) do not mean that an assembly is the best. The quality of the assembly matters too, and it can be compared using CEGMA or BUSCO metrics. In the end, it all depends on what kind of downstream analysis is planned for your project. Also, the missing 10 Mb might be due to mapping biases, and keep in mind that de Bruijn graph-based assemblers are prone to misjoins.
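
For anyone unsure how the contiguity metrics mentioned here differ, a minimal sketch follows. It assumes you already have a plain list of contig lengths; the helper name `nx_stats` and the toy numbers are made up for illustration and do not come from the assemblies in this thread. N50/L50 use the assembly total as the denominator, while NG50/LG50 use an estimated genome size instead.

```python
def nx_stats(lengths, total=None, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    total=None      -> use the assembly size (gives N50 / L50 at fraction=0.5)
    total=<genome>  -> use an estimated genome size (gives NG50 / LG50)
    Returns (None, None) if the contigs never reach the target length.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    denominator = sum(ordered) if total is None else total
    target = denominator * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            # 'length' is the shortest contig needed to reach the target,
            # 'count' is how many contigs it took.
            return length, count
    return None, None


# Toy contig lengths, for illustration only.
toy = [80, 70, 50, 40, 30, 20, 10]

n50, l50 = nx_stats(toy)                # relative to assembly size (300)
ng50, lg50 = nx_stats(toy, total=400)   # relative to a hypothetical genome size

print("N50 =", n50, "L50 =", l50)
print("NG50 =", ng50, "LG50 =", lg50)
```

Because NG50 is tied to a fixed genome size, it stays comparable between assemblies with different totals, which N50 does not.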

written 16 months ago by Rohit (1.4k)
16 months ago by
5heikki8.5k wrote:

If there's a closely related genome available, why aren't you doing a reference-guided assembly? Also, it might be a good idea to try different assemblers. As for your two assemblies, they're essentially the same, except that the bigger one includes more short contigs that may or may not be "good". I would go with the first one, although I doubt choosing either one will make any difference whatsoever to anything downstream.
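
The "more short contigs" point can be checked with back-of-the-envelope arithmetic on the numbers in the question. This is a rough sketch only: it assumes the reported totals are in bases, are exact rather than rounded, and that the whole size difference comes from additional contigs, none of which is guaranteed.

```python
# Summary numbers copied from the question (totals assumed to be in bases).
default_cutoff = {"total": 362_000_000, "contigs": 48_614}
low_cutoff = {"total": 384_000_000, "contigs": 56_475}

# How much sequence and how many contigs the lower cutoff adds.
extra_bases = low_cutoff["total"] - default_cutoff["total"]
extra_contigs = low_cutoff["contigs"] - default_cutoff["contigs"]

# Average size of the added contigs, under the assumptions stated above.
mean_extra_length = extra_bases / extra_contigs

print(f"extra sequence: {extra_bases:,}")
print(f"extra contigs:  {extra_contigs:,}")
print(f"mean length of extra contigs: ~{mean_extra_length:,.0f}")
```

Under those assumptions the lower cutoff adds about 7,900 contigs averaging under 3 kb, far below the reported N50 of roughly 21,000 in either assembly.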

written 16 months ago by 5heikki (8.5k)

We're not doing a reference-based assembly because we want to avoid introducing reference bias at the read level.

Any clues as to how I can go about putting the contigs together with the help of a reference?

written 16 months ago by deepti1rao (20)