Question: Velvet assembly failure at low kmers
4.9 years ago, a.alsheikh (Australia) wrote:

I've signed up to seek help with an assembly problem that I haven't managed to resolve. Any help is very much appreciated.

I'm using Velvet to assemble 50 bacterial genomes (kmer 45-69). Average coverage is ~300x. However, at lower kmer values I usually get either zeros (i.e. a failed assembly) or >1000 scaffolds. The numbers then go back to normal after kmer 50 or 53 (see example below).


Scaffolds (s) > 200 bp:

Identifier  kmer  s > 200 bp  total len  av len         N50    largest
xxx         45    0           0          0              0      0
xxx         47    0           0          0              0      0
xxx         49    0           0          0              0      0
xxx         51    2560        2118246    827.43984375   1079   6472
xxx         53    189         2078834    10999.1216931  33558  97302
xxx         55    171         2079117    12158.5789474  35544  114031

Scaffolds (s) > 500 bp:

Identifier  kmer  s > 500 bp  total len  av len         N50    largest
xxx         45    0           0          0              0      0
xxx         47    0           0          0              0      0
xxx         49    0           0          0              0      0
xxx         51    1601        1799159    1123.77201749  1231   6472
xxx         53    139         2063250    14843.5251799  33558  97302
xxx         55    124         2064418    16648.5322581  35544  114031

Insert size, N characters, and coverage:

Identifier  kmer  Mean Ins Size  s.d. Ins Size  Num N chars  Num N char runs  Min N chars  Max N chars  Av N chars  low cov  min cov  peak cov  repeat cov
xxx         45    Given          Given          0            0                0            0            0           N\A      N\A      1         N\A
xxx         47    Given          Given          0            0                0            0            0           N\A      N\A      1         N\A
xxx         49    Given          Given          0            0                0            0            0           N\A      N\A      1         N\A
xxx         51    Given          Given          0            0                0            0            0           N\A      N\A      173       N\A
xxx         53    Given          Given          2027         55               10           67           36.9        N\A      N\A      174       N\A
xxx         55    Given          Given          2967         68               10           163          43.6        N\A      N\A      139       N\A
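
One way to flag the problem kmers in a stats table like the one above is sketched here in Python. The rows are copied from the example (kmer, number of scaffolds > 200 bp, N50). The names `ROWS` and `classify` and the 1000-scaffold cutoff for "fragmented" are illustrative assumptions based on the description in the post; they are not part of Velvet.

```python
# (kmer, scaffolds > 200 bp, N50) copied from the example table above.
ROWS = [
    (45, 0, 0),
    (47, 0, 0),
    (49, 0, 0),
    (51, 2560, 1079),
    (53, 189, 33558),
    (55, 171, 35544),
]


def classify(n_scaffolds, max_scaffolds=1000):
    """Label one assembly by its scaffold count (cutoff is an assumption)."""
    if n_scaffolds == 0:
        return "failed"
    if n_scaffolds > max_scaffolds:
        return "fragmented"
    return "ok"


# Status per kmer, e.g. 45 is "failed" and 51 is "fragmented".
status = {kmer: classify(n) for kmer, n, _ in ROWS}

# Among usable assemblies, pick the one with the fewest scaffolds,
# which is the criterion used in the post.
usable = [row for row in ROWS if classify(row[1]) == "ok"]
best_kmer = min(usable, key=lambda row: row[1])[0]

for kmer, label in status.items():
    print(kmer, label)
print("best kmer by scaffold count:", best_kmer)
```

Running the same check per strain makes it easy to see which strains still have failing kmers at a given coverage.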

Trouble-shooting:
I figured it might be a memory failure (though I'm using HPC). So I downsampled to 200x, then 150x, and 100x. As I lowered the coverage, the failing kmers went away (yay). However, the best assembly (in terms of number of scaffolds) was at 200x, where a few failing kmers remain in a number of strains.
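
For reference, the downsampling step can be sketched in Python as below. This is a minimal illustration, assuming read pairs are already loaded as tuples; `keep_fraction` and `subsample_pairs` are hypothetical helper names, not the tool actually used for the runs described here.

```python
import random


def keep_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    # Never try to keep more than everything.
    return min(1.0, target_cov / current_cov)


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    Mates are kept or dropped together so the pairing stays intact,
    and a fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Example: going from ~300x to 200x keeps about two thirds of the pairs.
fraction = keep_fraction(300, 200)
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(10000)]
subset = subsample_pairs(pairs, fraction, seed=1)
print(round(fraction, 3), len(subset))
```

Because each pair is drawn independently, the resulting coverage is only approximately the target.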

I even tried raising the quality-trimming threshold, but that just resulted in worse assemblies.

What could be the reason for the failing kmers, and how can it be resolved? Can I just move on using the assemblies at 200x despite the few failures?

Your help is strongly appreciated.

Cheers,

4.9 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

Some questions:

What's your data like (read length, quality, coverage variability, insert size)?

What is the source - single-cell, isolate, ...?

What kind of preprocessing are you doing, specifically?

When assembly fails and you have high coverage, normalization and error correction can often help.
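
To make the idea of normalization concrete, here is a toy Python sketch of one common approach: keep a read only while the median count of its kmers, among the reads kept so far, is below a target. It is an illustration under simplifying assumptions (exact kmer counting in memory, no reverse complements, single-end reads) and is not the implementation of any particular tool; the names `kmers` and `normalize` are made up for the example.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping kmers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k, target):
    """Drop reads from regions that already have ~target kmer depth."""
    counts = Counter()  # kmer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < target:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Ten copies of one read plus a single read from elsewhere:
# the over-represented read is capped, the rare one survives.
reads = ["ACGTTGCAAT"] * 10 + ["GGATCCTAGA"]
kept = normalize(reads, k=5, target=3)
print(len(kept))
```

The effect is to flatten very deep coverage while leaving low-coverage regions untouched, which is different from uniform random downsampling.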


Hi Brian,

Thanks for your reply.

My data are Illumina HiSeq reads: read length ~125 bp, coverage range 200-390x, minimum read quality 10, and average insert size 300 bp.

Source: isolate

I only ran the reads through standard QC (minimum read quality 10, minimum read length 70).
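
As a side note on these numbers: the Velvet manual relates kmer coverage Ck to nucleotide coverage C and read length L as Ck = C * (L - k + 1) / L. The Python sketch below applies that relation across the kmer range from the question. It assumes a uniform read length of 125 bp and 300x coverage; after trimming down to a minimum of 70 bp, the real values would be somewhat lower. The function name `kmer_coverage` is made up for the example.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Velvet manual relation: Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Odd kmer values from 45 to 69, as in the question.
table = {k: kmer_coverage(300, 125, k) for k in range(45, 70, 2)}

for k, ck in table.items():
    print(k, round(ck, 1))
```

Under these assumptions the kmer coverage falls from about 194x at kmer 45 to about 137x at kmer 69, so lower kmers see noticeably deeper kmer coverage than higher ones for the same reads.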

Will check out the normalization - thanks.

Thanks.

Areej

Reply written 4.9 years ago by a.alsheikh
Powered by Biostar version 2.3.0