Question

Velvet assembly failure at low k-mer values

Asked by a.alsheikh, 9.2 years ago

I've signed up to ask for help with an assembly problem that I haven't been able to resolve. Any help is very much appreciated.

I'm using Velvet to assemble 50 bacterial genomes (k-mer values 45-69). Average coverage is ~300x. However, at the lower k-mer values I usually get either zeros (i.e. a failed assembly) or more than 1000 scaffolds. The numbers return to normal at higher k, from roughly k = 51-53 onward (see the example below).

Scaffolds > 200 bp (lengths in bp):
Identifier  kmer  count  total len  av len         N50    largest
xxx         45    0      0          0              0      0
xxx         47    0      0          0              0      0
xxx         49    0      0          0              0      0
xxx         51    2560   2118246    827.43984375   1079   6472
xxx         53    189    2078834    10999.1216931  33558  97302
xxx         55    171    2079117    12158.5789474  35544  114031

Scaffolds > 500 bp (lengths in bp):
Identifier  kmer  count  total len  av len         N50    largest
xxx         45    0      0          0              0      0
xxx         47    0      0          0              0      0
xxx         49    0      0          0              0      0
xxx         51    1601   1799159    1123.77201749  1231   6472
xxx         53    139    2063250    14843.5251799  33558  97302
xxx         55    124    2064418    16648.5322581  35544  114031

Insert size, N content and coverage:
Identifier  kmer  Mean Ins Size  s.d. Ins Size  Num N chars  Num N char runs  Min N chars  Max N chars  Av N chars  low cov  min cov  peak cov  repeat cov
xxx         45    Given          Given          0            0                0            0            0           N\A      N\A      1         N\A
xxx         47    Given          Given          0            0                0            0            0           N\A      N\A      1         N\A
xxx         49    Given          Given          0            0                0            0            0           N\A      N\A      1         N\A
xxx         51    Given          Given          0            0                0            0            0           N\A      N\A      173       N\A
xxx         53    Given          Given          2027         55               10           67           36.9        N\A      N\A      174       N\A
xxx         55    Given          Given          2967         68               10           163          43.6        N\A      N\A      139       N\A
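The pattern described in the question (zero scaffolds, or more than 1000 scaffolds, at low k; sensible values from about k = 53) can be checked mechanically across all 50 strains. Below is a minimal Python sketch using the ">200 bp" count and N50 columns for strain "xxx", assuming the question's >1000-scaffold cut-off as the failure criterion and N50 as the ranking metric. `KmerRun`, `is_failed` and `best_run` are hypothetical helpers, not part of Velvet.

```python
"""Flag failed k-mer runs in per-k assembly stats and pick the best remaining k."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KmerRun:
    k: int
    n_scaffolds_200: int  # number of scaffolds > 200 bp
    n50_200: int          # N50 of scaffolds > 200 bp


# Values copied from the ">200 bp" columns for strain "xxx".
RUNS = [
    KmerRun(45, 0, 0),
    KmerRun(47, 0, 0),
    KmerRun(49, 0, 0),
    KmerRun(51, 2560, 1079),
    KmerRun(53, 189, 33558),
    KmerRun(55, 171, 35544),
]

# Assumed cut-off: the question treats >1000 scaffolds as a bad assembly.
MAX_SCAFFOLDS = 1000


def is_failed(run: KmerRun) -> bool:
    """A run 'fails' if it produced no scaffolds or is badly fragmented."""
    return run.n_scaffolds_200 == 0 or run.n_scaffolds_200 > MAX_SCAFFOLDS


def best_run(runs: List[KmerRun]) -> Optional[KmerRun]:
    """Among non-failed runs, return the highest N50 (ties broken by fewer scaffolds)."""
    ok = [r for r in runs if not is_failed(r)]
    if not ok:
        return None
    return max(ok, key=lambda r: (r.n50_200, -r.n_scaffolds_200))


if __name__ == "__main__":
    print("failed k:", [r.k for r in RUNS if is_failed(r)])
    best = best_run(RUNS)
    print("best k:", best.k if best else None)
```

Running the same check per strain makes it easy to see whether every strain has at least one usable k before deciding which coverage level to keep.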

Troubleshooting:

I figured it might be a memory failure (even though I'm running on an HPC cluster), so I tried subsampling the reads down to 200x, then 150x, then 100x. As I lowered the coverage, the failing k-mers went away (yay). However, the best assembly (in terms of number of scaffolds) was at 200x, where a few failing k-mers remain in a number of strains.
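Subsampling to a target depth amounts to keeping each read pair with probability target/current coverage (200/300, about 0.67, for 300x down to 200x). The thread does not say which tool was used for this; below is a stdlib-only Python sketch, assuming uncompressed paired-end FASTQ files whose records are in matching order. `fastq_records`, `keep_fraction` and `subsample_pairs` are hypothetical names.

```python
"""Subsample paired-end FASTQ to a target coverage (illustrative sketch, stdlib only)."""
import io
import random
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, qualities)


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(lines)


def keep_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of pairs to keep, assuming coverage scales linearly with read count."""
    return min(1.0, target_cov / current_cov)


def subsample_pairs(r1: TextIO, r2: TextIO, fraction: float,
                    seed: int = 1) -> Iterator[Tuple[Record, Record]]:
    """Keep each pair with probability `fraction`; one draw per pair keeps mates together."""
    rng = random.Random(seed)
    for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
        if rng.random() < fraction:
            yield rec1, rec2


if __name__ == "__main__":
    r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nTTGA\n+\nIIII\n")
    r2 = io.StringIO("@p1/2\nTGCA\n+\nIIII\n@p2/2\nTCAA\n+\nIIII\n")
    frac = keep_fraction(300, 200)  # about 0.67 of pairs for 300x -> 200x
    for a, b in subsample_pairs(r1, r2, frac):
        print(a[0], b[0])
```

Drawing one random number per pair, rather than subsampling R1 and R2 independently, keeps mates paired; the fixed seed makes a given subsample reproducible.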

I even tried raising the quality-trimming threshold, but that just produced worse assemblies.

What could be causing the failing k-mers, and how can I resolve it? Can I just move on with the 200x assemblies despite the few failures?

Your help is greatly appreciated.

Cheers



Answer by Brian Bushnell (2015-02-01)

Some questions:

What's your data like (read length, quality, and coverage variability, insert size)?

What is the source - single-cell, isolate, ...?

What kind of preprocessing are you doing, specifically?

When an assembly fails and you have high coverage, normalization and error correction can often help.
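Normalization here means discarding reads from regions that are already deeply covered, so the assembler sees a flatter coverage profile; unlike uniform subsampling, reads from low-coverage regions are kept. One well-known variant is digital normalization ("diginorm"): keep a read only if the median count of its k-mers, among reads kept so far, is below a target. The Python sketch below illustrates only that idea, with exact dictionary counting and no reverse-complement handling. Real tools (e.g. khmer, or BBNorm from BBTools) use memory-efficient probabilistic counting and their own rules, so treat this as a conceptual sketch. `kmers` and `normalize` are hypothetical names, and the `k`/`target` defaults are arbitrary.

```python
"""Conceptual digital normalization (diginorm-style), exact counting, forward strand only."""
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of `seq` (no reverse-complement handling, for simplicity)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads: Iterable[str], k: int = 20, target: int = 20) -> List[str]:
    """Keep a read only if the median count of its k-mers, among kept reads, is below `target`."""
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: no k-mers to judge it by, so drop it
            continue
        if median(counts[km] for km in read_kmers) < target:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


if __name__ == "__main__":
    # 100 copies of one read plus one read from a different "region".
    reads = ["ACGTTGCA"] * 100 + ["GGGGCCCC"]
    kept = normalize(reads, k=4, target=5)
    print(len(kept), kept[-1])
```

In the toy run, only `target` copies of the over-represented read survive, while the read from the uncovered region is still kept.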


Reply by a.alsheikh

Hi Brian,

Thanks for your reply.

My genomes were sequenced on Illumina HiSeq, with read lengths of ~125 bp, coverage ranging from 200x to 390x, a minimum read quality of 10, and an average insert size of 300 bp.
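The Velvet manual relates nucleotide coverage C, read length L and k-mer coverage Ck by Ck = C * (L - k + 1) / L. With the figures above (about 300x average coverage, ~125 bp reads), that works out to roughly 194x k-mer coverage at k = 45, falling to about 137x at k = 69. A small Python helper, assuming a single average coverage and a fixed read length of 125 bp (after QC trimming, real read lengths vary); `kmer_coverage` is a hypothetical name.

```python
"""k-mer coverage from nucleotide coverage, using the relation given in the Velvet manual."""


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Ck = C * (L - k + 1) / L, with C = base coverage, L = read length, k = hash length."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return base_cov * (read_len - k + 1) / read_len


if __name__ == "__main__":
    # Assumed inputs: ~300x average coverage, 125 bp reads.
    for k in range(45, 70, 2):  # odd k from 45 to 69, as in the question
        print(k, round(kmer_coverage(300, 125, k), 1))
```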

Source: isolate

I only ran the reads through standard QC (minimum read quality 10, minimum read length 70).
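The thread does not name the QC tool, and "minimum read quality 10" could mean either a mean or a per-base cut-off. The Python sketch below assumes a mean Phred score over the whole read, Phred+33 quality encoding (used by Illumina pipelines from version 1.8 on), and the two thresholds quoted above. `mean_phred` and `passes_qc` are hypothetical helpers, not the poster's actual pipeline.

```python
"""Hypothetical read filter mirroring 'min read quality 10, min read length 70'."""

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def mean_phred(qual: str, offset: int = PHRED_OFFSET) -> float:
    """Mean Phred score of a FASTQ quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_qc(seq: str, qual: str, min_mean_q: float = 10, min_len: int = 70) -> bool:
    """Keep a read only if it is long enough and its mean quality reaches the threshold."""
    if len(seq) < min_len or not qual:
        return False
    return mean_phred(qual) >= min_mean_q


if __name__ == "__main__":
    print(passes_qc("A" * 100, "I" * 100))  # 'I' encodes Q40
    print(passes_qc("A" * 60, "I" * 60))    # shorter than min_len
    print(passes_qc("A" * 100, "#" * 100))  # '#' encodes Q2
```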

I'll check out normalization, thanks.

Thanks.

Areej
