I've signed up to seek help with an assembly problem I couldn't manage to resolve. Any help is very much appreciated.
I'm using Velvet to assemble 50 bacterial genomes (k-mer range 45-69). Average coverage is ~300x. However, at the lower k-mer values I usually get zeros (i.e. the assembly fails) or >1000 scaffolds. The numbers then return to normal after k-mer 50 or 53 (see example below).
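For reference, this is roughly how I run the k-mer sweep. A minimal sketch, assuming a single interleaved paired-end FASTQ called `reads.fastq` and Velvet on the PATH (the file and directory names are illustrative; `DRYRUN=1` just prints the commands instead of running them):

```shell
#!/bin/sh
# Sweep odd k-mer values from 45 to 69 (Velvet requires odd k).
DRYRUN=1
for k in $(seq 45 2 69); do
    cmd_h="velveth asm_k${k} ${k} -fastq -shortPaired reads.fastq"
    cmd_g="velvetg asm_k${k} -exp_cov auto -cov_cutoff auto"
    if [ "$DRYRUN" = "1" ]; then
        # Dry run: show what would be executed for this k.
        echo "$cmd_h"
        echo "$cmd_g"
    else
        $cmd_h && $cmd_g
    fi
done
```

The `-exp_cov auto -cov_cutoff auto` settings are what I pass to velvetg; one hash directory per k keeps the sweep results separate.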
| Identifier | kmer | s > 200 bp | s > 200 bp total len | s > 200 bp av len | s > 200 bp N50 | s > 200 bp largest | s > 500 bp | s > 500 bp total len | s > 500 bp av len | s > 500 bp N50 | s > 500 bp largest | Mean Ins Size | s.d. Ins Size | Num N chars | Num N char runs | Min N chars | Max N chars | Av N chars | low cov | min cov | peak cov | repeat cov |
I figured it might be a memory failure (though I'm running on an HPC cluster). So I tried downsampling to 200x, then 150x, and 100x. As I lower the coverage, the failing k-mers go away (yay). However, the best assembly (in terms of number of scaffolds) was at 200x, where a few failing k-mers remain in a handful of strains.
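The downsampling itself is straightforward: the fraction of reads to keep is target coverage divided by current coverage. A sketch of how I generate the subsampling commands, assuming ~300x starting coverage and `seqtk sample` for the subsampling (file names are illustrative; the commands are echoed rather than executed):

```shell
#!/bin/sh
# Subsample reads to target coverages; keep-fraction = target / current.
# CURRENT_COV is an estimate (~300x here); adjust to your own data.
CURRENT_COV=300
for target in 200 150 100; do
    frac=$(awk -v t="$target" -v c="$CURRENT_COV" 'BEGIN { printf "%.4f", t/c }')
    # Fixed seed (-s100) so read pairs stay in sync across mates.
    echo "seqtk sample -s100 reads.fastq $frac > reads_${target}x.fastq"
done
```

With paired files, the same seed must be used on both mates so the pairing is preserved.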
I even tried increasing the quality-trimming threshold, but that only produced worse assemblies.
What could be causing the failing k-mers, and how can it be resolved? Or can I just move on with the 200x assemblies despite the few failures?
Your help is strongly appreciated.