Question: Genome Assembly From Large Insert Libraries
5.2 years ago, Rahul Sharma (Germany and India) wrote:

Dear all,

I am trying to assemble a genome of about 80 Mb. I have four Illumina libraries with insert sizes of 300 bp, 1 kb, 8 kb and 12 kb, and read lengths of 76-100 bp. I generated assemblies with both Velvet and ALLPATHS-LG. Velvet (k-mer = 55) gave an N50 of more than 2 Mb, and ALLPATHS-LG gave an N50 of around 1 Mb, so the assembly statistics look good. However, about 20% of the bases in the Velvet scaffolds and about 16% in the ALLPATHS-LG scaffolds are Ns. My questions are:
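For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the N50, the percentage of Ns and the gap sizes could be computed from a scaffold FASTA file. It uses only the Python standard library. The function names and the command-line usage are illustrative assumptions, not part of Velvet or ALLPATHS-LG. Looking at the gap lengths as well as the total can show whether the Ns come from a few very large gaps or from many small ones.

```python
import re
import sys


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def gap_stats(sequences):
    """Return (total_bases, n_bases, gap_lengths) for an iterable of sequences."""
    total = 0
    gaps = []
    for seq in sequences:
        total += len(seq)
        # Each uninterrupted run of N/n counts as one gap.
        gaps.extend(len(m.group()) for m in re.finditer(r"[Nn]+", seq))
    return total, sum(gaps), gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    seqs = [seq for _, seq in read_fasta(sys.argv[1])]
    total, n_bases, gaps = gap_stats(seqs)
    print("sequences:", len(seqs))
    print("N50:", n50([len(s) for s in seqs]))
    if total:
        print("percent N: %.2f" % (100.0 * n_bases / total))
    if gaps:
        print("gaps:", len(gaps), "largest gap:", max(gaps))
```

Saved under a hypothetical name such as `scaffold_stats.py`, it would be run as `python scaffold_stats.py scaffolds.fasta`.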

(a) Is this usual with such long insert libraries?
(b) Should I turn off the scaffolding step of these assemblers and try stand-alone scaffolders such as BAMBUS2, SSPACE or GRASS instead? (Please suggest more.)
(c) Do these assemblers also mask repeat elements during assembly or scaffolding? Could repeats have been masked in the genome, giving me this high percentage of Ns?

I would really appreciate the suggestions.

Kind regards and wishes,

Rahul Sharma

illumina velvet scaffolding • 1.7k views
5.2 years ago, cts wrote:

It sounds to me like you don't have enough read coverage, or that many of the reads are duplicates. In that case the long insert sizes let the scaffolder link many contigs, but the contigs themselves can't be extended to fill the gaps between them, so the gaps are written out as Ns. It might be worth checking the coverage and duplication rates of the reads.
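A rough sketch of that check in Python, using only the standard library. It assumes 4-line FASTQ records and takes the genome size as an input (about 80 Mb in the question). It counts reads with identical sequences as duplicates, which is only a crude proxy: dedicated tools usually judge duplication on read pairs or mapping positions. The function names and file arguments are illustrative assumptions. Every distinct read is held in memory, so running it on a subsample of a large library is advisable.

```python
import gzip
import sys
from collections import Counter


def open_maybe_gzip(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def coverage_and_duplication(sequences, genome_size):
    """Return (fold coverage, fraction of reads that repeat an earlier read)."""
    counts = Counter()
    total_bases = 0
    for seq in sequences:
        counts[seq] += 1
        total_bases += len(seq)
    n_reads = sum(counts.values())
    # Reads beyond the first copy of each distinct sequence count as duplicates.
    dup_rate = 1.0 - len(counts) / n_reads if n_reads else 0.0
    return total_bases / genome_size, dup_rate


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python read_check.py reads.fastq[.gz] genome_size_in_bp
    with open_maybe_gzip(sys.argv[1]) as handle:
        cov, dup = coverage_and_duplication(fastq_sequences(handle), int(sys.argv[2]))
    print("coverage: %.1fx" % cov)
    print("duplicate reads: %.1f%%" % (100.0 * dup))
```

Running it once per library (for example with a genome size of 80000000) shows the coverage and duplication of each insert size separately, which is more useful here than a single pooled figure.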
