Question

Genome Assembly From Large Insert Libraries


10.6 years ago

Rahul Sharma ▴ 660

Dear all,

I am trying to assemble a genome of about 80 Mb. I have four Illumina libraries with insert sizes of 300 bp, 1 kb, 8 kb and 12 kb; read lengths are 76-100 bp. I generated assemblies with both Velvet and ALLPATHS-LG. Velvet (k-mer = 55) gave an N50 of more than 2 Mb, and ALLPATHS-LG gave an N50 of around 1 Mb, so the assembly statistics look good. However, about 20% of the bases in the Velvet scaffolds are Ns, and about 16% in the ALLPATHS-LG scaffolds. My questions are:

(a) Is this usual with such long-insert libraries? (b) Should I turn off scaffolding in these assemblers and try stand-alone scaffolders such as BAMBUS2, SSPACE or GRASS (please suggest more)? (c) Do these assemblers also mask repeat elements during assembly/scaffolding, so that masked repeats account for the high percentage of Ns?
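For anyone wanting to reproduce the numbers quoted above, here is a minimal sketch of how N50 and the percentage of Ns could be computed from a scaffold FASTA file. The function names (`read_fasta`, `n50`, `scaffold_stats`) are illustrative helpers written for this post, not part of Velvet or ALLPATHS-LG, and the N50 definition assumed here is the usual one: the length of the shortest scaffold in the smallest set of longest scaffolds covering at least half of the total assembly length.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def scaffold_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Summarise scaffold count, total length, N50 and percentage of N bases."""
    seqs = [seq.upper() for _, seq in read_fasta(lines)]
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    n_bases = sum(s.count("N") for s in seqs)
    return {
        "scaffolds": len(seqs),
        "total_bp": total,
        "n50": n50(lengths),
        "n_percent": 100.0 * n_bases / total if total else 0.0,
    }
```

An open file handle can be passed directly, e.g. `scaffold_stats(open("scaffolds.fa"))`, where `scaffolds.fa` is a placeholder file name. Comparing `n_percent` between the scaffold file and the contig file shows how much of the N content comes from scaffolding gaps.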

I would really appreciate any suggestions.

Kind regards and wishes,

Rahul Sharma

velvet scaffolding illumina • 2.8k views

updated 10.6 years ago by cts ★ 1.7k • written 10.6 years ago by Rahul Sharma ▴ 660

score 1 · Answer 1 · 2013-09-19

It sounds to me like you don't have enough read coverage, or that many of the reads are duplicates. In that case the long insert sizes let the scaffolder link many contigs, but the contigs themselves can't be extended to fill the gaps between them, which are then left as Ns. It might be worth checking the coverage and duplication rates of the reads.
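As a rough way to run both checks, the sketch below estimates expected sequence coverage as (number of reads × read length) / genome size, and estimates the duplication rate as the fraction of read pairs whose two mates have identical leading bases. All function names are hypothetical helpers for illustration. The sketch assumes plain 4-line FASTQ records and treats exact-sequence matches as duplicates, which is only a proxy; an alignment-based tool such as Picard MarkDuplicates would likely give a more reliable figure.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def expected_coverage(n_reads: int, read_length: float, genome_size: int) -> float:
    """Expected per-base sequence coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record (assumes strict 4-line FASTQ)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def duplication_rate(pairs: Iterable[Tuple[str, str]], prefix: int = 50) -> float:
    """Fraction of read pairs that duplicate an earlier pair.

    Two pairs count as duplicates when the first `prefix` bases of both
    mates are identical (a sequence-based proxy, not alignment-based).
    """
    counts = Counter((r1[:prefix], r2[:prefix]) for r1, r2 in pairs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total
```

For example, with the 80 Mb genome from the question and a hypothetical library of 40 million 100 bp reads, `expected_coverage(40_000_000, 100, 80_000_000)` gives 50.0. For a paired library, `duplication_rate(zip(fastq_sequences(open(r1)), fastq_sequences(open(r2))))` can be run per library, where `r1` and `r2` are placeholder paths to the two mate files. A high rate in the 8 kb and 12 kb libraries would support the explanation above.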