Question: How To Close Gaps In A 454 Assembly In Silico?
Michael Barton (Akron, Ohio, United States) wrote, 8.7 years ago:

We've sequenced two ~7-9 Mbp microbial genomes using 454, and the reads were subsequently assembled with Newbler. For the first bacterium we have 8 sequence scaffolds. These scaffolds contain gap regions, which I assumed were caused by sequencing coverage dropping off. However, when I look at the read depth for these regions, the contig appears to terminate prematurely while there is still a large amount of read depth. I assume that these reads continue off the end of the contig but have been ignored. I've been reading the Newbler documentation and it seems to indicate that contig extension stops when there are repeats in the genome.

Can anyone offer any help on how we can close these scaffold gaps in silico? It seems that we should have the sequence data to get across, but I don't know how to do it.
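To make the read-depth observation above concrete, here is a minimal sketch for summarising depth at the two ends of each contig. It assumes the reads have been mapped back to the contigs and the per-base depth exported as three tab-separated columns (contig, position, depth), which is the layout `samtools depth` writes. The function name `end_depths` and the `window` parameter are made up for illustration. Depth well above the assembly-wide average at a contig end would be consistent with a collapsed repeat rather than a loss of coverage, but that interpretation needs checking by eye.

```python
from collections import defaultdict


def end_depths(depth_lines, window=100):
    """Mean read depth over the first and last `window` reported positions of each contig.

    `depth_lines` yields "contig<TAB>position<TAB>depth" records
    (assumed input layout, as written by `samtools depth`).
    """
    per_contig = defaultdict(list)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")
        per_contig[contig].append((int(pos), int(depth)))

    summary = {}
    for contig, records in per_contig.items():
        records.sort()  # order by position in case the input is unsorted
        depths = [d for _, d in records]
        head, tail = depths[:window], depths[-window:]
        summary[contig] = (sum(head) / len(head), sum(tail) / len(tail))
    return summary


if __name__ == "__main__":
    # Tiny hypothetical example: depth doubles towards the right-hand end.
    example = ["contig1\t1\t20", "contig1\t2\t22", "contig1\t3\t40", "contig1\t4\t44"]
    for name, (left, right) in end_depths(example, window=2).items():
        print(name, left, right)
```

With real data you would pass an open file handle instead of the example list, and compare the end values against the mean depth of the whole contig.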

genome assembly sequencing • 5.0k views
Bioinfo (DC Metro Area) wrote, 8.7 years ago:


Welcome to the wonderful world of genome finishing. If the repeats are longer than a read (roughly 300-600 bp for FLX Titanium), you will not be able to span them. These gaps may also be caused by the homopolymer issues that this platform suffers from, or by other mysterious artifacts. One option is to use software like CONSED or CLC Bio to visualize the areas, and work your way into the repeats by finding reads that are anchored in unique sequence. Designing primers that span the areas and using Sanger sequencing may also be helpful. I assume you don't have a reference of any type to use in piecing things together?

You can also run a different assembler and then do a MUMmer mapping to see if any of the areas were resolved by the other assembler. You would be amazed at how differently assemblers handle the same data.
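As a first pass on the read-length point above, the gaps can be listed and compared against the read length. This is only a sketch: it assumes gaps are written as runs of N in the scaffold FASTA, that the run length is at best an estimate of the true gap size, and that 400 bp is a reasonable stand-in for the read length. The function names are made up for illustration. A gap shorter than a read is merely a candidate for spanning; a repeat around it can still block extension.

```python
import re


def parse_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(sequence):
    """Return 0-based, end-exclusive (start, end) coordinates of each run of N."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def classify_gaps(fasta_lines, read_length=400):
    """List every gap with its length and whether it is shorter than one read."""
    report = []
    for name, seq in parse_fasta(fasta_lines):
        for start, end in find_gaps(seq):
            length = end - start
            report.append((name, start, end, length, length < read_length))
    return report


if __name__ == "__main__":
    # Hypothetical toy scaffold with two gaps.
    toy = [">scaf1", "ACGTNNNNNACGT", "NNACGT"]
    for row in classify_gaps(toy, read_length=4):
        print(*row, sep="\t")
```

The same coordinates are also handy for pulling out the flanking sequence when comparing against a second assembly.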

Written 8.7 years ago by Bioinfo

Thanks for the suggestions. The gaps are between 500 and 1000 bp, so it looks like the reads won't span them because of the repeats in the genome. We do have a reference strain from the same species, but there seems to be a lot of recombination between the two genomes. I guess it's worth a look for the regions where there appears to be no recombination. I tried AMOScmp as an alternative assembler, but this produced a much larger number of contigs than Newbler.

I'll try consed and autofinish too but I'm still waiting for the software.

Reply written 8.7 years ago by Michael Barton
Wjeck (Chapel Hill, NC) wrote, 8.7 years ago:

Generally these gaps are very tricky to span using in silico techniques only, even with 454 reads. You might have to try the wet bench solution, which is to use Illumina PE reads with a large "insert" size to create a scaffold that jumps those gaps.

There's this project using that technique (shameless self promotion):

But I think others have made considerable improvements since then.
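For what it's worth, the core idea behind paired-end scaffolding can be sketched in a few lines: count read pairs whose two mates map to different contigs, and keep contig pairs supported by several such links. The input format (one tuple of two contig names per mapped pair) and the name `count_links` are hypothetical, and this toy ignores read orientation and insert-size distances, which real scaffolders rely on.

```python
from collections import Counter


def count_links(pair_hits, min_links=2):
    """Count read pairs that bridge two different contigs.

    `pair_hits` yields (contig_of_mate1, contig_of_mate2) tuples
    (hypothetical input, one tuple per pair with both mates mapped).
    Returns {(contig_a, contig_b): n_links} for pairs with >= min_links links.
    """
    links = Counter()
    for contig_a, contig_b in pair_hits:
        if contig_a != contig_b:
            # Sort the names so (c1, c2) and (c2, c1) count as the same link.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


if __name__ == "__main__":
    hits = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c3", "c1")]
    print(count_links(hits, min_links=2))
```

The `min_links` threshold is there because a single bridging pair is often a chimeric or mis-mapped read, especially when repeats are involved.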

Modified 8.7 years ago; written 8.7 years ago by Wjeck

Thanks for the suggestions. We're considering SRS for a second genome we have, which is even more fragmented: >50 contigs at 17x coverage. Probably a large number of repeats ...

Reply written 8.7 years ago by Michael Barton
lexnederbragt (Oslo, Norway) wrote, 8.7 years ago:

In this PDF:

on page 2, there is a program mentioned to close gaps in 454 assemblies. We tried it out on a bacterial genome, and it seems to work for a subset of the gaps in the scaffolds. We are currently quality checking the closed gaps...

Written 8.7 years ago by lexnederbragt

Thanks. That looks useful. How are you quality checking the gaps?

Reply written 8.7 years ago by Michael Barton

If you really must know :-) we have early access to the graph viewer, and use that to check which contigs (according to the graph) could (should) fit in the gap, then align their sequences to the proposed gap-closing sequence. In addition, we did some gap-closing PCRs earlier and check the closed gaps against those sequences. Finally, we are considering checking a bunch of them with new PCRs.

Reply written 8.7 years ago by lexnederbragt
Daniel Swan (Aberdeen, UK) wrote, 8.7 years ago:

There's also an approach for generating gap spanning contigs by aligning sequences at the contig ends and performing local assemblies.

"Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data frequently are highly fragmented with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data."
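To give a feel for what "local assembly at a contig end" means, here is a toy greedy extension. It is not the algorithm from the quoted paper: it assumes error-free reads on the same strand as the contig, uses exact suffix/prefix overlaps only, and the names `longest_overlap` and `extend_right` are made up for illustration.

```python
def longest_overlap(contig, read, min_overlap):
    """Length of the longest suffix of `contig` that equals a prefix of `read`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(contig), len(read))
    for size in range(max_len, min_overlap - 1, -1):
        if contig.endswith(read[:size]):
            return size
    return 0


def extend_right(contig, reads, min_overlap=5, max_rounds=100):
    """Greedily extend the 3' end of `contig` using exact read overlaps."""
    for _ in range(max_rounds):
        best_read, best_size = None, 0
        for read in reads:
            size = longest_overlap(contig, read, min_overlap)
            # Skip reads that lie entirely inside the contig end (nothing to add).
            if size > best_size and size < len(read):
                best_read, best_size = read, size
        if best_read is None:
            break  # no read reaches past the current end
        contig += best_read[best_size:]
    return contig


if __name__ == "__main__":
    # Hypothetical toy data: two reads chain off the contig end, one is unrelated.
    print(extend_right("ACGTACGGA", ["CGGATTC", "TTCAGG", "GGGGGG"], min_overlap=3))
```

The `max_rounds` cap matters because in a repeat a greedy walk like this can loop indefinitely, which is exactly the situation where gap closing gets difficult.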

Written 8.7 years ago by Daniel Swan

Just been looking at IMAGE and I think it's specifically focused towards closing gaps in Illumina sequencing data.

Reply written 8.7 years ago by Michael Barton