Assembling a chromosome for an E. coli strain using contigs/scaffolds from two WGS projects
6.9 years ago
rnnh ▴ 30

Hello everyone,

I'm working on improving the gene coverage for the chromosome of an E. coli strain, using existing WGS data. There are two WGS projects for this strain on NCBI (according to the assembly report, the assembly level for one of them is "scaffold", and for the other is "contig").

So far, I've obtained the contigs from both of these projects from NCBI (which I saved as two separate multi-FASTA files), found a decent reference genome, and re-ordered the contig sets against it using Mauve. I then ran the re-ordered contigs through GLIMMER-3, which produced annotated GenBank-format files for each WGS project contig set. Basically, I've been following the steps in this tutorial:

After plotting the two sets of annotated contigs against each other using DNAPlotter, it looks like together they can give decent gene coverage: each set has gaps in its CDS that are covered by the other set.

I have been using contigs from both WGS projects so far, but I think I'm going to start using the scaffolds from the scaffold-level assembly. I've just found out that the scaffold-level project includes two plasmids (the contig-level WGS project is just a chromosome), and I know which scaffolds are the plasmids, so I can exclude them.
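For the exclusion step, here is a minimal sketch in plain Python of dropping known plasmid scaffolds from a multi-FASTA file. It assumes the scaffold ID is the first whitespace-delimited token of each FASTA header. The function names and the IDs in the demo (`scaf1`, `scaf2`) are hypothetical placeholders, not the real accessions.

```python
import io
from typing import Iterable, Iterator, Set, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_records(records: Iterable[Record], excluded_ids: Set[str]) -> Iterator[Record]:
    """Skip records whose ID (first token of the header) is in excluded_ids."""
    for header, seq in records:
        tokens = header.split()
        seq_id = tokens[0] if tokens else ""
        if seq_id not in excluded_ids:
            yield header, seq


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical two-scaffold input; "scaf2" stands in for a plasmid scaffold.
    demo = io.StringIO(">scaf1 chromosome\nACGT\nAC\n>scaf2 plasmid\nGGCC\n")
    out = io.StringIO()
    write_fasta(drop_records(read_fasta(demo), {"scaf2"}), out)
    print(out.getvalue(), end="")
```

With real data, the `io.StringIO` objects would be replaced by `open(...)` handles on the scaffold file and an output file, and the excluded set would hold the actual plasmid scaffold IDs from the assembly report.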

How can I go about assembling a single chromosome using data from both WGS projects? Is this possible? If not, which programmes can highlight the differences between the assembled chromosomes?

Should I be using the scaffolds instead of the contigs for the WGS project which is at a scaffold level?

Are there any programmes for closing gaps between/in scaffolds using additional contigs?
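To see where the gaps in the scaffolds actually are before trying to close them, a minimal Python sketch like the one below could list them. It assumes gaps are written as runs of `N` in the scaffold sequence, which is the usual convention for scaffold FASTA files but should be checked against the data. The function name `find_gaps` and the demo sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: a gap is any run of one or more N/n characters.
GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(sequence: str, min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) coordinates of each N run
    that is at least `min_length` bases long."""
    return [
        (m.start(), m.end())
        for m in GAP_PATTERN.finditer(sequence)
        if m.end() - m.start() >= min_length
    ]


if __name__ == "__main__":
    scaffold = "ACGTNNNNACGTNACGT"  # hypothetical toy scaffold
    for start, end in find_gaps(scaffold):
        print(f"gap at {start}-{end} ({end - start} bp)")
```

The resulting coordinates can then be compared against where the contigs from the other project align, to see which gaps they might span.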

Thanks for your time,


Assembly alignment genome • 3.4k views

@r.harrington747 There is the Mauve tool. I have only used it once, but it allows you to compare different assemblies.

Here is a question that I think is similar to yours: From Contigs To Chromosome Scale Scaffold

Also, to get your contigs and scaffolds, maybe try SPAdes, a raw-read assembler; I've used that instead of Velvet.

6.9 years ago
h.mon 34k

If the raw sequencing reads were also released, you could try GAM-NGS. Another option is to use MIRA to perform a genome-guided assembly, use the scaffolds as reference, and the two sets of contigs as two additional strains to be assembled. A third option is to use CAP3 to merge the assemblies, but I remember reading (from more than one source, though I do not recall a single one) this is not recommended, as it introduces mis-assemblies.

edit: I saw the post "Plasmid Assembly Use Of Cisa Contig Intergrator" listed as similar to yours; CISA seems like a good alternative to try.

