I'm working on improving the gene coverage for the chromosome of an E. coli strain, using existing WGS data. There are two WGS projects for this strain on NCBI (according to the assembly report, the assembly level for one of them is "scaffold", and for the other is "contig").
So far, I've obtained the contigs from both of these projects from NCBI (which I saved as two separate multi-FASTA files), found a decent reference genome, and re-ordered the contig sets against it using Mauve. I then ran the re-ordered contigs through GLIMMER-3, which produced annotated GenBank-format files for each WGS project's contig set. Basically, I've been following the steps in this tutorial: http://www.microbialinformaticsj.com/content/3/1/2.
After plotting the two sets of annotated contigs against each other in DNAPlotter, it looks like together they can give decent gene coverage: each set has gaps in its CDS that are covered by the other.
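To put numbers on what I'm seeing in the plot, here is a minimal sketch of how the complementary regions could be listed. It assumes (and this is a big assumption) that the CDS coordinates from both annotated sets have already been placed on the same reference coordinate system, and that intervals are half-open `(start, end)` pairs. The function names and the example coordinates are made up for illustration; they don't come from any of the tools above.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def subtract(a, b):
    """Return the parts of the intervals in `a` that are not covered by `b`."""
    b = merge(b)
    result = []
    for start, end in merge(a):
        pos = start
        for b_start, b_end in b:
            if b_end <= pos:
                continue  # this interval of b lies entirely before pos
            if b_start >= end:
                break  # this and all later intervals of b lie after a's interval
            if b_start > pos:
                result.append((pos, b_start))
            pos = max(pos, b_end)
            if pos >= end:
                break
        if pos < end:
            result.append((pos, end))
    return result


# Hypothetical CDS coordinates for the two projects (assumed shared coordinates).
cds_project_a = [(0, 100), (150, 300)]
cds_project_b = [(50, 200), (400, 500)]

# Regions annotated only in project B, i.e. gaps in A that B fills.
only_in_b = subtract(cds_project_b, cds_project_a)
# Regions annotated only in project A, i.e. gaps in B that A fills.
only_in_a = subtract(cds_project_a, cds_project_b)
```

If the two sets are not on a shared coordinate system, the comparison above is meaningless, so this would only be useful after mapping both to the reference.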
I have been using contigs from both WGS projects so far, but I think I'm going to switch to the scaffolds from the scaffold-level assembly. I've just found out that the scaffold-level project includes two plasmids (the contig-level WGS project is chromosome only), and I know which scaffolds are the plasmids, so I can exclude them.
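For excluding the plasmid scaffolds, this is roughly what I have in mind: a minimal sketch in plain Python that drops records from a multi-FASTA by ID. It assumes the record ID is the first whitespace-delimited word of each header line. The scaffold names in the example are placeholders, not the real accessions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_records(records, exclude_ids):
    """Keep records whose ID (first word of the header) is not in exclude_ids."""
    exclude = set(exclude_ids)
    return [(h, s) for h, s in records if (h.split() or [""])[0] not in exclude]


def write_fasta(records, width=70):
    """Format (header, sequence) pairs as FASTA text, wrapped at `width` bases."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Placeholder input; in practice the lines would come from the scaffold file.
example = """>scaffold1 chromosome part
ACGT
ACGT
>scaffold2 plasmid pA
GGGG
>scaffold3
TTTT
""".splitlines()

records = list(read_fasta(example))
chromosome_only = drop_records(records, ["scaffold2"])  # hypothetical plasmid ID
filtered_text = write_fasta(chromosome_only, width=4)
```

With real data, the lines would be read from the scaffold multi-FASTA and `filtered_text` written to a new file, leaving the original download untouched.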
How can I go about assembling a single chromosome using data from both WGS projects? Is this possible? If not, which programmes can highlight the differences between the assembled chromosomes?
Should I be using the scaffolds instead of the contigs for the WGS project that is at scaffold level?
Are there any programmes for closing gaps between/in scaffolds using additional contigs?
Thanks for your time,