Question: Mapping large contigs to a reference genome
Hey everyone,

I'm working on combining contigs from two WGS projects into one scaffold set (see the related post "Assembling a chromosome for an E. coli strain using contigs/scaffolds from two WGS projects"). I've just gotten CISA, which was recommended in that post, to work for me in the MyPro virtual machine detailed on the CISA page.

The problem is that the output is too small. I've run CISA twice using different sets of contigs, and although it completed both times, in each case the total size of the final scaffold set was half of the genome size I specified.

I’ve gotten similar results using the Roche and Velvet assemblers. It looks like the contigs I’m using are too large for these programmes.

In each case, I don’t have raw reads or quality values, only sequences from NCBI (the assembly statistics reports are also available on NCBI, but I haven't used them).

Which programme should be used for mapping sets of large contigs to a reference genome?

If there isn’t an appropriate programme for this, how can I manually fill the gaps in each sequence with contigs that are unique to the other sequence?

Thanks for your time,


bwa mem can help. Align both contig sets to the reference, find the contigs from one set that align to regions the other set does not cover, and use them to fill in the gaps.
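To make that a bit more concrete, here is a rough Python sketch of the "find contigs in unique regions" step. It assumes you have already aligned each contig set to the reference separately with bwa mem and have one SAM file per set. The function names, the `ref_lengths` dictionary, and the half-open 0-based coordinates are my own choices for illustration, not part of bwa or any library, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import defaultdict

# CIGAR operations; only M, D, N, = and X consume reference bases.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")


def sam_intervals(sam_lines, min_mapq=0):
    """Yield (contig, ref_name, start, end) for each mapped SAM record.

    Coordinates are 0-based, half-open. Header lines and unmapped
    records are skipped. min_mapq is a hypothetical filter knob.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag, pos, mapq, cigar = int(f[1]), int(f[3]), int(f[4]), f[5]
        if flag & 4 or cigar == "*" or mapq < min_mapq:
            continue
        ref_span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_OPS)
        yield f[0], f[2], pos - 1, pos - 1 + ref_span


def gaps(covered, ref_len):
    """Return the reference intervals not covered by any (start, end) pair."""
    out, prev = [], 0
    for start, end in sorted(covered):
        if start > prev:
            out.append((prev, start))
        prev = max(prev, end)
    if prev < ref_len:
        out.append((prev, ref_len))
    return out


def gap_filling_contigs(sam_a, sam_b, ref_lengths):
    """List set-B alignments that overlap reference gaps left by set A.

    ref_lengths maps reference sequence name to its length (an assumed
    input; it could be read from the @SQ header lines instead).
    Returns (contig, ref_name, overlap_start, overlap_end) tuples.
    """
    cov_a = defaultdict(list)
    for _, rname, start, end in sam_intervals(sam_a):
        cov_a[rname].append((start, end))
    gaps_by_ref = {r: gaps(cov_a[r], n) for r, n in ref_lengths.items()}

    hits = []
    for contig, rname, start, end in sam_intervals(sam_b):
        for g_start, g_end in gaps_by_ref.get(rname, []):
            if start < g_end and g_start < end:
                hits.append((contig, rname, max(start, g_start), min(end, g_end)))
    return hits
```

You would call it with two open SAM files, e.g. `gap_filling_contigs(open("a.sam"), open("b.sam"), {"chr": 4600000})`, where the file names and the reference name and length are placeholders. Long contigs are often split into several alignment records by bwa mem, so the same contig can show up more than once in the result. Swap the two SAM files to get the gaps in set B that set A can fill.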

