Hi, has anyone assembled just a plasmid with Illumina data? What is the best way? My go-to way for assembly has been VelvetOptimiser (for like everything), but in this case it estimates a coverage far below what it should be, and so it gives a very large, discontiguous assembly.
Extra info: Data are from a MiSeq, a 2x150bp PE run, with the following read metrics. The expected size is 160kb (646x unfiltered coverage).
avgReadLength  totalBases  maxReadLength  minReadLength  avgQuality  numReads
142.60         103482648   151            35             34.85       725682
FastQC reports a 62% duplication level, unfortunately, but that still leaves high coverage.
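For anyone checking the numbers, here is a rough sketch of the coverage arithmetic in Python. It assumes the 62% figure means 62% of reads are duplicates and that removing them scales coverage proportionally, which is only an approximation. The function names are made up for illustration.

```python
def raw_coverage(total_bases, genome_size):
    """Raw coverage: total sequenced bases divided by the expected replicon size."""
    return total_bases / genome_size


def deduplicated_coverage(total_bases, genome_size, duplicate_fraction):
    """Rough coverage left after dropping duplicates.

    Assumption: duplicate_fraction of the bases come from duplicate reads,
    and removing them reduces coverage by the same proportion.
    """
    return raw_coverage(total_bases, genome_size) * (1.0 - duplicate_fraction)


if __name__ == "__main__":
    total_bases = 103482648   # from the read metrics above
    plasmid_size = 160000     # expected size, 160 kb
    print(raw_coverage(total_bases, plasmid_size))
    print(deduplicated_coverage(total_bases, plasmid_size, 0.62))
```

Under those assumptions the raw figure comes out at roughly 646x and the deduplicated figure at roughly 245x, so there is still far more data than an assembler needs.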
Lee, I think your coverage is way too high. I would subsample down to about 60x and then do a series of assemblies with increasing coverage. The great thing about Illumina is that you usually get excess coverage, so you have the luxury of sampling to find the best coverage cutoff. I can't say where to start that sampling because I'm no expert on plasmids, but you've got plenty of data to experiment with. It's usually a win-win: you'll get a more contiguous assembly, and the assembly itself will take a fraction of the time with, in this case, about 1/10 of the reads. EDIT: I didn't realize you had it down to 4 contigs. Anyway, I would still do multiple assemblies and try to stitch them together with minimus2, or use a reference as a guide if appropriate.
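In case it helps, the subsampling could be sketched in Python roughly like this. It is a minimal example, not a tested pipeline: it assumes uncompressed, four-line-per-record FASTQ files with R1 and R2 in the same order, and the function names and file paths are placeholders.

```python
import random


def sampling_fraction(total_bases, genome_size, target_coverage):
    """Fraction of read pairs to keep so expected coverage is about target_coverage."""
    current_coverage = total_bases / genome_size
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return the number kept.

    Assumes plain-text FASTQ with exactly four lines per record and
    R1/R2 files listing mates in the same order.
    """
    rng = random.Random(seed)
    kept = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        while True:
            rec1 = [r1.readline() for _ in range(4)]
            rec2 = [r2.readline() for _ in range(4)]
            if not rec1[0] or not rec2[0]:
                break  # end of either file
            # One random draw per pair keeps mates together.
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept


if __name__ == "__main__":
    # Series of target coverages to try, using the numbers from this thread.
    for target in (60, 80, 100, 120):
        frac = sampling_fraction(103482648, 160000, target)
        print(target, round(frac, 4))
        # Hypothetical file names:
        # subsample_pairs("R1.fastq", "R2.fastq",
        #                 f"sub{target}_R1.fastq", f"sub{target}_R2.fastq", frac)
```

Changing the seed gives a different random subset at the same coverage, which is one way to get several independent assemblies to compare or merge.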
Absolutely right! I thought that Velvet could handle the high coverage if it were a plasmid, but I was wrong. Rookie mistake. I am still a little surprised that I could only get it down to 3 contigs so far though (it was 81x coverage that got me the best results so far).
What is a good follow-up? IMAGE?
Depending on whether you have a reference genome, you may want to try the PAGIT pipeline, which includes IMAGE and ABACAS. This approach is a bit complex because there are so many dependencies. A simpler approach would be to just use minimus2 with multiple assemblies (from different coverage cutoffs or different k-mer lengths).
Subsample to 100x and use SPAdes multi-kmer approach as such: