Question: Best Way To Assemble A Plasmid
Lee Katz (3.0k), Atlanta, GA, wrote 7.7 years ago:

Hi, has anyone assembled just a plasmid with Illumina data? What is the best way? My go-to way for assembly has been VelvetOptimiser (for like everything), but it is estimating a very small coverage compared to what it should be in this case, and so it is giving a very large discontiguous assembly.

Extra info: Data are from a MiSeq, a 2x150bp PE run, with the following read metrics. The expected size is 160kb (646x unfiltered coverage).

avgReadLength  totalBases  maxReadLength  minReadLength  avgQuality  numReads 
142.60         103482648   151            35             34.85       725682

Unfortunately, FastQC reports a 62% duplication level, but that still leaves high coverage.
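For reference, the 646x figure follows directly from the read metrics above: total bases divided by the expected plasmid size. A minimal sketch of that arithmetic (the function name `fold_coverage` is just an illustrative choice):

```python
def fold_coverage(total_bases: int, target_size_bp: int) -> float:
    """Average fold coverage = sequenced bases / size of the target."""
    return total_bases / target_size_bp

total_bases = 103_482_648   # totalBases from the read metrics above
plasmid_size = 160_000      # expected plasmid size, 160 kb

coverage = fold_coverage(total_bases, plasmid_size)
print(f"{coverage:.1f}x")   # 646.8x
```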

illumina miseq assembly • 4.3k views
modified 4.2 years ago by Biostar ♦♦ (20) • written 7.7 years ago by Lee Katz (3.0k)
Lee Katz (3.0k), Atlanta, GA, wrote 7.7 years ago:

I think I made it work, but I am hoping for new ideas. This was what I did:

  1. Cleaned the reads with my custom CG-Pipeline script, using the options: --unique -i shuffled.fastq.gz --min_avg_quality 35 -o filtered.fastq.gz
  2. Saw with FastQC that the first ~15 and last ~5 bases still looked a little bad
  3. Trimmed these bases with the FASTX-Toolkit: gunzip -c filtered.fastq.gz | fastx_trimmer -f 15 -l 145 -Q33 | gzip -c > filtered.trimmed.fastq.gz
  4. Ran VelvetOptimiser (with 275x coverage)
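To make the trimming step concrete: fastx_trimmer's -f is the first base to keep and -l is the last base to keep, both 1-based, so -f 15 -l 145 drops the first 14 bases and everything after base 145. A rough Python equivalent for a single read is sketched below; `trim_read` is a made-up helper for illustration, not part of the FASTX-Toolkit, and it does not try to reproduce how fastx_trimmer handles very short reads.

```python
def trim_read(seq: str, qual: str, first: int = 15, last: int = 145):
    """Keep bases first..last (1-based, inclusive) of a read and its qualities."""
    # Python slices are 0-based and end-exclusive, hence first - 1.
    return seq[first - 1:last], qual[first - 1:last]

# Toy 150 bp read: 14 bases to drop, 131 to keep, 5 to drop.
seq = "A" * 14 + "C" * 131 + "G" * 5
qual = "I" * 150

trimmed_seq, trimmed_qual = trim_read(seq, qual)
print(len(trimmed_seq))  # 131
```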

I have it down to 4 contigs, plus some hanging ones that may or may not belong to the plasmid. I am hoping to get it to a single contig but I know that it might be difficult.

written 7.7 years ago by Lee Katz (3.0k)

Lee, I think your coverage is way too high. I would subsample down to about 60x and then do a series of assemblies with increasing coverage. The great thing about Illumina is that you usually get excess coverage, so you have the luxury of sampling to find the best coverage cutoff. I can't say where to start that sampling because I'm no expert on plasmids, but you've got plenty of data to experiment with. It's usually a win-win: you'll get a more contiguous assembly, and the assembly itself will take a fraction of the time with, in this case, about 1/10 of the reads. EDIT: I didn't realize you had it down to 4 contigs. Anyway, I would still do multiple assemblies and try to stitch them together with minimus2, or use a reference as a guide, if appropriate.

modified 7.7 years ago • written 7.7 years ago by SES (8.4k)
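The subsampling idea above could be sketched roughly as follows. This assumes the reads are in an interleaved (shuffled) FASTQ with exactly four lines per read, so eight lines per pair; the function names and the fixed seed are illustrative choices, not anything prescribed in the thread. Each pair is kept with probability target coverage / current coverage, so mates always stay together.

```python
import random

def sampling_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    return min(1.0, target_cov / current_cov)

def subsample_pairs(lines, fraction, seed=42):
    """Yield the lines of randomly kept read pairs from an interleaved FASTQ.

    `lines` is any iterable of FASTQ lines; 8 consecutive lines form one pair.
    """
    rng = random.Random(seed)
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 8:
            if rng.random() < fraction:
                yield from record
            record = []

# Fractions for a series of assemblies, starting from the 646.8x unfiltered data.
for target in (60, 80, 100):
    print(target, round(sampling_fraction(646.8, target), 3))
```

Opening the input with `gzip.open(path, "rt")` and passing the file handle as `lines` would let this run over a compressed file without loading it into memory.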

Absolutely right! I thought that Velvet could handle the high coverage if it were a plasmid, but I was wrong. Rookie mistake. I am still a little surprised that I could only get it down to 3 contigs so far though (it was 81x coverage that got me the best results so far).

What is a good follow up? IMAGE?

written 7.7 years ago by Lee Katz (3.0k)
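When comparing assemblies made at different coverage levels, contig count is one obvious yardstick; N50 is another common one. The thread does not say which metric was used to pick the 81x result, so the following is only a sketch of one way to score each run, using toy contig lengths rather than real data.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig down, first reaches half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

# Toy lengths for illustration only.
print(n50([100, 50, 30, 20]))  # 100
print(n50([80, 70, 50]))       # 70
```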

Depending on whether you have a reference genome, you may want to try the PAGIT pipeline, which includes IMAGE and ABACAS. This approach is a bit complex because there are so many dependencies. A simpler approach would be to just use minimus2 with multiple assemblies (from different coverage cutoffs or different k-mer lengths).

written 7.7 years ago by SES (8.4k)

Subsample to 100x and use the SPAdes multi-k-mer approach, like so: -k 21,33,55,77 --careful --only-assembler <your reads> -o spades_output
written 5.0 years ago by apelin20470
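Spelled out as a full command line, the SPAdes suggestion might look like the sketch below. The read file name is hypothetical, and passing the reads with `--12` assumes they are in a single interleaved paired-end FASTQ; separate mate files would go in with `-1` and `-2` instead. The script only builds and prints the command, it does not run SPAdes.

```python
import shlex

def spades_command(reads, outdir="spades_output", kmers=(21, 33, 55, 77)):
    """Build a spades.py argument list using the options suggested above."""
    return [
        "spades.py",
        "-k", ",".join(str(k) for k in kmers),
        "--careful",
        "--only-assembler",
        "--12", reads,   # assumption: interleaved paired-end reads
        "-o", outdir,
    ]

# Hypothetical input file name, e.g. the output of a 100x subsampling step.
cmd = spades_command("subsampled_100x.fastq")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once SPAdes is installed and on the PATH.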
jigarnt (30) wrote 5.0 years ago:

Hi Lee,

I am planning to perform a de novo assembly of a supposed plasmid of around 3 kb that was sequenced on Illumina. What is the best way to do that?

modified 11 months ago by RamRS (30k) • written 5.0 years ago by jigarnt (30)

This was a long time ago, but I believe I found at the time that VelvetOptimiser on cleaned/filtered reads, followed up with IMAGE2, worked well.

modified 11 months ago by RamRS (30k) • written 5.0 years ago by Lee Katz (3.0k)