Best Way To Assemble A Plasmid
11.8 years ago
Lee Katz ★ 3.2k

Hi, has anyone assembled just a plasmid from Illumina data? What is the best way? My go-to assembler has been VelvetOptimiser (for pretty much everything), but in this case it estimates a coverage far lower than the true value, and so it produces a very large, fragmented assembly.

Extra info: the data are from a MiSeq 2x150 bp paired-end run, with the read metrics below. The expected plasmid size is 160 kb, which gives 646x unfiltered coverage.

avgReadLength  totalBases  maxReadLength  minReadLength  avgQuality  numReads 
142.60         103482648   151            35             34.85       725682

FastQC reports a 62% duplication level, unfortunately, but that still leaves high coverage.
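
For reference, the coverage figure above is just total bases divided by expected size. Here is a minimal Python sketch of that arithmetic using the numbers from the table. The post-deduplication estimate is only a ballpark: it assumes the 62% FastQC duplication level translates directly into 62% of the bases being discarded, which FastQC does not guarantee.

```python
def estimate_coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases per base of target."""
    return total_bases / genome_size


total_bases = 103_482_648   # from the read-metrics table
num_reads = 725_682         # from the read-metrics table
plasmid_size = 160_000      # expected size, 160 kb

raw_coverage = estimate_coverage(total_bases, plasmid_size)
avg_read_length = total_bases / num_reads

# ASSUMPTION: removing duplicates discards 62% of the bases. FastQC's
# duplication level is itself an estimate, so treat this as a ballpark.
duplication = 0.62
dedup_coverage = raw_coverage * (1 - duplication)

print(f"raw coverage: {raw_coverage:.1f}x")
print(f"avg read len: {avg_read_length:.2f} bp")
print(f"after dedup:  {dedup_coverage:.0f}x (rough)")
```

Even under that pessimistic assumption, the remaining depth is still in the hundreds, so there is plenty of room to subsample.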

miseq illumina assembly • 5.6k views
11.8 years ago
Lee Katz ★ 3.2k

I think I made it work, but I am hoping for new ideas. This is what I did:

  1. I cleaned the reads with my custom CG-Pipeline script:
       run_assembly_trimClean.pl --unique -i shuffled.fastq.gz --min_avg_quality 35 -o filtered.fastq.gz
  2. I saw in FastQC that the first ~15 and last ~5 bases still looked a little bad.
  3. I trimmed these bases with fastx:
       gunzip -c filtered.fastq.gz | fastx_trimmer -f 15 -l 145 -Q33 | gzip -c > filtered.trimmed.fastq.gz
  4. I ran VelvetOptimiser (with 275x coverage).
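
To make the trimming in step 3 concrete: fastx_trimmer's -f is the first base to keep and -l is the last base to keep, both 1-based. A small Python sketch of the same window follows; the function name `trim_window` is made up for illustration and is not part of fastx or CG-Pipeline.

```python
def trim_window(seq, qual, first=15, last=145):
    """Keep bases first..last (1-based, inclusive), mirroring the
    -f/-l window of fastx_trimmer. Hypothetical helper for illustration."""
    start = first - 1          # convert the 1-based position to a 0-based index
    return seq[start:last], qual[start:last]


# A full-length 151 bp read loses 14 bases at the 5' end and 6 at the 3' end.
read = "A" * 151
quals = "I" * 151
trimmed_seq, trimmed_qual = trim_window(read, quals)
print(len(trimmed_seq))
```

Reads that are already shorter than 145 bases are only trimmed at the start, since the slice simply stops at the end of the read.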

I have it down to 4 contigs, plus some hanging ones that may or may not belong to the plasmid. I am hoping to get it down to a single contig, but I know that might be difficult.


Lee, I think your coverage is way too high. I would subsample down to about 60x and then run a series of assemblies with increasing coverage. The great thing about Illumina is that you usually get more coverage than you need, so you have the luxury of sampling to find the best coverage cutoff. I can't say where to start that sampling because I'm no expert on plasmids, but you have plenty of data to experiment with. It's usually a win-win: you get a more contiguous assembly, and the assembly itself takes a fraction of the time with, in this case, about 1/10 of the reads.

EDIT: I didn't realize you had it down to 4 contigs. Anyway, I would still do multiple assemblies and try to stitch them together with minimus2, or use a reference as a guide if appropriate.
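
A rough Python sketch of the subsampling idea, in case it helps. It assumes the input is an interleaved paired-end FASTQ (which the name shuffled.fastq.gz suggests, but that is a guess) with plain 4-line records and no trailing blank lines. The function names are made up for this example; dedicated subsampling tools exist and are probably a better choice for real work.

```python
import gzip
import random


def fraction_for_target(target_coverage, current_coverage):
    """Fraction of read pairs to keep to go from current to target depth."""
    return min(1.0, target_coverage / current_coverage)


def subsample_interleaved(in_path, out_path, fraction, seed=42):
    """Keep each read pair with probability `fraction`.

    ASSUMPTION: interleaved gzipped FASTQ, 4 lines per read, so one
    pair is 8 consecutive lines. Returns (pairs_kept, pairs_seen).
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    kept = seen = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            pair = [fin.readline() for _ in range(8)]
            if not pair[0]:
                break  # clean end of file
            if not pair[7]:
                raise ValueError("input ended in the middle of a read pair")
            seen += 1
            if rng.random() < fraction:
                fout.writelines(pair)  # mates stay together
                kept += 1
    return kept, seen
```

For the data in the question, `fraction_for_target(60, 646)` is about 0.093, which matches the "about 1/10 of the reads" estimate. Calling `subsample_interleaved` in a loop over targets such as 60x, 80x, and 100x gives the series of inputs for the assemblies.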


Absolutely right! I thought that Velvet could handle the high coverage since this is a plasmid, but I was wrong. Rookie mistake. I am still a little surprised that I have only gotten it down to 3 contigs so far, though (81x coverage has given me the best results so far).
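
When comparing assemblies from different coverage cutoffs, a quick summary of contig count and N50 per run makes the "best" one easier to pick. This is a minimal Python sketch of one possible ranking (fewest contigs first, N50 as the tie-breaker), which is an assumption and not necessarily how VelvetOptimiser scores assemblies. The contig lengths in the example are made up purely to show the call.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_best(assemblies):
    """assemblies maps a label to a list of contig lengths.
    ASSUMPTION: fewest contigs wins, larger N50 breaks ties."""
    return min(assemblies,
               key=lambda label: (len(assemblies[label]), -n50(assemblies[label])))


# Made-up contig lengths, NOT real results from this thread.
example = {
    "60x": [70_000, 50_000, 30_000, 10_000],
    "81x": [90_000, 50_000, 20_000],
    "100x": [80_000, 40_000, 30_000, 10_000],
}
for label, lengths in example.items():
    print(label, len(lengths), "contigs, N50 =", n50(lengths))
print("best:", pick_best(example))
```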

What is a good follow-up? IMAGE?


Depending on whether you have a reference genome, you may want to try the PAGIT pipeline, which includes IMAGE and ABACAS. This approach is a bit complex because it has so many dependencies. A simpler approach would be to just use minimus2 to merge multiple assemblies (from different coverage cutoffs or different k-mer lengths).


Subsample to 100x and use the SPAdes multi-k-mer approach, like this:

 spades.py -k 21,33,55,77 --careful --only-assembler <your reads> -o spades_output
9.0 years ago
jigarnt ▴ 30

Hi Lee,

I am planning to perform a de novo assembly of an Illumina-sequenced plasmid that is supposedly around 3 kb in size. What is the best way to do that?


This was a long time ago, but I believe I found at the time that VelvetOptimiser on cleaned/filtered reads, followed by IMAGE2, worked well.
