Question: Assembling a mix of two plasmids that share partially similar sequence
5.4 years ago by
mschmid170 wrote:

I have to assemble Illumina sequencing data. I have PE and MP libraries (2x300). There is a lot of coverage, so much that I have to downsample and/or normalize it. I am using SPAdes 3.6.0.
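As a rough illustration of the downsampling arithmetic: average coverage is approximately the total number of sequenced bases divided by the replicon length, and the fraction of reads to keep is the target coverage divided by the current coverage. The read count, plasmid size, and target depth below are made-up numbers for the sketch, not values from this dataset.

```python
def estimated_coverage(n_reads: int, read_length: int, replicon_length: int) -> float:
    """Average depth = total sequenced bases / replicon length."""
    return n_reads * read_length / replicon_length


def keep_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to reach the target depth (never above 1.0)."""
    if current_coverage <= 0:
        raise ValueError("current coverage must be positive")
    return min(1.0, target_coverage / current_coverage)


# Hypothetical numbers: 2,000,000 reads of 300 bp on a 100 kb plasmid,
# downsampled to a target of 100x.
cov = estimated_coverage(2_000_000, 300, 100_000)
frac = keep_fraction(cov, 100.0)
print(f"coverage ~{cov:.0f}x, keep {frac:.4f} of reads")
```

The resulting fraction could then be passed to whichever subsampling tool is in use. With two plasmids at different copy numbers, a single fraction will not bring both to the same depth, which is one reason normalization may be preferable to plain downsampling here.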

I first split up the MP data using NxTrim (it worked like a charm, except for the compilation part). For the assembly I then use ONLY the MP fraction, which seems to work best.

This is the SPAdes call: spades.py --hqmp1-12 mp.fastq.gz --threads 50 --cov-cutoff auto --careful -o /home/output

The thing is that the sample contains sequences from two E. coli plasmids. I think I was able to assemble one plasmid: I assembled it from the MP data and then mapped my PE data to the contig, which seems to confirm that the assembly is good. There is only one region with some ambiguity. I guess this is because the other plasmid I expect in the sample has a region that is very similar to this one in the first plasmid, so I am probably seeing mixed mapping from both plasmids there.

I guess it is harder for SPAdes to assemble both plasmids when they share regions that are quite similar.

The second plasmid also looks almost closed, but something like 10% of the final sequence seems to be missing.

What would you do?

EDIT -------

What I forgot to mention: I have almost pure plasmid DNA, since the genomic DNA was filtered out in the lab beforehand. After assembly I get barely any contigs that map to the genome.

assembly plasmids spades illumina • 1.4k views
5.4 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

I would try mapping the reads to 3 references at once (the E. coli main genome and both of your plasmid assemblies), using e.g. BBSplit with the "ambig2=all" flag, and also capturing the unmapped reads. Then combine the unmapped reads with the reads that mapped best to the unfinished plasmid, and assemble those. You may again need to subsample or normalize that data and do some kind of quality filtering or kmer-depth filtering (normalize while tossing out low-coverage reads), because the unmapped reads will probably contain a lot of junk in addition to the missing plasmid reads.
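The mapping step could look roughly like the sketch below, which only builds and prints the BBSplit command line rather than running it. The file names are hypothetical placeholders, and the in=, ref=, basename= and outu= parameters are taken from the general BBSplit usage pattern, so check them against the help text of your BBTools version before relying on them.

```python
import shlex


def bbsplit_command(reads, references, unmapped_out, basename="split_%.fq"):
    """Build a BBSplit call that bins reads by best-matching reference.

    ambig2=all is the flag suggested in the post; outu= collects the
    reads that map to none of the references.
    """
    return [
        "bbsplit.sh",
        f"in={reads}",
        "ref=" + ",".join(references),
        f"basename={basename}",  # one output file per reference
        f"outu={unmapped_out}",
        "ambig2=all",
    ]


# Hypothetical file names for the genome and the two plasmid assemblies.
cmd = bbsplit_command(
    "reads.fq",
    ["ecoli_genome.fa", "plasmid1.fa", "plasmid2_partial.fa"],
    "unmapped.fq",
)
print(shlex.join(cmd))
```

The per-reference output for the unfinished plasmid and the unmapped file would then be concatenated and used as input for the second assembly.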

For example, if your plasmid has relatively even coverage with an average of 1000x:

bbnorm.sh in=reads.fq out=highpass.fq min=200 target=9999999 passes=1

That will simply throw out the low-coverage junk so it won't cause problems in assembly.
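To make the idea of a kmer-depth high-pass filter concrete, here is a toy model in plain Python. It is not how BBNorm is implemented: it uses exact counts, the median depth per read, and ignores reverse complements, and the reads and thresholds are invented for the illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def high_pass(reads, k, min_depth):
    """Keep reads whose median k-mer depth is at least min_depth."""
    counts = Counter(km for r in reads for km in kmers(r, k))
    kept = []
    for r in reads:
        depths = [counts[km] for km in kmers(r, k)]
        if depths and median(depths) >= min_depth:
            kept.append(r)
    return kept


# Toy data: one sequence seen 5 times (high coverage) and one junk read seen once.
reads = ["ACGTACGGTCA"] * 5 + ["TTTTGGGGCCC"]
kept = high_pass(reads, k=5, min_depth=3)
print(len(kept), "of", len(reads), "reads kept")
```

With a very high target and a minimum depth well below the expected plasmid coverage, nothing is normalized down and only the low-depth reads are discarded, which is the effect described above.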



Thanks for the hint! Now that I expect both plasmids to be about "99% right", I will use BBSplit to split up the data and run a second iteration of assembly. I hope that helps.

written 5.4 years ago by mschmid170

