
Assemble mix of two plasmids which have partially similar sequence



8.6 years ago

mschmid ▴ 180

I have to assemble Illumina sequencing data. I have paired-end (PE) and mate-pair (MP) reads (2x300). There is a lot of coverage; I even have to downsample and/or normalize it. I am using SPAdes 3.6.0.
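
To decide how much to downsample, it helps to estimate the mean depth from the read count first. Below is a minimal Python sketch of that arithmetic. The read count and target length in the example are hypothetical, not values from this dataset, and the estimate assumes nearly all reads come from the target (roughly true for an almost pure plasmid prep, where the target length would be the combined length of both plasmids).

```python
def mean_depth(n_reads: int, read_len: int, target_len: int) -> float:
    """Approximate mean depth: total sequenced bases divided by target length."""
    return n_reads * read_len / target_len


def keep_fraction(current_depth: float, desired_depth: float) -> float:
    """Fraction of reads to keep to reach the desired mean depth (never above 1)."""
    return min(1.0, desired_depth / current_depth)


if __name__ == "__main__":
    # Hypothetical numbers: 2 million 2x300 pairs (4 million reads) on a 100 kb target.
    depth = mean_depth(4_000_000, 300, 100_000)
    print(depth)                        # 12000.0
    print(keep_fraction(depth, 100.0))  # about 0.0083
```

The resulting fraction could then be passed to a subsampler, for example the samplerate= option of BBTools reformat.sh.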

I first split the MP data using NxTrim (it worked like a charm, except for the compilation part). For the assembly I then use ONLY the MP fraction; that seems to work best.

This is the SPAdes call:

spades.py --hqmp1-12 mp.fastq.gz --threads 50 --cov-cutoff auto --careful -o /home/output

The sample contains sequences from two E. coli plasmids. I think I was able to assemble one of them: I assembled the plasmid from the MP data and then mapped the PE reads back to the contig, which seems to confirm that the assembly is good. There is only one region with some ambiguity. I suspect this is because the other plasmid I expect in the sample has a region very similar to this part of the first plasmid, so reads from both plasmids map there.
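
One way to locate such an ambiguous region after mapping the PE reads is to look for well-covered positions where no single base dominates. The following minimal Python sketch assumes per-position base counts have already been extracted from the alignment (for example from a pileup); that input structure and the thresholds min_depth and max_major_frac are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


def mixed_positions(
    pileup: List[Dict[str, int]],
    min_depth: int = 20,
    max_major_frac: float = 0.8,
) -> List[Tuple[int, float]]:
    """Return (0-based position, major-base fraction) for sufficiently covered
    positions where the most common base accounts for less than max_major_frac
    of the reads, i.e. candidate sites of mixed mapping."""
    hits = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            continue  # too little coverage to judge
        major_frac = max(counts.values()) / depth
        if major_frac < max_major_frac:
            hits.append((pos, major_frac))
    return hits


if __name__ == "__main__":
    # Toy pileup: position 1 looks like a 55/45 mix of two sequences.
    pileup = [{"A": 100}, {"C": 55, "T": 45}, {"G": 98, "A": 2}, {"T": 5}]
    print(mixed_positions(pileup))  # [(1, 0.55)]
```

A cluster of such positions in one window would be consistent with reads from a second, similar sequence mapping onto that region.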

I suspect that these highly similar regions make it harder for SPAdes to assemble both plasmids.

The second plasmid also looks almost closed, but roughly 10% of its final sequence seems to be missing.

What would you do?

EDIT -------

What I forgot to mention: I have almost pure plasmid DNA, since the genomic DNA was removed in the lab beforehand. After assembly, hardly any contigs map to the genome.

Tags: illumina, assembly, spades, plasmids

Updated 19 months ago by Ram.

Ram · Answer 1 · 2015-09-17

I would try mapping the reads to three references at once (the E. coli chromosome and both of your plasmid assemblies) using, e.g., BBSplit with the ambig2=all flag, and also capture the unmapped reads. Then combine the unmapped reads with the reads that mapped best to the unfinished plasmid, and assemble those. You may again need to subsample or normalize that data and apply some kind of quality filtering or k-mer depth filtering (normalizing while discarding low-coverage reads), because the unmapped reads will probably contain a lot of junk in addition to the missing plasmid reads.
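
The binning logic can be modeled roughly as follows. This Python sketch is not BBSplit itself: it assumes you already have one alignment score per read per reference (None meaning no alignment; a hypothetical input format), and it approximates the ambig2=all behavior by sending a read to every reference tied for its best score. The reference names are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def bin_reads(scores: Dict[str, Dict[str, Optional[int]]]) -> Dict[str, List[str]]:
    """Assign each read to every reference tied for its best score
    (approximating ambig2=all); reads with no alignment go to 'unmapped'."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read_id, per_ref in scores.items():
        hits = {ref: s for ref, s in per_ref.items() if s is not None}
        if not hits:
            bins["unmapped"].append(read_id)
            continue
        best = max(hits.values())
        for ref, s in hits.items():
            if s == best:
                bins[ref].append(read_id)
    return dict(bins)


def reads_for_reassembly(bins: Dict[str, List[str]], target_ref: str) -> List[str]:
    """Unmapped reads plus reads binned to the unfinished plasmid."""
    return sorted(set(bins.get("unmapped", [])) | set(bins.get(target_ref, [])))


if __name__ == "__main__":
    scores = {
        "r1": {"chromosome": 280, "plasmid1": None, "plasmid2": None},
        "r2": {"chromosome": None, "plasmid1": 300, "plasmid2": 300},  # shared region
        "r3": {"chromosome": None, "plasmid1": 250, "plasmid2": 295},
        "r4": {"chromosome": None, "plasmid1": None, "plasmid2": None},
    }
    bins = bin_reads(scores)
    print(reads_for_reassembly(bins, "plasmid2"))  # ['r2', 'r3', 'r4']
```

Because ties go to every best reference, reads from the region shared by both plasmids (like r2) still reach the unfinished plasmid's bin instead of being claimed only by the finished one.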

For example, if your plasmid has relatively even coverage with an average of 1000x:

bbnorm.sh in=reads.fq out=highpass.fq min=200 target=9999999 passes=1

That will simply throw out the low-coverage junk so it won't cause problems in assembly.
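
To illustrate the idea behind this high-pass filtering, here is a simplified Python model: count k-mers across all reads, take each read's depth as the median count of its k-mers, and drop reads below a minimum depth (the analogue of min=200 above; target=9999999 is presumably set so high that no downsampling of high-coverage reads takes place). This is a sketch under stated assumptions, not BBNorm's actual algorithm: it ignores reverse complements, quality scores and error correction, and the k and threshold in the example are toy values.

```python
from collections import Counter
from statistics import median
from typing import List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq (forward strand only, for simplicity)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def highpass(reads: List[str], k: int, min_depth: int) -> List[str]:
    """Keep reads whose median k-mer count across the whole read set is >= min_depth."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Reads shorter than k have no k-mers and are dropped.
        if read_kmers and median([counts[km] for km in read_kmers]) >= min_depth:
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Toy data: one sequence seen 5 times (high coverage) and one singleton (junk).
    reads = ["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"]
    print(highpass(reads, k=4, min_depth=3))  # five copies of 'ACGTACGTAC'
```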