Assembling a mix of two plasmids with partially similar sequences
8.6 years ago
mschmid

I need to assemble Illumina sequencing data. I have paired-end (PE) and mate-pair (MP) libraries (2x300). There is a lot of coverage; I even have to downsample and/or normalize it. I am using SPAdes 3.6.0.

I first split the MP data using NxTrim (it worked like a charm, except for the compilation part). Then, for the assembly, I use ONLY the MP fraction; that seems to work best.

This is the SPAdes call:

spades.py --hqmp1-12 mp.fastq.gz --threads 50 --cov-cutoff auto --careful -o /home/output

The thing is that the sample contains sequences from two E. coli plasmids. I think I was able to assemble one of them: I assembled the plasmid from the MP data and then mapped the PE reads back to the contig, which seems to confirm that the assembly is good. There is only one region with some ambiguity. I suspect this is because the other plasmid I expect in the sample has a region very similar to this one, so reads from both plasmids map there and I see a mixed signal.
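One way to pin down the ambiguous region is to look at per-position base counts from the PE reads mapped to the contig and flag positions where a second base is well supported. The sketch below assumes those counts have already been extracted from the alignment (for example from a pileup) into one dictionary per contig position; the function names and thresholds (min_depth=20, min_minor_frac=0.1, max_gap=10) are hypothetical and would need tuning to the real coverage.

```python
from typing import Dict, List, Tuple


def mixed_positions(counts: List[Dict[str, int]], min_depth: int = 20,
                    min_minor_frac: float = 0.1) -> List[int]:
    """Return 0-based contig positions where a second base is well supported."""
    flagged = []
    for pos, base_counts in enumerate(counts):
        depth = sum(base_counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        ranked = sorted(base_counts.values(), reverse=True)
        minor = ranked[1] if len(ranked) > 1 else 0
        if minor / depth >= min_minor_frac:
            flagged.append(pos)
    return flagged


def merge_into_regions(positions: List[int], max_gap: int = 10) -> List[Tuple[int, int]]:
    """Merge sorted flagged positions no more than max_gap apart into (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for pos in positions:
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], pos)
        else:
            regions.append((pos, pos))
    return regions


# Toy example: clean sites plus three sites where two bases are well supported.
counts = ([{"A": 100}] * 5 + [{"A": 60, "G": 40}] + [{"C": 100}] * 2
          + [{"T": 70, "C": 30}] + [{"G": 100}] * 30 + [{"A": 50, "T": 50}])
flagged = mixed_positions(counts)
print(flagged)                      # [5, 8, 39]
print(merge_into_regions(flagged))  # [(5, 8), (39, 39)]
```

A dense cluster of flagged positions would be consistent with reads from the second plasmid piling onto the shared region, although sequencing errors or a genuinely mixed population could produce a similar pattern.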

I guess these nearly identical regions make it harder for SPAdes to assemble both plasmids.

The second plasmid also looks almost closed, but roughly 10% of its final sequence seems to be missing.

What would you do?

EDIT -------

What I forgot to mention: the DNA is almost pure plasmid DNA, since the genomic DNA was removed in the lab beforehand. After assembly, I get hardly any contigs that map to the genome.

8.6 years ago

I would try mapping the reads to three references at once (the E. coli chromosome and both of your plasmid assemblies) using, e.g., BBSplit with the ambig2=all flag, and also capture the unmapped reads. Then combine the unmapped reads with the reads that mapped best to the unfinished plasmid, and assemble those. You may again need to subsample or normalize that data and apply some kind of quality or kmer-depth filtering (normalizing while tossing out low-coverage reads), because the unmapped reads will probably contain a lot of junk in addition to the missing plasmid reads.

For example, if your plasmid has relatively even coverage with an average of 1000x:

bbnorm.sh in=reads.fq out=highpass.fq min=200 target=9999999 passes=1

That will simply throw out the low-coverage junk so it won't cause problems in assembly.
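The idea behind this high-pass filter can be illustrated with a toy model: estimate each read's depth as the median count of its k-mers across the whole read set, and keep only reads at or above a minimum depth. This is a simplified sketch of the concept, not BBNorm's actual implementation (which has its own depth estimation and memory-efficient k-mer counting); the function names and the tiny k in the example are assumptions for illustration.

```python
from collections import Counter
from statistics import median
from typing import List


def kmer_list(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a read, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def high_pass(reads: List[str], k: int, min_depth: int) -> List[str]:
    """Keep reads whose median k-mer depth is at least min_depth (toy model)."""
    counts = Counter()
    for read in reads:
        counts.update(kmer_list(read, k))
    kept = []
    for read in reads:
        depths = [counts[km] for km in kmer_list(read, k)]
        # Reads shorter than k have no k-mers and are dropped.
        if depths and median(depths) >= min_depth:
            kept.append(read)
    return kept


# Toy example: one sequence seen three times (high coverage) and one singleton.
reads = ["ACGTACGT"] * 3 + ["TTTTGGGG"]
print(high_pass(reads, k=4, min_depth=2))  # ['ACGTACGT', 'ACGTACGT', 'ACGTACGT']
```

With real data the same logic is what min=200 targets for a plasmid at roughly 1000x: genuine plasmid reads sit far above the cutoff, while low-coverage junk falls below it.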


Thanks for the hint! Once I have both plasmids about "99% right", I will use BBSplit to split up the data and run a second round of assembly. Hopefully that helps.
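The binning step suggested in the answer can be sketched as a toy in Python. Real BBSplit aligns reads to all references with BBMap; here, purely as an assumption for illustration, each read is scored by how many of its k-mers occur in each reference. Reads that tie for the best score are sent to every tied reference (mirroring what ambig2=all does for ambiguous reads), reads matching nothing go to an "unmapped" bin, and the reads for the second assembly round are the unmapped reads plus those assigned to the unfinished plasmid. The reference names (chromosome, plasmid1, plasmid2) and all function names are hypothetical.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads: Dict[str, str], refs: Dict[str, str], k: int = 5) -> Dict[str, List[str]]:
    """Assign each read to its best-scoring reference(s); ties go to all of them."""
    ref_kmers = {name: kmer_set(seq, k) for name, seq in refs.items()}
    bins: Dict[str, List[str]] = {name: [] for name in refs}
    bins["unmapped"] = []  # assumes no reference is itself named "unmapped"
    for read_id, seq in reads.items():
        read_kmers = kmer_set(seq, k)
        scores = {name: len(read_kmers & kms) for name, kms in ref_kmers.items()}
        best = max(scores.values(), default=0)
        if best == 0:
            bins["unmapped"].append(read_id)
            continue
        for name, score in scores.items():
            if score == best:  # ambiguous reads go to every best hit, like ambig2=all
                bins[name].append(read_id)
    return bins


def reads_for_reassembly(bins: Dict[str, List[str]], target: str) -> List[str]:
    """Unmapped reads plus reads binned to the unfinished plasmid, without duplicates."""
    seen: Set[str] = set()
    selected = []
    for read_id in bins["unmapped"] + bins[target]:
        if read_id not in seen:
            seen.add(read_id)
            selected.append(read_id)
    return selected


refs = {
    "chromosome": "ACGTACGTAC",
    "plasmid1": "GGGGGTTTTTCCCCC",
    "plasmid2": "GGGGGTTTTTAAAAA",  # shares GGGGGTTTTT with plasmid1
}
reads = {"r1": "ACGTACG", "r2": "TTTTTCCC", "r3": "GGGGGTTT",
         "r4": "TTTAAAAA", "r5": "CATCATCAT"}
bins = bin_reads(reads, refs)
print(bins)
print(reads_for_reassembly(bins, "plasmid2"))  # ['r5', 'r3', 'r4']
```

Because reads from the shared region land in both plasmid bins, the second-round assembly of the unfinished plasmid still receives the reads covering the sequence it shares with the finished one.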
