I need to assemble Illumina sequencing data. I have paired-end (PE) and mate-pair (MP) libraries (2x300). Coverage is very high, so I even have to downsample and/or normalize it. I am using SPAdes 3.6.0.
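As a minimal sketch of the downsampling arithmetic: the fraction of read pairs to keep is the target coverage divided by the current coverage. The function name, plasmid size, read count and target coverage below are made-up placeholders, not values from this data set.

```python
def downsample_fraction(n_read_pairs, read_length, target_size, target_coverage):
    """Fraction of read pairs to keep so expected coverage is about target_coverage.

    n_read_pairs    -- number of read pairs in the library
    read_length     -- length of a single read in bp (300 for a 2x300 run)
    target_size     -- assumed total size of the sequence being assembled, in bp
    target_coverage -- desired mean coverage after downsampling
    """
    # Each pair contributes two reads of read_length bases.
    current_coverage = n_read_pairs * 2 * read_length / target_size
    if current_coverage <= target_coverage:
        return 1.0  # already at or below the target, keep everything
    return target_coverage / current_coverage


# Hypothetical example: 2 million pairs, 2x300 bp, ~200 kb of plasmid, aim for 100x.
fraction = downsample_fraction(2_000_000, 300, 200_000, 100)
print(f"keep {fraction:.4f} of the read pairs")
```

The resulting fraction can then be handed to whatever read subsampling tool is used, applied to both mates with the same random seed so that pairs stay in sync.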
I first split up the MP data using NxTrim (worked like a charm, except for the compilation part). For the assembly I use ONLY the MP fraction, which seems to work best.
This is the spades call:
spades.py --hqmp1-12 mp.fastq.gz --threads 50 --cov-cutoff auto --careful -o /home/output
The sample contains sequences from two E. coli plasmids. I think I was able to assemble one of them: I assembled the plasmid from the MP data and then mapped my PE data to the contig, which seems to confirm that the assembly is good. There is only one region with some ambiguity. I guess this is because the other plasmid I expect in the sample has a region that is very similar to this one in the first plasmid, so I am probably seeing mixed mapping from both plasmids there.
I guess regions where the two plasmids are quite similar make it harder for SPAdes to assemble both of them.
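One way to pin down the ambiguous region is to look at the per-base depth of the PE mapping: a stretch shared by both plasmids should show clearly elevated coverage compared with the rest of the contig. A minimal sketch, assuming three-column tab-separated input (contig, position, depth) as written by `samtools depth`; the function name, the 1.5x threshold, the minimum length and the synthetic numbers are illustrative assumptions, not taken from the data above.

```python
from collections import defaultdict
from statistics import median


def flag_high_depth_regions(depth_lines, ratio=1.5, min_len=50):
    """Return (contig, start, end) runs whose depth exceeds ratio * contig median."""
    per_contig = defaultdict(list)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")
        per_contig[contig].append((int(pos), int(depth)))

    regions = []
    for contig, rows in per_contig.items():
        rows.sort()
        threshold = ratio * median(d for _, d in rows)
        start = prev = None  # bounds of the run currently being extended
        for pos, depth in rows:
            high = depth > threshold
            if high and start is not None and pos == prev + 1:
                prev = pos  # extend the current run
                continue
            # The current run (if any) ends here; keep it if long enough.
            if start is not None and prev - start + 1 >= min_len:
                regions.append((contig, start, prev))
            start, prev = (pos, pos) if high else (None, None)
        if start is not None and prev - start + 1 >= min_len:
            regions.append((contig, start, prev))
    return regions


# Synthetic example: 300 bp contig at 100x with a 100 bp stretch at 200x.
demo = [f"c1\t{p}\t{200 if 101 <= p <= 200 else 100}" for p in range(1, 301)]
print(flag_high_depth_regions(demo))  # [('c1', 101, 200)]
```

Roughly doubled depth over a stretch would be consistent with reads from both plasmids piling up on one copy of a shared region, though it does not prove it.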
It looks like the second plasmid is also almost closed, but roughly 10% of the final sequence seems to be missing.
What would you do?
What I forgot to mention: I have almost pure plasmid DNA, since the genomic DNA was filtered out in the lab beforehand. After assembly, hardly any contigs map to the genome.