I am working on the reassembly of a metagenome. So far I have followed these steps:
- Trim low quality reads
- Remove contamination
- Assembly (SPAdes)
- Binning (MaxBin2)
Now I want to reassemble each bin separately. For each bin I use BamM to extract the reads that map to its contigs, then I rerun the assembly with those extracted reads as input. Sadly, SPAdes cannot reassemble the reads into contigs and the assembly fails. The error seems to indicate that the coverage is not sufficient.
To see what's going on I took a random bin: the assembly size is 3.6 Mb, and the contig headers in the bin indicate a coverage between 7 and 8. I therefore expect BamM to extract about 3.6 Mb × 7 ≈ 25 Mb of sequence, which is roughly 50 MB on disk (multiplied by 2 because the reads are in FASTQ format, which stores a quality string alongside each sequence). However, the BamM output folder contains only about 5 MB in total across the paired-end and single-read files.
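To make the estimate easy to check, here is the same back-of-the-envelope calculation as a small Python sketch. The function name `expected_read_volume` and the FASTQ factor of 2 are my own assumptions (the factor ignores header lines and any compression); the bin size and coverage are the values quoted above.

```python
def expected_read_volume(assembly_bp, coverage, fastq_factor=2.0):
    """Return (sequence bases, approximate FASTQ bytes) expected for a bin.

    fastq_factor is an assumption: FASTQ stores one quality character per
    base, so the file is roughly twice the sequence size (headers ignored).
    """
    bases = assembly_bp * coverage
    return bases, bases * fastq_factor


# Values from the bin described above: 3.6 Mb assembly, coverage 7 to 8.
low_bases, low_bytes = expected_read_volume(3_600_000, 7)
high_bases, high_bytes = expected_read_volume(3_600_000, 8)

print(f"{low_bases / 1e6:.1f}-{high_bases / 1e6:.1f} Mb of sequence")
print(f"{low_bytes / 1e6:.1f}-{high_bytes / 1e6:.1f} MB of FASTQ")
```

Under these assumptions the extracted reads should be several times larger than the roughly 5 MB I actually get.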
Can somebody suggest a tool to trace back the reads that produced my contigs? Preferably one that can also return the mate when only one read of a pair maps.
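To show the behaviour I am after, here is a rough Python sketch that uses only the standard library. It is not what BamM does internally, and all function names are hypothetical. It assumes an uncompressed SAM file of the reads mapped against the full assembly, read names in the SAM file without a `/1` or `/2` suffix, and well-formed four-line FASTQ records. The idea is to collect the name of every read with at least one alignment to a contig of the bin, then pull both mates from the original FASTQ files by name, so the mate is kept even when it did not map.

```python
def read_names_for_bin(sam_lines, bin_contigs):
    """Collect names of reads with a mapped alignment to any contig in the bin."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:                    # 0x4 = this read itself is unmapped
            continue
        if rname in bin_contigs:
            names.add(qname)
    return names


def base_name(header):
    """Read name from a FASTQ header, without '@' and without a /1 or /2 suffix."""
    name = header.strip()[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def select_reads(fastq_lines, names):
    """Return the 4-line FASTQ records whose read name is in `names`."""
    it = iter(fastq_lines)
    # zip over the same iterator groups the lines into records of four.
    return [rec for rec in zip(it, it, it, it) if base_name(rec[0]) in names]


# Toy data: readA maps as a proper pair, readB has an unmapped mate (flag 133),
# readC maps to a contig outside the bin.
sam = [
    "@SQ\tSN:contig_1\tLN:1000",
    "readA\t99\tcontig_1\t10\t60\t5M\t=\t50\t45\tACGTA\tIIIII",
    "readA\t147\tcontig_1\t50\t60\t5M\t=\t10\t-45\tTTGCA\tIIIII",
    "readB\t73\tcontig_1\t20\t60\t5M\t=\t20\t0\tGGCAT\tIIIII",
    "readB\t133\tcontig_1\t20\t0\t*\t=\t20\t0\tCCGTA\tIIIII",
    "readC\t99\tcontig_9\t5\t60\t5M\t=\t40\t40\tAAAAA\tIIIII",
]
reads_2 = [
    "@readA/2", "TGCAA", "+", "IIIII",
    "@readB/2", "CCGTA", "+", "IIIII",
    "@readC/2", "TTTTT", "+", "IIIII",
]

names = read_names_for_bin(sam, {"contig_1"})
kept = select_reads(reads_2, names)
print(sorted(names))
print([base_name(rec[0]) for rec in kept])
```

In this toy case the second read of `readB` is retained even though it is flagged as unmapped, because its mate maps to a contig of the bin. For real data the name set could get large and the files would be gzipped, so a dedicated tool would be preferable.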