I am doing de novo assemblies of two bacteria. The reads were obtained by PacBio sequencing. I am using Canu on two different clusters: Canu 1.8 on an Intel cluster and Canu 1.7 on an IBM cluster. The estimated genome sizes are known.
I have some issues with the assemblies. It seems that the PacBio output may contain DNA from undesired bacteria (perhaps the bacterial cultures were contaminated).
I was advised to use BWA-MEM (https://github.com/lh3/bwa) to map the reads to the known contaminant (I got the contaminant sequence from GenBank). After mapping, I should discard the reads that map to the contaminant and use the unmapped reads to assemble my desired bacterium. This sounds good for one of the two bacteria.
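To make sure I understand the advice, here is a minimal sketch of the filtering step as I picture it. It assumes the mapper writes plain SAM output, that a read counts as "unmapped" when its record has the 0x4 flag set, and that my reads are in four-line FASTQ format. The function names `unmapped_read_names` and `filter_fastq` are my own, not part of BWA or any other tool.

```python
def unmapped_read_names(sam_lines):
    """Return the names of reads whose SAM record has the 'unmapped' flag (0x4)."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:  # bit 0x4 means the read did not align
            names.add(name)
    return names


def filter_fastq(fastq_lines, keep_names):
    """Yield four-line FASTQ records whose read name is in keep_names.

    Assumption: every record spans exactly four lines (no wrapped sequences).
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # The read name is the first token after the leading '@'.
            name = record[0][1:].split()[0]
            if name in keep_names:
                yield "\n".join(record)
            record = []
```

In practice the SAM input would come from a command along the lines of `bwa mem -x pacbio contaminant.fna reads.fastq > mapped.sam` (the file names are placeholders), and the records kept by `filter_fastq` would be written to a new FASTQ file for Canu. I do not know whether a read that maps only partially to the contaminant should be kept or discarded, so this sketch simply trusts the flag.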
For the other one, I have only unplaced contigs for the contaminant (assembly report: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Moellerella_wisconsensis/latest_assembly_versions/GCF_001020485.1_ASM102048v1/GCF_001020485.1_ASM102048v1_assembly_report.txt), downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Moellerella_wisconsensis/latest_assembly_versions/GCF_001020485.1_ASM102048v1
How can I map my reads to such a set of contigs (108 unplaced ones)? Is there a tool I should use to make those 108 unplaced contigs usable as a reference for mapping the contaminant reads?
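As far as I understand, `bwa index` accepts a FASTA file with many records, so the contigs might be usable as they are in a single multi-FASTA file; that is my assumption and part of what I would like confirmed. Before indexing, I would at least check that the downloaded file really contains the 108 contigs. A minimal sketch of that check (the function name `fasta_lengths` is hypothetical):

```python
def fasta_lengths(fasta_lines):
    """Map each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first token of the header line.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so add up their lengths.
            lengths[name] += len(line)
    return lengths
```

Used as `lengths = fasta_lengths(open("contaminant.fna"))` (a placeholder file name), `len(lengths)` should then be 108 if the file matches the assembly report.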
It would be great if you could point me to the right tools for removing the contaminant. For example, is BWA-MEM the best tool for the job? I have seen a tool from QIAGEN on the internet, but it is paid software (https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/).