I would like to ask for some suggestions on Pilon polishing of a Canu-assembled genome.
Long story short: we are trying to assemble a chloroplast genome that is extremely repetitive. We have around 3 Gbp of Nanopore data, but we are unable to assemble the genome into a single contig. Before trying to circularize the assembly manually, we wanted to polish it. I first ran Nanopolish and then Pilon, using Illumina reads (2x150 bp).
However, here is the big issue: Pilon confirms only 76% of the bases in the largest contig of the assembly. As I understand it, BWA reports only the best mapping location for each read, so a read that maps to multiple locations is reported at just one of them. Because the genome contains many repeats, BWA seems to place each repetitive read at a single copy, and the other copies of those repeats are left unpolished.
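To make the effect of single-location reporting concrete, here is a toy sketch. It assumes a repeat present in N identical copies, that every read from the repeat matches all copies equally well, and that a mapper reporting one location picks that copy at random among the equally good hits. It does not model BWA's scoring or Pilon itself; it only shows how reads from a repeat get spread across its copies.

```python
import random
from collections import Counter


def per_copy_depth(n_reads, n_copies, report_all, seed=0):
    """Toy model: count how many repeat-derived reads end up on each copy.

    Every read is assumed to match all n_copies identical copies equally well.
    report_all=False mimics a mapper that reports one location per read,
    chosen at random among the equally good hits; report_all=True mimics a
    mapper that reports every location.
    """
    rng = random.Random(seed)  # fixed seed so the toy run is reproducible
    depth = Counter()
    for _ in range(n_reads):
        if report_all:
            # the read is counted once at every copy of the repeat
            for copy in range(n_copies):
                depth[copy] += 1
        else:
            # the read is counted at a single, randomly chosen copy
            depth[rng.randrange(n_copies)] += 1
    return [depth[c] for c in range(n_copies)]


if __name__ == "__main__":
    n_reads, n_copies = 1000, 4
    print("one location per read:", per_copy_depth(n_reads, n_copies, report_all=False))  # roughly 250 per copy
    print("all locations:        ", per_copy_depth(n_reads, n_copies, report_all=True))   # 1000 per copy
```

With one reported location per read, each copy receives on average only 1/N of the repeat's reads, so the evidence available for correcting any single copy shrinks as the copy number grows.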
How can I polish the entire contig and make BWA (or some other tool) report all mapping locations? I tried BBMap, set to report all mapping locations, and the coverage increased to 99.8% (according to pileup.sh). However, BBMap is not a recommended mapper for Pilon, and BBMap's minid is set to 0.76 by default, so I am worried that this kind of mapping could also cause a lot of problems for Pilon.
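One way to reason about the minid worry is to compute each alignment's identity from its SAM CIGAR string and NM tag and see how many low-identity alignments would be kept. The sketch below uses one common definition, identity = (alignment columns - NM) / alignment columns, where NM is the SAM edit distance (mismatches plus inserted plus deleted bases). This is an assumption and not necessarily how BBMap computes minid; `keep_alignment` and its 0.97 threshold are hypothetical, chosen only for illustration.

```python
import re

# CIGAR operations (SAM spec): M/=/X consume query and reference, I consumes
# query only, D consumes reference only. S/H clips, N skips and P padding are
# not counted as alignment columns here.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_COLUMN_OPS = set("MIDX=")


def alignment_identity(cigar, nm):
    """Approximate identity = (alignment columns - NM) / alignment columns.

    This is one common definition; it is NOT necessarily BBMap's minid.
    """
    ops = _CIGAR_RE.findall(cigar)
    # reject strings such as "*" or anything the regex did not fully consume
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    columns = sum(int(n) for n, op in ops if op in _COLUMN_OPS)
    if columns == 0:
        raise ValueError("CIGAR has no aligned columns")
    return (columns - nm) / columns


def keep_alignment(cigar, nm, min_identity=0.97):
    """Hypothetical post-filter: keep alignments at or above min_identity."""
    return alignment_identity(cigar, nm) >= min_identity


if __name__ == "__main__":
    for cigar, nm in [("150M", 3), ("150M", 30), ("100M2I48M", 4)]:
        ident = alignment_identity(cigar, nm)
        print(cigar, nm, round(ident, 3), keep_alignment(cigar, nm))
    # prints: 150M 3 0.98 True / 150M 30 0.8 False / 100M2I48M 4 0.973 True
```

The second example shows the concern: an alignment at 80% identity passes a 0.76 cutoff but would be rejected by a stricter one.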
I have noticed the same issue in data from my other genomes, mostly in eukaryotic data where the rRNA genes occur in multiple copies: some of those copies are not completely polished and still have mismatches relative to the rRNA sequences that we amplified and Sanger-sequenced. If the rRNA copies in the genome carried genuine polymorphisms, I would see them in the Sanger sequencing, but there are none in our data.
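The check described above, comparing each assembled rRNA copy against the Sanger sequence, can be sketched as a per-position comparison. This minimal version assumes the copies have already been extracted and aligned to the Sanger sequence with no indels, so all sequences have equal length; real data would need a proper pairwise alignment first. The copy names and sequences in the example are made up.

```python
def mismatch_positions(copy_seq, sanger_seq):
    """Return 0-based positions where an assembled copy differs from the Sanger sequence.

    Assumes both sequences are already aligned and of equal length (no indels).
    """
    if len(copy_seq) != len(sanger_seq):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(copy_seq.upper(), sanger_seq.upper())) if a != b]


def compare_copies(copies, sanger_seq):
    """Map each copy name to its list of mismatch positions against the Sanger sequence."""
    return {name: mismatch_positions(seq, sanger_seq) for name, seq in copies.items()}


if __name__ == "__main__":
    sanger = "ACGTACGTAC"  # made-up Sanger sequence
    copies = {             # made-up assembled copies
        "rRNA_copy1": "ACGTACGTAC",
        "rRNA_copy2": "ACGTTCGTAC",
        "rRNA_copy3": "ACCTACGTAA",
    }
    print(compare_copies(copies, sanger))
    # {'rRNA_copy1': [], 'rRNA_copy2': [4], 'rRNA_copy3': [2, 9]}
```

If the Sanger sequence shows no polymorphism, mismatches that appear in only some copies point to those copies being under-polished rather than to real variation between copies.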
Any suggestions on how to deal with this?