Hello everyone!
I am currently assembling a highly polymorphic Drosophila genome de novo from long PacBio reads. As my sequencing coverage is sufficient (220x), I chose the "PacBio-only" method, which consists of "polishing" the final assembly by aligning the raw reads against it and correcting it base by base. The algorithm that performs this step is called Quiver.
After polishing my assembly, I aligned RNA-seq reads against it to check the result. I realised that in one particular region, some indels remain and refuse to disappear. I have tried increasing the polishing coverage to deal with these indels, but it didn't work. I don't have enough Illumina coverage, so I can't use Pilon to correct these regions with short reads.
Also, looking through the alignments manually in IGV, I couldn't find any other examples besides the particular region I am talking about.
Has anyone ever experienced this problem? Any ideas on how to calculate an indel rate without a reference genome? How can I find other regions with a high indel rate?
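To make the last two questions concrete, here is a minimal sketch of the kind of scan I have in mind, not a tested pipeline. It assumes the RNA-seq alignments against the assembly are available as SAM text lines, counts insertion and deletion events from the CIGAR strings, and reports indel events per aligned base in fixed windows along each contig. The function name `indel_rate_per_window` and the 1 kb window size are hypothetical choices for illustration; introns (`N` operations) are skipped rather than counted as deletions.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_rate_per_window(sam_lines, window=1000):
    """Return {(contig, window_start): indel events per aligned base}.

    Assumption: sam_lines is an iterable of SAM-format text lines
    (header lines starting with '@' are ignored). window_start is 0-based.
    """
    indels = defaultdict(int)   # indel events per window
    aligned = defaultdict(int)  # aligned (M/=/X) bases per window

    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, contig = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read or missing CIGAR

        ref = pos - 1  # SAM positions are 1-based; work 0-based here
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":
                # Spread the aligned bases over every window they overlap.
                start, end = ref, ref + n
                while start < end:
                    w = start // window * window
                    chunk = min(end, w + window) - start
                    aligned[(contig, w)] += chunk
                    start += chunk
                ref = end
            elif op == "I":
                # Insertion: one event, does not consume the assembly.
                indels[(contig, ref // window * window)] += 1
            elif op == "D":
                # Deletion: one event, consumes n assembly bases.
                indels[(contig, ref // window * window)] += 1
                ref += n
            elif op == "N":
                # Spliced-out intron in RNA-seq: skip, not an indel.
                ref += n
            # S, H and P do not consume the assembly; nothing to do.

    return {k: indels[k] / aligned[k] for k in aligned if aligned[k] > 0}
```

The lines could come from something like `samtools view` on the RNA-seq BAM file. Windows with an unusually high rate would then be candidates to inspect in IGV, keeping in mind that real polymorphisms in the sample would show up in the same way as assembly errors.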
I hope this was clear. Don't hesitate to ask me any questions if something needs clarifying.