I'm still stuck on my problem with remaining indels in my assembly (see this post for further information: C: [PacBio assembly] Remaining indels after polishing).
I'm trying to understand what is going on. To do so, instead of aligning only RNA-seq reads (Illumina), I tried aligning DNA reads (also Illumina, paired-end) against my PacBio assembly, to have more data that is more evenly distributed. With the two BAM files (RNA-seq and DNA read alignments against the PacBio assembly) loaded alongside my genome, I checked some regions in IGV.
Indels are unevenly distributed along the genome. They seem to be clustered in a few very particular regions, sometimes in introns, sometimes in exons, but they are most likely to appear in very polymorphic regions. Interestingly, these regions (many indels, high polymorphism) show small drops in Illumina read coverage. I don't really know how to interpret these drops and I need some help. My guess is that, because the sequence in my assembly is very different from the reads there, the aligner (hisat2 in my case) couldn't align many of the reads, so the coverage decreases. I'm not really sure about this conclusion, though.
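To make these drops less anecdotal than eyeballing IGV, one option would be to flag low-coverage windows and compare them with the positions of the indel clusters. Below is a minimal sketch, not something I have run on my real data. It assumes per-base depth in the three-column, tab-separated format written by `samtools depth -a` (contig, 1-based position, depth). The function names, the 1000 bp window and the 0.5 × median cutoff are arbitrary choices of mine for illustration.

```python
from collections import defaultdict
from statistics import median


def window_means(depth_lines, window=1000):
    """Mean depth per (contig, window index) from `samtools depth -a` lines."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        # Only the first depth column is used if several BAMs were given.
        contig, pos, depth = line.split("\t")[:3]
        key = (contig, (int(pos) - 1) // window)  # positions are 1-based
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def low_coverage_windows(depth_lines, window=1000, fraction=0.5):
    """Return (contig, start, end, mean_depth) for windows whose mean depth
    is below `fraction` times the median of all window means."""
    means = window_means(depth_lines, window)
    if not means:
        return []
    cutoff = fraction * median(means.values())
    return sorted(
        (contig, idx * window + 1, (idx + 1) * window, mean)
        for (contig, idx), mean in means.items()
        if mean < cutoff
    )
```

For example, after something like `samtools depth -a dna.bam > dna.depth.txt` (file names are placeholders), calling `low_coverage_windows(open("dna.depth.txt"))` would list the windows to compare against the indel clusters. Note that the reported end of the last window on a contig can extend past the contig length.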
What do you think about these coverage drops? Could they be assembly errors?
Don't hesitate to ask me for further details.