To check for ASE, I performed multisample gDNA variant calling using GATK and got heterozygous biallelic sites for which I looked for RNA-seq read counts for both Ref and Alt alleles at those sites. I used allelecounter (v0.6) for that (https://github.com/secastel/allelecounter). More information- I performed joint variant calling using 81 water buffalo samples (GATK v18.104.22.168), after which I separated 4 samples from the multisample VCF whose RNA-seq data I had got. BAM files for 14 RNA-seq samples were generated using HISAT2 after which allelecounter was used to count allele-specific read counts. The 14 RNA-seq samples are a mix of total RNA-seq and mRNA-seq. The problematic samples are from total RNA-seq. There are other total RNA-seq samples as well in the set, but they don't show any problem.
For checking reference mapping bias, I generated plots for 14 RNA-seq samples. Two of the plots had multimodal distribution showing extreme imprinted loci which cannot be true.
What can be the reason for this and why is it showing only in the two samples? Just to add, the two sample produced such plot because there are many loci that were genotyped as a heterozygous site, but in the RNA-seq data, those sites have reads supporting either ref or alt allele only. I am aware that some sites will be like that (tail of the distribution), but so many?? If more information is needed, I will be glad to share.