I am analyzing exome sequencing data (~100x coverage). The pipeline uses BWA for alignment and then GATK for variant calling. During BQSR (one of the steps in the variant-calling workflow), the empirical base quality scores generated for my data are coming out very low: the average of the empirical values in the recal file is 19. As a result, the recalibration plots also do not look as they should (attached).
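One thing worth checking is how the average is taken. A plain mean over rows treats a quality bin with a handful of observations the same as one with millions, so an observation-weighted mean can give a quite different figure. The sketch below assumes you pass in the text of a single whitespace-delimited table whose header includes `EmpiricalQuality` and `Observations` (and optionally `EventType`, where `M` is assumed to mark mismatch rows), as in the quality-score table of a GATK recal report. Check the column names in your own file; the function name is hypothetical.

```python
def weighted_empirical_quality(table_text: str, event_type: str = "M") -> float:
    """Observation-weighted mean of EmpiricalQuality for one recal table.

    Assumption: the table has a header row with 'EmpiricalQuality' and
    'Observations' columns (and optionally 'EventType'), as in GATK's
    quality-score recalibration table. Verify against your own file.
    """
    header = None
    weighted_sum = 0.0
    total_obs = 0.0
    for line in table_text.splitlines():
        fields = line.split()
        if header is None:
            # Skip '#:GATK...' metadata and anything before the column header.
            if "EmpiricalQuality" in fields and "Observations" in fields:
                header = {name: i for i, name in enumerate(fields)}
            continue
        if not fields or fields[0].startswith("#"):
            break  # a blank or metadata line ends the table
        # Keep only the requested event type (assumed 'M' = mismatches).
        if "EventType" in header and fields[header["EventType"]] != event_type:
            continue
        n = float(fields[header["Observations"]])
        weighted_sum += float(fields[header["EmpiricalQuality"]]) * n
        total_obs += n
    if total_obs == 0:
        raise ValueError("no matching rows with EmpiricalQuality/Observations")
    return weighted_sum / total_obs


example = """#:GATKTable:RecalTable1:
ReadGroup QualityScore EventType EmpiricalQuality Observations Errors
rg1 30 M 30.0000 900 1
rg1 10 M 10.0000 100 10
rg1 45 I 45.0000 1000 0
"""
print(weighted_empirical_quality(example))  # 28.0, versus 20.0 for a plain mean of the M rows
```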
Despite getting bad empirical scores, I went ahead and recalibrated my data, and after recalibration the average corrected base quality score is as low as 16!!! I am not sure whether this recalibration is fine, as the confidence of variant calling will certainly go down with such low base quality scores.
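To put these numbers in perspective, base qualities are Phred-scaled, Q = -10 log10(p), where p is the probability that the base call is wrong. The small helpers below (names are illustrative) convert between the two scales: Q16 corresponds to roughly a 2.5% error rate and Q19 to roughly 1.3%, compared with 0.1% at Q30.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred-scaled quality: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (16, 19, 30):
    print(f"Q{q}: error rate {phred_to_error(q):.4f}")
# Q16: error rate 0.0251
# Q19: error rate 0.0126
# Q30: error rate 0.0010
```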
Otherwise, my raw data and quality checks look fine. Also, when I run recalibration on 30x coverage data, everything seems fine. I have no idea why the empirical scores go down for the high-coverage data!
Can anyone help me with this?
Have you looked at your data in a browser? Are there a lot of mismatches after alignment?