Advice for high quality scores for GATK
8.9 years ago
kezcleal ▴ 160

I am working with data whose quality scores appear to be distributed higher than expected.

According to picard-tools' QualityScoreDistribution.jar, the highest scores are over 70:

http://s483.photobucket.com/user/Kez_Cleal/media/1_DB31.qualDist_zpsf2xjcckq.png.html

When I try to run GATK's BaseRecalibrator, it complains that the scores are too high:

##### ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/kez/KC_15-05-20/inputs/DB31/QC_hg19/1_DB31/1_DB31.mkDup.rg.rAln.bam} is malformed: we encountered an extremely high quality score (72) with BAQ correction factor of 8; the BAM file appears to be using the wrong encoding for quality scores

I have no idea which quality score encoding the file uses. I have read that --fix_misencoded_quality_scores is ill-advised unless you know what you are doing. How do I find out whether I can use this option?

next-gen sequence • 3.6k views
8.9 years ago

It looks like you have an old dataset with phred+64 quality encoding and the aligner was never told about it, so the scores were carried into the BAM as if they were phred+33. Given the score distribution, you should be safe with the --fix_misencoded_quality_scores option. Next time, pass the proper quality-encoding setting to your aligner (granted, nothing actually produces phred+64 quality scores anymore).
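To make the reasoning concrete, here is a rough sketch of the arithmetic. When phred+64 characters are decoded as phred+33, every score comes out 64 - 33 = 31 too high, so a reported 72 would correspond to a true score of 41. As far as I know, that offset of 31 is what --fix_misencoded_quality_scores subtracts. The function names, the example read, and the cutoffs of 59 and 74 are my own assumptions for illustration (a heuristic for Illumina-era data), not anything taken from GATK or Picard.

```python
def guess_offset(quality_strings):
    """Heuristically guess the Phred offset from raw FASTQ quality strings.

    Assumed cutoffs: characters below ASCII 59 cannot occur in a +64
    encoding, and characters above ASCII 74 ('J') are outside the usual
    Illumina +33 range. Returns None when the data is ambiguous.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    lo, hi = min(codes), max(codes)
    if lo < 59:
        return 33
    if hi > 74:
        return 64
    return None


def reported_scores(quality_string, assumed_offset=33):
    """Decode a quality string the way a tool assuming `assumed_offset` would."""
    return [ord(c) - assumed_offset for c in quality_string]


# Hypothetical phred+64 read whose true scores are 2, 30, 40 and 41.
raw = "".join(chr(q + 64) for q in (2, 30, 40, 41))

misread = reported_scores(raw)        # what a phred+33 tool sees
fixed = [q - 31 for q in misread]     # subtract 64 - 33 = 31

print(guess_offset([raw]))  # 64
print(misread)              # [33, 61, 71, 72]
print(fixed)                # [2, 30, 40, 41]
```

Running something like guess_offset over the quality lines of the original FASTQ files (not the BAM) is a quick way to check before applying the fix. If it returns 33, the scores are already encoded correctly and the option should not be used.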


Thank you. The sequencing of these samples was outsourced to BGI in China, although I do not know which platform was used.
