Strange Phred quality scores in Illumina HiSeq data
5.6 years ago
abascalfederico

Hi,

I have some HiSeq sequencing data with unusual Phred quality scores. The lowest quality character is "!" and the highest is "K", which corresponds to scores 0-42 with a +33 offset. This range doesn't match any of the usual schemes listed here: https://en.wikipedia.org/wiki/FASTQ_format
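For reference, here is a minimal Python sketch of how such characters map to scores, assuming the Phred+33 offset implied by the 0-42 range above. `decode_phred33` is a hypothetical helper name, not part of any library.

```python
# Sketch: decode FASTQ quality characters into integer Phred scores.
# Assumes Phred+33 encoding (score = ASCII code - 33).

PHRED33_OFFSET = 33


def decode_phred33(qual: str) -> list:
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - PHRED33_OFFSET for ch in qual]


if __name__ == "__main__":
    print(decode_phred33("!JK"))  # [0, 41, 42]
```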

Since a program I need to run does not accept this score range, I guess I have to rescale the current scores to a standard scheme (e.g. Phred+33). Any hints on how to do this? Would it be OK to just replace the "K"s with "J"s?
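A minimal sketch of the "replace K with J" idea, i.e. capping every quality above Q41 at Q41, might look like the Python below. It assumes uncompressed FASTQ with strict 4-line records (no wrapped sequence lines) and Phred+33 qualities; `cap_quality_string` and `cap_fastq` are hypothetical names. Keep in mind that capping discards a small amount of information, which one of the replies warns against.

```python
import sys
from typing import Iterable, Iterator

OFFSET = 33  # assumption: Phred+33 qualities


def cap_quality_string(qual: str, max_q: int = 41) -> str:
    """Replace every score above max_q with max_q (e.g. 'K' -> 'J' for 41)."""
    cap_char = chr(max_q + OFFSET)
    return "".join(cap_char if ord(ch) - OFFSET > max_q else ch for ch in qual)


def cap_fastq(lines: Iterable[str], max_q: int = 41) -> Iterator[str]:
    """Yield FASTQ lines unchanged, except the quality line of each record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the 4th line of each 4-line record holds the qualities
            yield cap_quality_string(line.rstrip("\n"), max_q) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (script name is arbitrary): python cap_quals.py in.fastq > out.fastq
    with open(sys.argv[1]) as fastq:
        sys.stdout.writelines(cap_fastq(fastq))
```

Processing the file line by line keeps memory use constant regardless of file size.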

Thanks!

phred illumina hiseq
5.6 years ago
Dan D

You likely have data generated on a HiSeq 3000/4000 or HiSeq X sequencer. K is the highest quality score on the X platform. Apart from that, the ASCII offset is the same as in the output of the previous generation of Illumina sequencers (Phred+33).
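To check which offset a file uses, one rough heuristic is to look at the lowest quality character: anything below ';' (ASCII 59) cannot appear in Solexa+64 or Phred+64 data, so it implies Phred+33. The Python sketch below assumes the sample contains at least some low-quality bases; a sample made only of high-quality bases can be ambiguous. `guess_offset` is a hypothetical name.

```python
from typing import Iterable, Optional


def guess_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Guess the FASTQ quality offset (33 or 64) from observed characters.

    Heuristic: characters below ';' (ASCII 59) are impossible in
    Solexa+64/Phred+64, so they imply Phred+33. A minimum of '@' (64)
    or above is treated as Phred+64. A minimum in 59-63 looks like old
    Solexa scores and is left undecided (None).
    """
    lowest = min((ord(ch) for q in quality_strings for ch in q), default=None)
    if lowest is None:
        return None  # no data to judge
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None
```

For data like that in the question, `guess_offset(["!#JK"])` returns 33.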

5.6 years ago
DVA

K is not an illegal character. You can refer to this post: "Illumina X ten samples have phred scores out of range [0,41]"

You can also find the phred score calculation here: http://drive5.com/usearch/manual/quality_score.html
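Concretely, the Phred score is defined as Q = -10 * log10(P), where P is the probability that the base call is wrong, so P = 10^(-Q/10). A small Python sketch of both directions (the function names are hypothetical):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score Q for an error probability P: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (41, 42):
        print(f"Q{q}: P = {phred_to_error_prob(q):.2e}")
```

By this formula, K (Q42) and J (Q41) correspond to error probabilities of roughly 6.3e-5 and 7.9e-5, so replacing K with J makes those bases look slightly less reliable than the sequencer reported.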

I don't think you need to rescale your scores unless your program expects an older encoding. Which program are you concerned about?
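If a program does expect the older Phred+64 encoding used by Illumina pipelines 1.3 to 1.7, re-encoding is just a shift of 31 in the ASCII code. A minimal sketch, assuming the input really is Phred+33; `phred33_to_phred64` is a hypothetical name. Scores above 62 cannot be represented in Phred+64 (the printable range ends at '~', ASCII 126), so the sketch raises an error for them.

```python
# Sketch: re-encode a Phred+33 quality string as Phred+64.
# Hypothetical helper; assumes the input really is Phred+33.


def phred33_to_phred64(qual: str) -> str:
    """Shift each quality character from offset 33 to offset 64."""
    out = []
    for ch in qual:
        q = ord(ch) - 33  # decode the Phred+33 score
        if q < 0:
            raise ValueError(f"not a Phred+33 character: {ch!r}")
        if q > 62:
            # '~' (ASCII 126) is the highest printable character, i.e. Q62 in Phred+64
            raise ValueError(f"Q{q} cannot be encoded as Phred+64")
        out.append(chr(q + 64))  # re-encode with offset 64
    return "".join(out)


if __name__ == "__main__":
    print(phred33_to_phred64("!JK"))  # @ij
```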

I wouldn't recommend replacing K with J. Downstream software used in sequencing data analysis (e.g. GATK) may need accurate scores to perform best.