How should the Phred score be interpreted?
4 months ago
QX ▴ 80

Hi all,

I am looking at the Phred scores in my sequencing data and trying to work out how the Phred score of each base correlates with the others, so that I can remove low-quality bases.

Is the Phred score affected by the position of the base across all reads (vertical), or by the other bases that belong to the same read (horizontal)?

If the correlation is vertical, does it make sense to keep only part of each read, e.g. positions 20 -> 40, to control the Phred score, since the other positions have low-quality Phred scores?

sequencing • 762 views
4 months ago
GenoMax 154k

Is the Phred score affected by the position of the base across all reads (vertical), or by the other bases that belong to the same read (horizontal)?

Not by position per se, but there are a couple of possible causes. A drop at one position across all reads can indicate a problem with that particular sequencing cycle, for example a small bubble passing through the sequencing lane and disturbing the sequencing/imaging process. Another cause is low nucleotide diversity (which you appear to have), which is generally detrimental to Q-scores.
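To see whether low quality tracks a cycle (vertical) or particular reads (horizontal), one option is to average the scores both ways. The following is only a minimal sketch: it assumes Phred+33 encoded quality strings (the usual encoding for current Illumina FASTQ files), and the function names are made up for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode one FASTQ quality string into integer Phred scores.

    offset=33 assumes Phred+33 encoding (an assumption, check your data).
    """
    return [ord(ch) - offset for ch in quality_string]


def per_cycle_mean(quality_strings, offset=33):
    """Mean quality at each position across all reads (the 'vertical' view)."""
    totals, counts = [], []
    for qs in quality_strings:
        for i, q in enumerate(phred_scores(qs, offset)):
            if i == len(totals):  # first read long enough to reach this cycle
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def per_read_mean(quality_strings, offset=33):
    """Mean quality of each read (the 'horizontal' view); empty reads are skipped."""
    return [sum(phred_scores(qs, offset)) / len(qs) for qs in quality_strings if qs]


# Toy data: every read is fine except at the fifth cycle (index 4).
toy = ["IIII#III", "IIII#III", "IIII$III"]
print(per_cycle_mean(toy))  # one cycle stands out
print(per_read_mean(toy))   # all reads look similar
```

If a single cycle is low while the per-read means look alike, that points to a cycle-level problem rather than a handful of bad reads.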

If you are aligning to a good reference you can likely use data down to Q10 or Q15.
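For reference, a Phred score Q corresponds to an estimated base-call error probability P via Q = -10 * log10(P). A small sketch of the conversion (the function names are just for illustration):

```python
import math


def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 15, 20, 30):
    print(f"Q{q}: error probability ~ {phred_to_error_prob(q):.4f}")
```

So Q10 means roughly 1 error in 10 calls, Q15 about 1 in 32, Q20 1 in 100 and Q30 1 in 1000.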

What do you mean by 'low nucleotide diversity'? If diversity is low, shouldn't the signals be consistent, leading to high and trustworthy Phred scores?

Illumina sequencing generally expects the clusters in a sequencing field to have an even distribution of A, C, T and G, so that in any given sequencing cycle not every cluster shows fluorescence. If the sequenced base in a cycle is the same for every cluster (which can happen if you are sequencing amplicons), every cluster/spot fluoresces at once and the basecalling/spot registration software can get confused. Remember that these clusters are only microns apart from each other. Low nucleotide diversity can therefore lower the Q-scores of the basecalls.
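A rough way to check your own reads for this is to tabulate the base composition at each cycle. This is only a sketch: the function names are hypothetical, and the 0.8 cutoff is an arbitrary illustration, not an Illumina recommendation.

```python
from collections import Counter


def per_cycle_base_fractions(reads):
    """Fraction of A, C, G and T at each position across all reads."""
    counters = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counters):  # first read long enough to reach this cycle
                counters.append(Counter())
            counters[i][base] += 1
    fractions = []
    for c in counters:
        total = sum(c.values())  # includes N calls, so fractions may sum to < 1
        fractions.append({b: c[b] / total for b in "ACGT"})
    return fractions


def low_diversity_cycles(reads, max_fraction=0.8):
    """Indices of cycles where one base exceeds max_fraction (arbitrary cutoff)."""
    return [
        i
        for i, f in enumerate(per_cycle_base_fractions(reads))
        if max(f.values()) > max_fraction
    ]


# Toy amplicon-like data: every read starts with the same base.
toy_reads = ["ACGT", "AGCT", "ATTA", "AACG"]
print(low_diversity_cycles(toy_reads))
```

Cycles flagged this way are the ones where most clusters light up with the same base, which is the situation described above.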

More here --> https://support-docs.illumina.com/SHARE/ClusterOptimize/Content/SHARE/ClusterOptimize/NucleotideDiversity.htm and https://emea.support.illumina.com/bulletins/2016/07/what-is-nucleotide-diversity-and-why-is-it-important.html/1000

Thank you for the explanation and the documentation.
