How Can A Base-Called Position Be "Unknown" But Have A Non-Minimal Score?
14.2 years ago

Let's say I extract something like this from a qseq or FASTQ file:

TTCAGATGTTCATATGCGGATCGGCGCTGGGCCCACGAGATCTAGCAGAGCTCGT.GGGACCACGACCACCGACCC
a`bbbbbbaabbab`ab^`bVa^^bab^[``bba^`]_Ya^`_`^^_Xa\_KYTYD[PY^Y_^[P[V_BBBBBBBB

The dot is like an N: the base could not be called. Looking at the FASTQ scores in integer format, I would expect that position to have the minimal score. In fact its score is 'D', or 4. That's not great, but some called bases at the tail end of the read are 'B', or 2, which is even lower. What gives?
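To make the numbers concrete, here is a minimal Python sketch that decodes the quality string above. It assumes the Illumina 1.3+ encoding (Phred+64, so score = ASCII code minus 64), which is what the 'D' = 4 and 'B' = 2 values imply; Sanger-style FASTQ would use an offset of 33 instead. The helper name `decode_quals` is only illustrative.

```python
# Read and quality string copied from the example above.
seq = "TTCAGATGTTCATATGCGGATCGGCGCTGGGCCCACGAGATCTAGCAGAGCTCGT.GGGACCACGACCACCGACCC"
# Raw string, because the quality line contains a backslash.
qual = r"a`bbbbbbaabbab`ab^`bVa^^bab^[``bba^`]_Ya^`_`^^_Xa\_KYTYD[PY^Y_^[P[V_BBBBBBBB"

OFFSET = 64  # assumption: Illumina 1.3+ (Phred+64); Sanger FASTQ uses 33


def decode_quals(qual, offset=OFFSET):
    """Convert an ASCII quality string into a list of integer scores."""
    return [ord(c) - offset for c in qual]


scores = decode_quals(qual)
pos = seq.index(".")  # 0-based index of the uncalled base

print(pos + 1, qual[pos], scores[pos])  # 56 D 4
print(scores[-8:])                      # [2, 2, 2, 2, 2, 2, 2, 2]
```

So the uncalled position really does carry a higher score than the eight called bases at the end of the read.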

Which platform is this data from?

Illumina, maybe v1.3

14.2 years ago

I recall a situation where one of our students (Gue-Su Chang) tracked down a few undocumented behaviors in the Illumina pipeline. Basically, once base calling is done, additional filters are applied during post-processing to handle a few odd cases. This might be one of those: the score D refers to the original call, which was later overridden by another step. I know that's pretty vague. Long story short: I don't think the score applies here.

I'll get on board with this answer. It's pretty easy to envision a scenario where the actual base call and the quality score get adjusted by separate processes at some point.

14.0 years ago
Peter 6.0k

See this thread: http://seqanswers.com/forums/showthread.php?t=4721

It seems that Illumina can give scores as high as 15 for an N (which may be a bug). Also, in their latest pipeline Q2 ("B") is a special marker used at the end of a read, and Q4 ("D") is the lowest real score (they no longer use Q0, Q1 or Q3).
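If the trailing run of "B" is only a marker rather than a set of real scores, one option is to trim it before further analysis. The following is a rough sketch of that idea, not Illumina's own procedure; `trim_b_tail` is a hypothetical helper, and it assumes Phred+64 encoding where "B" means Q2.

```python
def trim_b_tail(seq, qual, marker="B"):
    """Drop the trailing run of marker characters and the matching bases."""
    keep = len(qual.rstrip(marker))
    return seq[:keep], qual[:keep]


# Read and quality string from the question.
seq = "TTCAGATGTTCATATGCGGATCGGCGCTGGGCCCACGAGATCTAGCAGAGCTCGT.GGGACCACGACCACCGACCC"
qual = r"a`bbbbbbaabbab`ab^`bVa^^bab^[``bba^`]_Ya^`_`^^_Xa\_KYTYD[PY^Y_^[P[V_BBBBBBBB"

trimmed_seq, trimmed_qual = trim_b_tail(seq, qual)
print(len(seq), len(trimmed_seq))  # 76 68
```

Only the run at the very end is removed; a "B" in the middle of a read is left alone, and a read without a "B" tail comes back unchanged.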

13.6 years ago
Ketil 4.1k

I agree this doesn't make much sense. But note also that with standard Phred scores, a quality score of 3 corresponds to an error probability of about 50% (10^(-3/10) ≈ 0.50), i.e. the called base is about as likely to be wrong as right, and at Q2 it is more likely wrong than right.

In almost all cases, I think you should ignore base calls with very low quality.
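For reference, the standard Phred relation is P(error) = 10^(-Q/10). The sketch below evaluates it for a few low scores and shows one possible way to ignore low-quality calls by masking them with N. The function names and the cutoff of Q20 are illustrative assumptions, not a recommendation from this thread.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mask_low_quality(seq, scores, min_q=20, mask="N"):
    """Replace every base whose score is below min_q with the mask character."""
    return "".join(b if q >= min_q else mask for b, q in zip(seq, scores))


for q in (2, 3, 4, 10, 20):
    print(q, f"{phred_to_error_prob(q):.3f}")
# 2 0.631
# 3 0.501
# 4 0.398
# 10 0.100
# 20 0.010

print(mask_low_quality("ACGT", [30, 2, 25, 4]))  # ANGN
```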
