Question

Is B-I A Valid Fastq Score Range?

0

11.2 years ago

Martin A Hansen 3.0k

I received some data from a third-party provider in which the FASTQ quality scores are encoded in the range from B (ASCII 66) to i (ASCII 105). This range is not described in the Wikipedia entry on the FASTQ format, so is it valid?

fastq quality • 3.2k views

updated 11.2 years ago by Istvan Albert 100k • written 11.2 years ago by Martin A Hansen 3.0k

0

This is FASTQ data from what I believe is Illumina sequencing and processing with the Illumina 1.5+ pipeline (that remains to be confirmed).

11.2 years ago by Martin A Hansen 3.0k

score 2 · Answer 1 · 2013-02-07

EDIT 2: This is not the correct answer (see EDIT below) and it should therefore not have been upvoted. Please see Istvan's answer below.

Actually, this range is valid and is mentioned in the Wikipedia article you cite. This looks like Illumina 1.3-1.7 with an ASCII offset of 64, so B translates to 2 (a special value marking nucleotides that should be ignored) and i to 41 (EDIT 1: sorry, I initially said 39; and 41 is actually not expected). Here's the relevant part from the section "Encoding":

Starting with Illumina 1.3 and before Illumina 1.8, the format encoded a Phred quality score from 0 to 62 using ASCII 64 to 126 (although in raw read data Phred scores from 0 to 40 only are expected).
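To make the arithmetic concrete, here is a minimal Python sketch, assuming a Phred+64 encoding. The offset is exactly the assumption under discussion, and `decode_quality` is a hypothetical helper name, not part of any library.

```python
def decode_quality(qual, offset=64):
    """Convert a FASTQ quality string into a list of integer scores.

    The default offset of 64 is an assumption (Illumina 1.3-1.7 style);
    pass offset=33 to interpret the same string as Sanger / Illumina 1.8+.
    """
    return [ord(ch) - offset for ch in qual]


# The two extremes of the observed range: 'B' is ASCII 66, 'i' is ASCII 105.
print(decode_quality("Bi"))  # [2, 41]
```

Decoding the same two characters with `offset=33` gives 33 and 72, which shows why the offset has to be settled before the scores mean anything.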

Andreas

score 2 · Answer 2 · 2013-02-07

Capping at a maximal quality value of 40 is a convention that most instruments have adopted by now. Technically, Phred quality scores can go from 0 to 93 (the Sanger encoding, ASCII 33 to 126). So the use of the quality 'i' does not necessarily indicate a problem.

That being said, it is a bit suspicious when you see quality scores that are just outside the usual range. Plus, this looks like one of the older quality encodings. If so, you may have other problems: the error probability formula was defined slightly differently for some of these encodings, so the values are not directly comparable anyhow. (This is how I recall it.)
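As a rough illustration of the difference in definitions, here is a small Python sketch. It assumes the standard Phred definition, Q = -10 log10(p), and the odds-based Solexa definition, Q = -10 log10(p / (1 - p)), which was used by the older Solexa variant. Whether the data in question uses either one is not established in this thread, and the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Error probability for a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def solexa_to_error_prob(q):
    """Error probability for a Solexa (odds-based) score:
    Q = -10 * log10(p / (1 - p))  =>  p = 1 / (1 + 10^(Q/10)).
    """
    return 1 / (1 + 10 ** (q / 10))


# Compare the two definitions at a low, a medium and a high score.
for q in (2, 10, 41):
    print(q, phred_to_error_prob(q), solexa_to_error_prob(q))
```

The two definitions differ noticeably at low scores and become nearly identical at high ones, so the mismatch matters mostly for poor-quality bases.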

Peter's paper has more details on the nitty-gritty: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants, Nucl. Acids Res. (2010)

score 0 · Answer 3 · 2013-02-07

11.2 years ago

lelle ▴ 830

As there is no actual standard for FASTQ, it is not possible to say what a "valid" FASTQ file is. It all depends on what the tools you want to use expect and accept.
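To see what a tool is up against, here is a hedged Python sketch that lists which commonly described variants could contain an observed set of quality characters. The character ranges in `VARIANTS` are the nominal ones as I understand them from the FASTQ literature; treat the table and the helper name `compatible_variants` as assumptions, not as any tool's actual logic.

```python
# Nominal ASCII ranges per variant (an assumption; check against your tools).
VARIANTS = {
    "Sanger / Illumina 1.8+ (Phred+33)": (33, 126),
    "Solexa (Solexa+64)": (59, 126),
    "Illumina 1.3+ (Phred+64)": (64, 126),
    "Illumina 1.5+ (Phred+64, B as special marker)": (66, 126),
}


def compatible_variants(qualities):
    """Return the variants whose nominal range contains every observed character."""
    codes = [ord(ch) for qual in qualities for ch in qual]
    if not codes:
        return []  # nothing observed, nothing to conclude
    lo, hi = min(codes), max(codes)
    return [
        name
        for name, (vmin, vmax) in VARIANTS.items()
        if vmin <= lo and hi <= vmax
    ]


# Quality strings spanning 'B' (66) to 'i' (105), as in the question.
for name in compatible_variants(["BBBcdefghi", "ihgfedcBBB"]):
    print(name)
```

For the B to i range, every variant in the table passes, so the character range alone cannot settle the encoding. Knowing which pipeline produced the data, and what the downstream tools expect, is still needed.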