Question

RNA-Seq: Difference in read quality pattern between Illumina GA and HiSeq 2000?

2

12.6 years ago

Bio_X2Y ★ 4.4k

In the past, we've used an Illumina GA for our RNA-seq experiments. In general, we noticed that the reported quality of the read bases was highest at the 5' end of each read, and the quality dropped gradually towards the 3' end (as per the FASTQ files). This is what we expected.

Recently, however, we received an RNA-seq dataset generated on a HiSeq 2000 and noticed a different pattern. The 5' bases have high quality, but the quality actually improves in the 3' direction until about base 20 (out of 90), and then drops gradually.

Can someone comment on whether this alternative pattern is just a harmless artifact of the HiSeq 2000, or whether it should be a cause for concern?

Thanks.

rna hiseq illumina quality • 3.9k views

ADD COMMENT • link updated 12.6 years ago by Brad Chapman 9.7k • written 12.6 years ago by Bio_X2Y ★ 4.4k

1

Just wanted to add that we've also seen the same pattern: something like an upside-down smile (a.k.a. a frown), where the scores for roughly bases 1-4, 5-9, and 10-14 increase in a step-like fashion, followed by a "normal" Phred-like distribution with a gradual, slight decrease in scores towards the 3' end. We're doing 50 bp runs, and the median score out at base 50 is still ~36 (out of 40), so all in all it's still quite good for us.

12.6 years ago by Steve Lianoglou 5.2k

1

@steve: We also see a similar pattern: scores for bases 1-3, 4-8, and 9-10 increase stepwise, then there is a gradual increase up to 50-60 bp, followed by a slow decrease towards the 3' end. We are running 104 bp reads, but overall the read qualities are good (median scores >32).

12.6 years ago by Rm 8.3k

score 7 · Answer 1 · 2011-09-15

Illumina changed the quality prediction in HCS 1.4 (RTA 1.12) to better model error rates at the 5' ends of the sequence. This tech note describes the change (I couldn't find it on the Illumina website, so the link is to my Dropbox):

http://dl.dropbox.com/u/6634542/RTA_Quality_Predictors_TechNote.pdf

Page 11 of the RTA Theory of Operations tech note has additional useful details:

http://www.illumina.com/Documents/products/technotes/technote_rta_theory_operations.pdf

So the new software is attempting to better model the underlying error rates; the pattern does not reflect a fundamental change in 5' sequence quality on the HiSeq.
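If you want to check the per-position pattern in your own data, here is a minimal sketch in Python. It is not anything from Illumina's pipeline. The function names are made up for illustration, and it assumes plain four-line FASTQ records and a Phred+33 quality encoding; older Illumina pipelines wrote Phred+64, so pass `phred_offset=64` if that applies to your files. The second helper uses the standard Phred definition, Q = -10 * log10(P), to turn a score into an error probability.

```python
import io
from statistics import median


def per_position_median_quality(fastq_handle, phred_offset=33):
    """Return a list whose entry i is the median Phred score at read position i.

    Assumes 4-line FASTQ records (no wrapped sequence or quality lines).
    phred_offset is an assumption: 33 for Sanger-style encoding, 64 for
    older Illumina encodings.
    """
    columns = []  # columns[i] collects every score seen at position i
    for line_number, line in enumerate(fastq_handle):
        # The quality string is the 4th line of each record.
        if line_number % 4 != 3:
            continue
        quality_string = line.rstrip("\r\n")
        for position, char in enumerate(quality_string):
            if position == len(columns):
                columns.append([])
            columns[position].append(ord(char) - phred_offset)
    return [median(scores) for scores in columns]


def phred_to_error_probability(q):
    """Convert a Phred score Q to an error probability via Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


# Tiny made-up example: three reads, the last one a base shorter.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\n5?II\n"
    "@r3\nACG\n+\n5II\n"
)
for position, q in enumerate(per_position_median_quality(example), start=1):
    print(position, q, phred_to_error_probability(q))
```

This keeps every score in memory, so for a full lane it is probably better to run it on a subsample of reads (or to use a tool such as FastQC, which draws the same kind of per-base plot). For a real file, open it with `open("reads.fastq")` and pass the handle in place of `example`.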