FastQC 'ChIP-Seq' Quality Score Reverse Pattern [Quality Increasing At Ends]
12.3 years ago

Hello! I am encountering a strange problem. My FastQC per-base quality graph looks like this: the quality score increases toward the end of the reads, whereas we would normally expect a decrease at the end. [image: FastQC per-base quality plot]

The FASTQ files were generated using CASAVA-1.8.0, so the format is supposed to be Sanger encoded.
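To double-check the encoding assumption, here is a minimal sketch of the usual heuristic for guessing the ASCII offset from the quality strings. The function name `guess_phred_offset` and the cutoffs (characters below `;` only occur with offset 33; characters above `J` suggest offset 64) are my own illustrative choices, not something FastQC or CASAVA exposes, and the heuristic can be ambiguous for small or uniformly high-quality samples.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Heuristic only: returns None when the observed range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' cannot occur in the +64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' (Q41 with offset 33) suggest offset 64.
        return 64
    return None  # ambiguous range


# Hypothetical quality strings, for illustration only.
print(guess_phred_offset(["IIII#", "5555"]))  # contains '#', so offset 33
print(guess_phred_offset(["hhhh", "ffff"]))   # high ASCII codes, so offset 64
```

If this returns 33 on a reasonable sample of reads, the Sanger assumption holds and the plot is not an encoding artifact.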

My previous graphs from a different experiment show a decrease at the end, unlike this one.

Why am I observing this pattern (an increase in quality scores at the end)?

Thanks for your comments.

chip-seq fastqc • 4.5k views

What is your question?


Why am I observing this pattern (an increase in quality scores at the end)?


Ever since we've switched to running our samples on a HiSeq machine[*], all of our phred distros exhibit this exact same pattern, and I'd have to say: judging by this quality distro plot alone, your data actually looks pretty great.

[*]I'm not sure if it was the switch to the HiSeq, or the upgraded software/chemistry -- maybe GAIIx runs look like this now, too ... I wouldn't know, though.


The modeling of error rates changed in recent versions of the Illumina software. See this question for more details: http://biostar.stackexchange.com/questions/12150/rna-seq-difference-in-read-quality-pattern-between-illumina-ga-and-hiseq-2000/12179#12179


@Steve @Brad Thanks for your comments. I think the graph is fine; it's just the change in the error model by Illumina.

12.3 years ago

I wouldn't read too much into it. As you can see, your error bars are actually increasing toward the end. Remember that these quality measures are unreliable approximations and should not be taken overly seriously.

There are only two values really - good (keep) and bad (reject), with a region in between that is a tossup.
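If you want to look at the spread per position yourself rather than read it off the plot, a rough sketch follows. It assumes offset-33 quality strings and uses the mean and standard deviation as a simple stand-in for the spread; FastQC itself draws medians, quartiles and percentile whiskers, so the numbers will not match its boxplot exactly. The function name `per_position_stats` is hypothetical.

```python
from statistics import mean, pstdev


def per_position_stats(quality_lines, offset=33):
    """Return a (mean, standard deviation) pair for each read position."""
    columns = []
    for line in quality_lines:
        for i, ch in enumerate(line):
            if i == len(columns):
                columns.append([])  # first read long enough to reach position i
            columns[i].append(ord(ch) - offset)
    return [(mean(col), pstdev(col)) for col in columns]


# Hypothetical two-read example: 'I' is Q40 and '5' is Q20 with offset 33.
for pos, (avg, spread) in enumerate(per_position_stats(["II", "5I"]), start=1):
    print(pos, avg, spread)
```

A mean that rises toward the end while the spread also grows is consistent with the point above: the average alone says little about how trustworthy the late cycles are.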


Thanks Istvan, makes sense. Either q>25 or q<10. Cheers
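For reference, the keep/reject/tossup idea can be sketched as below, using the standard Phred relation P = 10^(-Q/10) and the q>25 / q<10 cutoffs from this reply. The cutoffs are just the ones mentioned here, not a recommended standard, and the function names are made up for illustration.

```python
def error_probability(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def classify(q, keep_above=25, reject_below=10):
    """Three-way call: good (keep), bad (reject), or the tossup zone between."""
    if q > keep_above:
        return "keep"
    if q < reject_below:
        return "reject"
    return "tossup"


for q in (5, 15, 30, 40):
    print(q, error_probability(q), classify(q))
```

Seen this way, the difference between Q30 and Q40 at the end of a read does not change the call, which is why the upward trend in the plot is not a concern.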

12.3 years ago

As Brad correctly pointed out, this is due to the change from the five-parameter quality model to the six-parameter quality model.

From the tech note:

"Why did we move to the 6-predictor model? Although the 5-predictor model was very good at predicting quality, the 6-predictor model is more accurate and enables us to accurately predict the high percentage of Q40 data that was missed with the 5-predictor model. The new model is also faster and provides Quality scores after around cycle 11 in read 2 of paired-end reads (compared to around cycle 25 with the previous model)."

Read it here: http://dl.dropbox.com/u/6634542/RTA_Quality_Predictors_TechNote.pdf

So this graph is correct and makes sense now.

Sukhi
