10.3 years ago
Juliofdiaz ▴ 140

Hello: I have been asked to compare the quality of read 1 and read 2 of an Illumina PE sequencing run, and I am looking for the best way to represent it. I have come up with a couple of alternatives:

• Count the number of bases under Q20 for each read and average it over the whole run. This method discards the actual Phred values, but it may be good enough for a comparison.
• Average the Phred score at each base position for each read over the whole run. This method may lose sensitivity, since a single mean value stands in for the whole distribution of scores at that position.
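
The two summaries above can be sketched in a few lines of Python. This is only a rough illustration, assuming plain four-line FASTQ records (no wrapped sequence lines) and Phred+33 encoding; the function name `per_position_stats` and the demo records are made up for the example.

```python
import io

def per_position_stats(fastq_handle, threshold=20, offset=33):
    """Return (mean quality, fraction of bases below threshold) per read position.

    Assumes four-line FASTQ records and Phred+33 quality encoding.
    """
    sums, below, counts = [], [], []
    for i, line in enumerate(fastq_handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        quals = [ord(c) - offset for c in line.rstrip("\n")]
        # Grow the accumulators if this read is longer than any seen so far.
        while len(sums) < len(quals):
            sums.append(0)
            below.append(0)
            counts.append(0)
        for pos, q in enumerate(quals):
            sums[pos] += q
            counts[pos] += 1
            if q < threshold:
                below[pos] += 1
    mean_q = [s / n for s, n in zip(sums, counts)]
    frac_below = [b / n for b, n in zip(below, counts)]
    return mean_q, frac_below

# Two made-up records: 'I' is Q40, '5' is Q20, '#' is Q2 in Phred+33.
demo = io.StringIO("@r1\nACGT\n+\nIII#\n@r2\nACGT\n+\nI5I#\n")
mean_q, frac_below = per_position_stats(demo)
```

Running this once on the read 1 file and once on the read 2 file gives two per-position profiles that can be plotted or subtracted from each other.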

Is there a standard way to do this? Which way is better? Thanks

illumina quality • 3.3k views
10.3 years ago
Dan D 7.3k

For your specific application, I recommend FastQC. It uses the quality scores, to be sure, but it also goes much deeper than that. It provides thorough feedback about your data quality, and in my experience it's very easy to use if you're comfortable on the command line. Even if you're not comfortable on the command line, you can use the public Galaxy instance at Penn State and run it that way.

10.3 years ago
Weronika ▴ 300

I usually use FASTX_Toolkit to get the per-base quality box-whisker plot (and a nucleotide distribution plot). I'm not aware of any tool or standard method specifically for comparing qualities between two datasets, but unless your needs are very specific, I'd probably stick with something similar to this standard quality representation.

If just doing the quality plot for each read and comparing by eye won't be enough, plotting some variant of those things on the same graph could work. Running fastx_quality_stats gives you a text file with the quality mean/median/quartiles/etc. for each base position, so you could pretty easily do a custom plot based on that, or just do numerical comparisons.
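
As a rough sketch of the numerical comparison, something like the following Python could work. It assumes the stats output is a tab-delimited table with a header row that includes `column` (the base position) and `mean`; check the header of your own files, since those column names are an assumption here, and the function names and demo tables are made up for illustration.

```python
import csv
import io

def read_means(handle):
    """Map base position -> mean quality from a tab-delimited stats table.

    Assumes a header row with 'column' and 'mean' fields (an assumption
    about the stats file layout; adjust to match your actual header).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["column"]): float(row["mean"]) for row in reader}

def mean_differences(r1_handle, r2_handle):
    """Return (position, read 1 mean minus read 2 mean) for shared positions."""
    m1 = read_means(r1_handle)
    m2 = read_means(r2_handle)
    shared = sorted(set(m1) & set(m2))
    return [(pos, m1[pos] - m2[pos]) for pos in shared]

# Made-up stats tables standing in for the read 1 and read 2 output files.
r1 = io.StringIO("column\tcount\tmean\n1\t100\t38.5\n2\t100\t37.0\n")
r2 = io.StringIO("column\tcount\tmean\n1\t100\t36.5\n2\t100\t33.0\n")
diffs = mean_differences(r1, r2)
```

A positive difference means read 1 has the higher mean quality at that position. The same dictionaries could be fed to any plotting library to draw both reads on one graph.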