Question: Fastq files with very high per base sequencing quality score
Ivan S wrote (3.3 years ago):

Hello,

I am currently working with FASTQ files from exome sequencing at 150x coverage. After running FastQC on these files, I observe quite high quality scores (~35 on average) with a very narrow distribution across all positions. This seems a little suspicious to me. Since I have very little experience with this type of data, I'd like to ask: is it normal to observe such high quality scores?

Thank you for your help

modified 4 days ago by Biostar • written 3.3 years ago by Ivan S

That is normal. You can even see that in the example FastQC report: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html

modified 3.3 years ago • written 3.3 years ago by igor

Thanks a lot, I hadn't noticed the same tendency in the example report.

written 3.3 years ago by Ivan S

You can analyze the quality scores empirically if you want, via mapping; BBMap has several options for that:

bbmap.sh ref=hg19.fa in=reads.fq.gz mhist=mhist.txt qahist=qahist.txt qhist=qhist.txt

mhist generates a histogram of matches and mismatches by base position; qhist gives claimed and measured quality per position; and qahist gives the quality-score accuracy (claimed versus observed). Sometimes the quality scores are quite accurate, sometimes not; it depends on a lot of factors including luck. But if you suspect they are wrong, it's nice to validate that.
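To make "claimed versus observed" concrete, here is a minimal Python sketch of the arithmetic involved. It is not BBMap's code and does not parse BBMap's output files; the function names and the match/mismatch counts are hypothetical, chosen only to illustrate the standard Phred relation Q = -10 * log10(p).

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a claimed Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def observed_quality(matches, mismatches):
    """Empirical Phred score from aligned-base counts (hypothetical inputs)."""
    total = matches + mismatches
    if total == 0:
        raise ValueError("no aligned bases to evaluate")
    if mismatches == 0:
        return float("inf")  # no observed errors, quality is unbounded
    return -10 * math.log10(mismatches / total)


# Illustrative counts only: bases claimed as Q35 that mismatched 1 time in 1000.
claimed_q = 35
measured_q = observed_quality(matches=999, mismatches=1)
print(f"claimed Q{claimed_q}, implied error rate {phred_to_error_prob(claimed_q):.5f}")
print(f"measured Q{measured_q:.1f}")
```

If the measured value sits well below the claimed one across many bases, the quality scores are optimistic for that data set.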

Note that humans, being diploid with a roughly 1/1000 SNP rate, have a noise floor of around 30dB for these testing methods - they work better on haploids. But they will still be fairly accurate up to Q30.
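A rough sketch of why that floor appears, assuming the simplest possible model: every true variant against the reference is counted as a mismatch, so the apparent error rate is approximately the sequencing error rate plus the variant rate. The function name and the additive model are assumptions made for illustration, not part of BBMap.

```python
import math


def apparent_quality(claimed_q, variant_rate=1e-3):
    """Phred score a mapping-based check would report under the additive model.

    variant_rate is the assumed fraction of positions that truly differ from
    the reference (roughly 1/1000 for human, as stated above).
    """
    sequencing_error = 10 ** (-claimed_q / 10)
    # Real variants look like mismatches, so they add to the observed error rate.
    apparent_error = sequencing_error + variant_rate
    return -10 * math.log10(apparent_error)


for q in (20, 30, 40):
    print(f"true Q{q} would be measured as about Q{apparent_quality(q):.1f}")
```

Under this model, even error-free reads cannot measure above Q30 against a reference with a 1/1000 variant rate, which is why the check is more informative on haploid genomes.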

written 3.3 years ago by Brian Bushnell

That is not surprising, if the libraries are of good quality (and the read length is not > 150).

written 3.3 years ago by genomax

Suspicious data you say? ಠ_ರೃ

Can we see the clues too?

modified 3.3 years ago • written 3.3 years ago by John

It depends on which sequencing technology you used. If your data is from Illumina HiSeq, I would say the quality is as expected. But if your data is from Nanopore, I would also find it suspicious.

written 3.3 years ago by piet

Brice Sarver (United States) wrote (3.3 years ago):

Data I've analyzed from current sequencing platforms usually have excellent per-base quality scores. Though not always the case, I see larger 'dips' in quality scores at the beginning and end positions much less frequently than back in the earlier Illumina/454 days. You probably just have good data!

written 3.3 years ago by Brice Sarver