I recently obtained a FASTQ file containing biological sequence data, and upon inspection, I noticed that all quality scores across the entire sequence and for every read are uniform. Is this acceptable, or does it indicate a problem with the data?
Here is an example of one read; every single read in the file has the same quality scores:
@Sequence_ID_read_No2
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
????????????????????????????????????????????????????????????????????
In my FASTQ file, every quality score is represented by a question mark ('?').
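For reference, here is a minimal sketch of what that character means, assuming the file uses the standard Phred+33 (Sanger / Illumina 1.8+) encoding; the thread does not confirm the encoding. Under that assumption '?' decodes to Q30, i.e. an estimated error probability of about 1 in 1000 per base. The function names are made up for illustration.

```python
def phred33_to_q(ch: str) -> int:
    """Convert one quality character to its Phred score, assuming Phred+33."""
    return ord(ch) - 33


def error_probability(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


q = phred33_to_q("?")  # '?' is ASCII 63, so q is 63 - 33 = 30
print(q, error_probability(q))
```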
Could someone please clarify whether uniform quality scores in a FASTQ file are acceptable? Under what circumstances might this occur, and what implications does it have for downstream analysis?
Any insights or guidance would be greatly appreciated. Thank you!
Which technology is that data from? In theory it is possible to have the same score. Whether it happens to be by chance or by design (e.g. fake scores) may need to be checked. Illumina does score binning anyway. https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/technote_understanding_quality_scores.pdf
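One way to do that check is to tally which quality characters occur in the whole file. The following is only a rough sketch, assuming plain four-line FASTQ records (no wrapped sequence or quality lines); `summarize_qualities` and the in-memory example reads are hypothetical, not taken from the poster's data. It also counts records whose quality string has a different length than the sequence, since that is another sign of a malformed or synthetic file.

```python
from collections import Counter
from itertools import islice


def summarize_qualities(lines):
    """Tally quality characters over four-line FASTQ records.

    Returns (counts, n_reads, n_length_mismatch).
    """
    counts = Counter()
    n_reads = 0
    n_length_mismatch = 0
    it = iter(lines)
    while True:
        # Take the next four lines: header, sequence, '+', quality.
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        _header, seq, _plus, qual = record
        counts.update(qual)
        n_reads += 1
        if len(seq) != len(qual):
            n_length_mismatch += 1
    return counts, n_reads, n_length_mismatch


# Tiny made-up example; a real file handle can be passed the same way.
example = [
    "@read1", "GATT", "+", "????",
    "@read2", "ACGT", "+", "????",
]
counts, n_reads, n_length_mismatch = summarize_qualities(example)
print(dict(counts), n_reads, n_length_mismatch)
```

For a real file, an open handle works as the input, e.g. `with open("reads.fastq") as fh: summarize_qualities(fh)` (the file name is a placeholder), or `gzip.open(path, "rt")` for a compressed file. If the tally contains a single character across all reads, the scores were almost certainly assigned by software rather than by the base caller.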
The data come from an Illumina instrument and are for RNA-Seq analysis.
Uniform scores are not bad in general, but real scores are never as uniform as in your example. An educated guess, based on those scores and the read names, is that these are simulated scores of some kind. Alternatively, the reads may have been trimmed so aggressively that only bases of the same quality remained.