4.3 years ago
ericjanhoekstra
I just ran FastQC on a very low-quality FASTQ file. This is part of the file:
--
GGGATACCAAGAGGTCTTTATTGCCCACCACTCTGCAC
+SRR1106118.1 VHE-8221810561012-4-6-0-0 length=38
//////////////////////////////////////
@SRR1106118.2 VHE-8221810561012-4-6-0-88 length=25
ACAAACAAAGCACACAAAACAACCA
+SRR1106118.2 VHE-8221810561012-4-6-0-88 length=25
/////////////////////////
@SRR1106118.3 VHE-8221810561012-4-6-0-170 length=25
CATTAAACTTGTTTTAATGGTCTCC
+SRR1106118.3 VHE-8221810561012-4-6-0-170 length=25
/////////////////////////
@SRR1106118.4 VHE-8221810561012-4-6-0-275 length=31
AATCAATAAAAAGATAGTTTATTTAAAAGCT
+SRR1106118.4 VHE-8221810561012-4-6-0-275 length=31
///////////////////////////////
As you can see, every quality character is /, which means the quality score across all the reads is 14.
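The value 14 can be checked by hand. This is a minimal sketch, assuming the file uses the standard Phred+33 encoding (the usual convention for SRA-derived FASTQ); the function names are made up for illustration:

```python
def phred33_to_quality(char):
    """Convert one Phred+33 quality character to its integer score."""
    return ord(char) - 33  # '/' is ASCII 47, so 47 - 33 = 14


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


q = phred33_to_quality("/")
print(q, round(error_probability(q), 4))  # prints: 14 0.0398
```

So a uniform / corresponds to roughly a 4% nominal error rate per base. Because the value is identical for every base, it looks more like a placeholder than a measured per-base quality.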
I have two questions about this:
First, what does this mean in a biological context? What could have gone wrong during sequencing, or does this have to do with a particular DNA sequence in the genome?
Second, as you may also see, the reads are fairly short, between 24 and 57 bases. What does this mean and how does this happen?
These data have likely been scanned for adapters and trimmed, which means you have short inserts. This is fine and expected in some cases, e.g. miRNA or small RNA sequencing.
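To see the read length distribution for yourself, something like the following could work. It is only a sketch, assuming a plain four-line-per-record FASTQ (no wrapped sequences); `read_lengths` is a hypothetical helper, and the example records are copied from the question:

```python
from collections import Counter
import io


def read_lengths(handle):
    """Yield the sequence length of each 4-line FASTQ record in handle."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield len(seq)


# Three records taken from the question; for a real file, use open(path).
example = io.StringIO(
    "@SRR1106118.2 VHE-8221810561012-4-6-0-88 length=25\n"
    "ACAAACAAAGCACACAAAACAACCA\n"
    "+SRR1106118.2 VHE-8221810561012-4-6-0-88 length=25\n"
    "/////////////////////////\n"
    "@SRR1106118.3 VHE-8221810561012-4-6-0-170 length=25\n"
    "CATTAAACTTGTTTTAATGGTCTCC\n"
    "+SRR1106118.3 VHE-8221810561012-4-6-0-170 length=25\n"
    "/////////////////////////\n"
    "@SRR1106118.4 VHE-8221810561012-4-6-0-275 length=31\n"
    "AATCAATAAAAAGATAGTTTATTTAAAAGCT\n"
    "+SRR1106118.4 VHE-8221810561012-4-6-0-275 length=31\n"
    "///////////////////////////////\n"
)

lengths = Counter(read_lengths(example))
for length, count in sorted(lengths.items()):
    print(length, count)
```

A spread of lengths like this, instead of one fixed length, is what you would expect from variable-length or trimmed reads.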
Edit: Looking at the SRA record for SRR1106118, this run was sequenced using Helicos technology. That may have something to do with the length and quality here. The read length is only 32.
Adding to this: given that this is not Illumina data (and even if it were), I would judge data quality on meaningful metrics such as mapping percentage and overlap with annotated exons. As long as mapping is good, you do not need to worry about base quality in most cases.
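As a rough illustration of the mapping-percentage check, here is a sketch that reads the FLAG column of SAM-format alignments. It assumes you have already aligned the reads with an aligner of your choice and have uncompressed SAM text; `mapped_fraction` is a hypothetical helper and the example lines are invented. In practice a tool such as `samtools flagstat` reports the same kind of number.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of reads that are mapped, from SAM text lines.

    Secondary (0x100) and supplementary (0x800) records are skipped so
    that each read is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Invented records for illustration only, not real alignments of SRR1106118.
example_sam = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1\t100\t60\t25M\t*\t0\t0\t*\t*\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n",
    "read1\t256\tchr2\t500\t0\t25M\t*\t0\t0\t*\t*\n",
]
print(mapped_fraction(example_sam))  # prints: 0.5
```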