What kind of biological factors can reduce the quality of reads? For example, I have an average read quality of 14; what biological reason could be responsible for this?
Shorter-than-expected inserts, which lead to adapter read-through.
Do you have low-diversity sequences (e.g. amplicons)? Having the same base light up in multiple clusters can make it difficult for the software to recognize and keep track of clusters.
It is possible that there was a problem of some sort with the run (hardware/reagents) that led to Q score issues.
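As a rough way to check the low-diversity point above, one could look at how dominant the most common base is at each read position. This is only a minimal sketch: the function name `dominant_base_fraction` and the toy reads are made up for illustration, and no particular cutoff is implied.

```python
from collections import Counter

def dominant_base_fraction(reads):
    """For each position, return the fraction of reads that share the most common base.

    Hypothetical helper for illustration; `reads` is a list of sequence strings.
    """
    n_pos = max((len(r) for r in reads), default=0)
    fractions = []
    for pos in range(n_pos):
        # Count bases only among reads long enough to cover this position
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        fractions.append(max(counts.values()) / sum(counts.values()))
    return fractions

# Toy reads (assumed data): the first two positions are identical in every read
reads = ["ACGT", "ACGA", "ACTT", "ACGC"]
print(dominant_base_fraction(reads))  # [1.0, 1.0, 0.75, 0.5]
```

Values close to 1.0 across many positions would be consistent with a low-diversity library such as amplicons, though what counts as "too low" depends on the platform and the experiment.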
Note: If this is a follow-up on a previous question you posted, then you have already received a couple of comments there: Read quality only 14
Is this again from the Helicos platform, as in Read quality only 14?
The example is from the Helicos platform, yes, but I'm looking more for a general explanation of the biological factors that can reduce the quality of reads.
Do you have any evidence that, with this platform and the way it calculates base scores, you can even get notably higher values? Is there evidence that this particular sample is "bad" while others are "good"? You are making your life far too difficult by investigating mundane things like base quality. Better to align the data and proceed with the downstream analysis.
I'm a bioinformatics student and we have to make a tool that analyses FastQC output and gives a biological reason for why a check failed, for example. The Helicos data is data we were given as an example to run FastQC on and analyse the output from. That's why I'm looking for biological factors that can reduce the quality of reads. We're not planning to analyse this data any further.
FastQC's thresholds for flagging a sample as problematic are tailored for Illumina sequencing (afaik). Therefore I think you are chasing ghosts. Helicos is simply a different platform; you are comparing apples to pears. Beyond that, genomax has given excellent points that might explain poor base quality (even though I think they do not apply here, since you have low quality on average, not just at a certain position of the read). As said, this is Helicos, so that quality might be normal and expected. You could suggest to your supervisor that you download a few unrelated Helicos samples and compare average base qualities. Then you will know whether this 14 is indeed low or rather the default output of that platform. Also, "biological" typically refers to something happening in a living organism; the term here should be "technical".
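To compare average base qualities across a few samples as suggested, something along these lines might do. It is a minimal sketch that assumes plain four-line FASTQ records (no wrapped sequence lines) and Phred+33 encoding; `mean_phred` and `error_probability` are hypothetical helper names, and the two reads are toy data.

```python
import io

def mean_phred(handle, offset=33):
    """Mean Phred score over all bases in a FASTQ stream.

    Assumes four lines per record and Phred+33 encoding (both are assumptions).
    """
    total = 0
    n_bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 3:  # every fourth line holds the quality string
            qual = line.rstrip("\n")
            total += sum(ord(c) - offset for c in qual)
            n_bases += len(qual)
    return total / n_bases if n_bases else float("nan")

def error_probability(q):
    """Convert a Phred score Q to an error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

# Toy FASTQ: '/' encodes Q14 and '5' encodes Q20 under Phred+33
example = io.StringIO("@r1\nACGT\n+\n////\n@r2\nAC\n+\n5/\n")
print(mean_phred(example))              # 15.0
print(round(error_probability(14), 4))  # 0.0398
```

For a real file, the same function could be called on `open("sample.fastq")` (a placeholder path). Under the usual Phred definition, a score of 14 corresponds to roughly a 4% chance that a base call is wrong, which gives a scale for comparing samples from the same platform.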