Here is an interesting case, and I would appreciate feedback:
A 4,000-year-old human sample from Afghanistan was sequenced on an Illumina MiSeq using the paired-end method. Modern human DNA contamination is around 10%, and the sample is subject to post-mortem degradation. The files were uploaded to SRA by Max Planck in Germany.
I fetched the run, SRR3970376, and split it into separate FASTQ files for the forward and reverse reads. I have attached the FastQC output reports for both.
Here are my questions:
1- Why are the reverse reads of substantially lower quality than the forward ones? Flow cell overclustering? Issues with the sequencing machine? A degraded primer for the reverse reads? Or something else?
2- It is odd that the Phred quality scores on the forward reads go as high as 60 (the reverse reads only go up to 35). I have never seen scores higher than 40.
3- Any other thoughts based on the totality of both outputs?
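
For anyone who wants to double-check the score range from question 2 independently of FastQC, here is a minimal Python sketch. It assumes standard 4-line FASTQ records and a Phred+33 ASCII offset (the usual Sanger / Illumina 1.8+ encoding); the function names are my own and the offset is an assumption, not something I have confirmed for this run.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def phred_range(lines, offset=33):
    """Return (min, max) Phred score seen on the quality lines.

    Assumes 4-line FASTQ records (header, sequence, '+', quality) and
    that scores are encoded as ASCII code minus `offset` (33 assumed).
    Returns (None, None) if no quality characters are found.
    """
    lo = hi = None
    for i, line in enumerate(lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        quals = line.rstrip("\r\n")
        if not quals:
            continue
        scores = [ord(ch) - offset for ch in quals]
        line_lo, line_hi = min(scores), max(scores)
        lo = line_lo if lo is None else min(lo, line_lo)
        hi = line_hi if hi is None else max(hi, line_hi)
    return lo, hi


def phred_range_of_file(path, offset=33):
    """Convenience wrapper: compute the Phred range for a FASTQ file."""
    with open_maybe_gzip(path) as handle:
        return phred_range(handle, offset=offset)
```

Calling `phred_range_of_file("SRR3970376_1.fastq")` (the file name is hypothetical and depends on how the run was split) would show whether the raw quality strings really contain characters as high as `]`, which is what a score of 60 looks like under Phred+33.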