I am trying to do some analysis on the mitochondrial genome, specifically examining variants. The experiment was 10X WGS, sequenced on an Illumina platform. As part of my analysis, I take a mitochondrial alignment file (BAM) and generate an mpileup using samtools. All of the code runs without error; however, I am getting very high base quality scores, and I cannot find evidence that these scores exist in the BAM. For example:
In my pileup I see the character "o", which, converted from ASCII with perl -E 'say ord("o")-33', corresponds to a base quality of 78. A shortened example of the reported base quality scores is:
"F2JJFF7JJoFCJFA7A<f=ffjaafaaff<djafa<a8"
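To check the conversion for every character rather than just "o", the whole string can be decoded at once. The sketch below applies the same ord(c) - 33 arithmetic as the perl one-liner; the helper name `phred33` is hypothetical, and the ceiling of 41 ('J') is an assumption based on the usual Illumina 1.8+ Phred+33 range, not something confirmed for this run.

```python
# Decode a Phred+33 quality string: score = ASCII code - 33,
# the same arithmetic as: perl -E 'say ord("o")-33'
PILEUP_QUALS = "F2JJFF7JJoFCJFA7A<f=ffjaafaaff<djafa<a8"  # shortened example from the pileup

ILLUMINA_MAX = 41  # assumed ceiling ('J') for Illumina 1.8+ Phred+33 data


def phred33(quals: str) -> list[int]:
    """Return one Phred score per character: ord(c) - 33."""
    return [ord(c) - 33 for c in quals]


scores = phred33(PILEUP_QUALS)
above_max = [s for s in scores if s > ILLUMINA_MAX]  # scores higher than the assumed ceiling
print("scores:", scores)
print("highest:", max(scores), "encoded as", repr(chr(max(scores) + 33)))
print("scores above", ILLUMINA_MAX, ":", len(above_max), "of", len(scores))
```

This makes it easy to see that "o" is not the only out-of-range value: lowercase characters such as "a", "d", "f" and "j" also decode to scores above 41.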
When I examine the BAM used to generate this pileup, I do not find a single instance of the 'o' character.
The command I used to examine the BAM is:
samtools view SM_chrM_test.bam | cut -f 11 | egrep 'o' | wc -l
The result is 0.
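The `cut -f 11 | egrep 'o' | wc -l` check can be extended to also report the highest quality actually stored in the BAM, which shows directly whether anything near 78 is present. A minimal Python sketch, assuming the records arrive as SAM text produced by `samtools view`; the script name `qual_check.py` and the function `summarize_quals` are hypothetical.

```python
import sys
from typing import Iterable, Optional

QUAL_COL = 10  # 0-based index of the SAM QUAL field (column 11, as in `cut -f 11`)


def summarize_quals(sam_lines: Iterable[str], char: str = "o") -> dict:
    """Count reads whose QUAL contains `char` and track the highest Phred+33 score."""
    reads = 0
    with_char = 0
    max_q: Optional[int] = None
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (only present with `samtools view -h`)
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= QUAL_COL:
            continue  # not a complete SAM record
        reads += 1
        qual = fields[QUAL_COL]
        if qual == "*" or not qual:
            continue  # no qualities stored for this read
        if char in qual:
            with_char += 1
        read_max = max(ord(c) - 33 for c in qual)
        max_q = read_max if max_q is None else max(max_q, read_max)
    return {"reads": reads, "reads_with_char": with_char, "max_phred": max_q}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 qual_check.py records.sam
    with open(sys.argv[1]) as handle:
        print(summarize_quals(handle))
```

Usage: `samtools view SM_chrM_test.bam > SM_chrM_test.sam`, then `python3 qual_check.py SM_chrM_test.sam`. If `max_phred` stays well below 78, the high values in the pileup are not coming straight from the stored base qualities.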
Any suggestions as to why I am observing this difference would be appreciated.