I am getting a lower mapping rate than expected (~60%) on my latest batch of sequences from a HiSeq run following de novo assembly. I don't see much evidence of DNA contamination in the reads, so I've been looking elsewhere for the cause of the low mapping efficiency.
My sequences appear to have high quality scores throughout, except for the final base (Phred score below 30). I was able to remove these low-quality bases from a subset of the data using Trimmomatic. One thing persists, however. The FastQC report for "per base sequence content" indicates that the base percentages at the last position diverge substantially from those elsewhere in the reads. For example, my average G% and C% hold steady at ~22% each throughout the reads but, at the final base, the G% increases to ~25% and the C% to almost 30%.
This differs from what I've seen in "normal" RNAseq reads. Have you seen this before, and/or can you explain the significance of this divergence? Thanks.
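In case it helps anyone reproduce the check independently of FastQC, here is a minimal sketch of how the per-position base percentages could be computed directly from a FASTQ file. It is only an illustration, not what FastQC does internally: the function name `per_position_composition` is made up for this example, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
from collections import Counter


def per_position_composition(fastq_lines):
    """Return one dict per read position mapping each base to its percentage.

    Assumes standard 4-line FASTQ records: header, sequence, '+', qualities.
    """
    counts = []  # counts[i] tallies the bases observed at read position i
    for line_no, line in enumerate(fastq_lines):
        if line_no % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for i, base in enumerate(seq):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    composition = []
    for tally in counts:
        total = sum(tally.values())
        composition.append({b: 100.0 * n / total for b, n in tally.items()})
    return composition


# Tiny in-memory example; with real data, pass an open file handle instead,
# e.g. open("reads.fastq") where the file name is a placeholder.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGC", "+", "IIII"]
for pos, pct in enumerate(per_position_composition(example), start=1):
    print(pos, pct)
```

Positions are counted from the start of each read, so the last entry only corresponds to the "final base" when all reads have the same length. After trimming, read lengths can vary, and the final positions will then be based on fewer reads.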