OK. I realize this is not exactly a bioinformatics question, but I know that a lot of people in this forum spend their days staring at NGS alignments, and I am hoping someone has an explanation or some insight.
See the IGV screenshot below of representative matched tumor and normal samples. The pattern shown is VERY characteristic of the problem and is consistent throughout the entire genome. The data are from whole genome sequencing (WGS) on the Illumina X10 platform, aligned with BWA-MEM to GRCh37.
The symptoms are (1) very uneven coverage, with valleys and peaks that seem to correlate perfectly with the absence or presence of Alu elements, respectively; and (2) unusually high discordant read pair rates (5-25%). Note that the region to the right, which has few Alu sequences, looks reasonably good, although its coverage is below the target because so many reads are burned up in the problematic spikes around Alu elements.
Also note that the IGV session is colored by insert size and pair orientation, so the colored reads are pairs whose mate aligns to a different chromosome, at an unexpected fragment size, or in an unexpected orientation. Both mates generally align well, as if spanning a real translocation, but the breakpoints are inconsistent and the discrepancies are far too numerous and diverse even for the most rearranged genome.
We have now seen this in two projects with externally provided DNA. Both projects had 50-100 samples, and the problem was quite consistent (with minor exceptions) across each sample set. It does not correlate with instrument, lane, or flowcell: samples from other projects that were pooled with these had no problems. Repeated sequencing on a different instrument/platform, and even new library constructions with different kits (Kapa, SWIFT, etc.), produced data with identical characteristics.
The only thing that has helped is entirely new DNA preps from the source materials. In the first project we went back to the original tumor/normal tissues, prepared new DNA in our own lab, and the problem went away entirely. That is a solution (although an expensive one, since it basically means repeating all the work and sequencing), but I would like to understand the root cause.
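For anyone who wants to check their own data for the same two symptoms, here is a rough sketch of how they could be quantified. It assumes read pairs have already been parsed out of the BAM into a hypothetical `ReadPair` record, and that per-bin depth and per-bin Alu overlap fraction (e.g. from a RepeatMasker track) were computed separately. The insert-size bounds of 100 to 1000 bp are placeholder assumptions, not IGV's actual thresholds, and the FR orientation rule assumes a standard Illumina paired-end library.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReadPair:
    """Hypothetical minimal read-pair record (not a real library type).

    In practice these fields would come from the BAM: RNAME/POS, RNEXT/PNEXT,
    TLEN, and the strand bits of FLAG.
    """
    chrom: str
    pos: int
    is_reverse: bool
    mate_chrom: str
    mate_pos: int
    mate_is_reverse: bool
    tlen: int


def is_discordant(p: ReadPair, min_insert: int = 100, max_insert: int = 1000) -> bool:
    """Roughly mirror IGV's insert-size/orientation coloring.

    Flags a pair if the mates map to different chromosomes, if the insert
    size falls outside [min_insert, max_insert] (assumed bounds), or if the
    pair is not in the forward-reverse (FR) layout.
    """
    if p.chrom != p.mate_chrom:
        return True
    if not (min_insert <= abs(p.tlen) <= max_insert):
        return True
    # In the FR layout the leftmost mate is on the forward strand and the
    # rightmost mate is on the reverse strand.
    if p.pos <= p.mate_pos:
        left_rev, right_rev = p.is_reverse, p.mate_is_reverse
    else:
        left_rev, right_rev = p.mate_is_reverse, p.is_reverse
    return left_rev or not right_rev


def discordant_rate(pairs: list, **kwargs) -> float:
    """Fraction of pairs flagged as discordant (0.0 for an empty list)."""
    if not pairs:
        return 0.0
    return sum(is_discordant(p, **kwargs) for p in pairs) / len(pairs)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation, e.g. per-bin depth vs. per-bin Alu overlap fraction."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy pairs: one normal FR pair, then an interchromosomal, an FF, and
    # an oversized-insert pair.
    pairs = [
        ReadPair("1", 1000, False, "1", 1300, True, 450),
        ReadPair("1", 5000, False, "7", 88000, True, 0),
        ReadPair("1", 9000, False, "1", 9200, False, 350),
        ReadPair("1", 20000, False, "1", 45000, True, 25150),
    ]
    print(discordant_rate(pairs))  # 0.75
    # Toy bins: depth rises where the Alu overlap fraction is high.
    depth = [12, 45, 10, 50, 11]
    alu_frac = [0.0, 0.6, 0.05, 0.7, 0.0]
    print(pearson(depth, alu_frac))
```

A healthy library should give a small discordant fraction and little correlation between depth and Alu content; the pattern described above would show up as a high fraction in the first measure and a strongly positive coefficient in the second.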
Has anyone seen this? Any insight as to what might be going wrong in sample prep to cause this apparent enrichment in regions of the genome coinciding with Alu elements? Google has failed me so far. Let me know if there is somewhere else that I might try posting this.