I'm working with RNA-seq data obtained with a total RNA protocol. The reads are paired-end and 75 bases long. There is a control group and a disease group.
The disease group shows a very consistent (and biased) pattern across samples in GC content and duplication levels, with some sequences present in more than 10,000 K copies (see images below).
Has anyone seen something similar and found a way to explain it?
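
In case it helps to reproduce the numbers, here is a minimal sketch of how the most duplicated sequences and their GC content could be tabulated directly from a FASTQ file. It assumes a standard 4-line-per-record FASTQ (plain or gzipped) and exact-match duplicates only; the function names and the file name in the usage note are hypothetical, and this is not how the QC plots were produced.

```python
from collections import Counter
import gzip


def read_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record holds the bases
                yield line.strip().upper()


def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def top_duplicates(seqs, n=10):
    """Return (sequence, copy count, GC fraction) for the n most frequent sequences."""
    counts = Counter(seqs)  # exact-match counting; holds all distinct reads in memory
    return [(seq, count, gc_fraction(seq)) for seq, count in counts.most_common(n)]
```

Usage would look something like `top_duplicates(read_sequences("disease_sample_R1.fastq.gz"))`. The top sequences could then be searched against known references to see what they are.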