I would like to ask for comments and suggestions on interpreting some initial quality control results from FASTQ files. In a current RNA-Seq gene expression project based on 2 different cancer cell lines, mRNA sequencing was performed for 12 samples (HiSeq 4000, 2 × 75 bases). Very briefly, the protocol was:
a) purification of poly-A-containing mRNA molecules from 1 µg of total RNA, using poly-T oligo-attached magnetic beads
b) fragmentation using divalent cations under elevated temperature, to obtain pieces of approximately 300 bp
c) double-stranded cDNA synthesis
d) Illumina adapter ligation and PCR amplification of the cDNA library for sequencing.
In the initial exploratory quality control with FastQC, most of the samples either failed or got a warning in the "Per Base Sequence Content" section. This can be seen in the figures I have attached for one sample (the others look similar). No other quality issues were evident, and the RIN values were also fine:
From a first look at the plots, could one argue that these failures/warnings indicate overrepresented sequences, which might have contaminated the libraries during construction?
Moreover, is there a way to adjust for this, or is downstream analysis still possible as it is?
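In case it helps to pin down where the bias sits, here is a minimal Python sketch of the kind of check I have in mind: it tabulates the fraction of each base at every read position, which is roughly what the "Per Base Sequence Content" plot shows, so one can see whether the skew is confined to the first few positions or spans the whole read. It is only an illustration, not FastQC's own implementation. It assumes standard 4-line FASTQ records (no wrapped sequences), and the file name in the usage comment is hypothetical.

```python
from collections import Counter
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def per_base_content(lines, max_len=75):
    """Return one dict per read position with the fractions of A, C, G, T, N.

    `lines` is any iterable of FASTQ lines; 4-line records are assumed.
    Positions not covered by any read are left out of the result.
    """
    counts = [Counter() for _ in range(max_len)]
    for i, line in enumerate(lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:max_len]):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        if total == 0:
            break
        fractions.append({b: c[b] / total for b in "ACGTN"})
    return fractions


# Usage (hypothetical file name):
# with open_fastq("sample1_R1.fastq.gz") as fh:
#     for pos, frac in enumerate(per_base_content(fh), start=1):
#         print(pos, " ".join(f"{b}={frac[b]:.3f}" for b in "ACGT"))
```

With balanced libraries the four fractions should stay roughly constant along the read, so printing them per position makes it easy to see at which positions they diverge.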
Any suggestions or ideas would be greatly appreciated!