I'm new to RNA-Seq and have just run FastQC on my dataset. On the plots of GC content, all of the samples have a peak at around 60%, as shown here: http://i.imgur.com/YReFOV7.png
I've BLASTed a few of the most overrepresented sequences, and each one hits multiple genes in multiple mammalian species with 100% identity. Every one of them hits the human signal recognition particle RNA (SRP 7SL), but also hits predicted targets in other mammals. Here's an example sequence:
Can anyone suggest what could be causing this? As I say, I'm new to RNA-Seq, so it could be a beginner's misunderstanding. I haven't touched the data in any way (no trimming or any other quality cut-offs); the raw reads were run directly through FastQC. As far as I can tell, the main quality measures (Per base sequence quality, Per sequence quality scores) are good, though several of the others (Per base sequence content, Adapter Content, and Kmer Content) show red flags.
In case it's useful, these are paired-end reads from an Illumina TruSeq Total RNA library.
Thank you for any help.