I've run FastQC on a sample of Illumina RNA-Seq. It identifies issues with abnormal kmer counts:
There is the as well the kmer table:
What's wrong with this sample? What do I need to do? I reckon this points to a contamination of sorts. I've already run trimmomatic on this sample using the standard Truseq adapters so I am not sure what to make of this.
Here is the sequence content across all bases:
Here is the GC content:
As advised by Amitm I rerun FastQC using the latest version (0.11.4). The previous analysis was done by version 0.10.1. There are interesting differences. I am posting below the graph for the per base sequence content:
In particular, the kmer abnormalities seem to be concentrated at the start of the reads:
This looks not too hopeless to me even though I still have no clue how to deal with it. On a different note, I am slightly worried about the discrepancies between the different FastQC versions. I've always assumed that they would produce, more or less, identical results. I realise now that this is somewhat naive. I dread to think what I would find if I rerun the latest version of FastQC on samples I've analysed in the past.