Hi all,
I'm quality-checking my reads before further RNA-seq analysis.
When I ran FastQC for the first time, the following modules failed:
- Per base sequence content
- Per sequence GC content
- Sequence duplication levels
- Overrepresented sequences
- Kmer content
Then I ran the overrepresented sequences through BLAST, and they matched the chloroplast genome. So I removed all chloroplast sequences from my data.
Then I ran FastQC again, and every module listed above still failed except Overrepresented sequences. So I still had 4 failures.
After reading around on various websites, I understood that a failure in Sequence duplication levels is OK for RNA-seq, because the duplicates may come from highly expressed transcripts. So I stopped worrying about that module.
The failure in Per base sequence content, on the other hand, was due to the first 13 bases. I trimmed them and this module no longer fails.
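To make clear what I mean by trimming the first 13 bases, here is a minimal Python sketch of the operation. It is not the tool I actually ran; the function name is made up for illustration, and it assumes plain 4-line FASTQ records (header, sequence, "+", qualities) with no line wrapping.

```python
def trim_fastq_lines(lines, n=13):
    """Yield FASTQ lines with the first n bases and qualities removed.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    The name and the default n=13 are illustrative only.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 1 and 3 of each record (0-based) are sequence and quality;
        # both must be cut by the same amount to stay in sync.
        if i % 4 in (1, 3):
            line = line[n:]
        yield line


record = ["@read1", "ACGTACGTACGTACGTAA", "+", "IIIIIIIIIIIIIIIIII"]
trimmed = list(trim_fastq_lines(record, n=13))
```

Reads shorter than n would come out empty with this sketch, so a real run would probably also need a minimum-length filter.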
But Per sequence GC content and Kmer content still fail. Kmer content is even worse than before: the peaks are now spread across all positions, whereas at the beginning they were around the first 12 positions (which I trimmed).
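In case it helps anyone reproduce the GC picture outside FastQC, this is roughly how the per-read GC distribution could be tabulated. It is only a sketch under my own assumptions: the function names and the 5% bin width are hypothetical, and it does not reproduce FastQC's own pass/fail calculation.

```python
from collections import Counter


def gc_percent(seq):
    """Return the GC percentage of one read (0.0 for an empty read)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_histogram(seqs, bin_width=5):
    """Count reads per GC bin; keys are the lower edge of each bin."""
    counts = Counter()
    for seq in seqs:
        lower_edge = int(gc_percent(seq) // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


hist = gc_histogram(["ATGC", "AAAA", "GGCC"])
```

I would then compare the shape of this histogram before and after the chloroplast removal and the trimming, to see which step changed it.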
To make sure there were no adapter sequences left (the Adapter content module in FastQC is completely fine), I put all the adapter sequences used by Illumina in a file and tried to trim them off, but as I expected there were no adapters and nothing changed in quality.
So do you have any suggestions for getting better-quality reads? Did I do something wrong? Why did Kmer content get worse?