I have some paired-end RNA-seq data in hand that I am analysing, and as I am very new to this, I would appreciate some input on the question "TO TRIM OR NOT TO TRIM?" given my current situation.
I have read many posts, which led me to conclude that raw data should NOT be trimmed if it is of good quality. But I don't know whether mine is, and I am worried about some yellow warnings in FastQC. My PE RNA-seq library was prepared from human brain tissue with the Illumina TruSeq kit A, using index 5.
So, for example, I get a yellow warning for overrepresented sequences; none of them are Illumina adapters or indexes. When I align them, they mostly overlap, and when I BLAST them, this is the result:
Should I add this sequence to the ones to be trimmed with Trimmomatic, or just leave it? What about the sequence of index #5: to trim it or not to trim it? And there is no need to trim any Illumina adapter, since FastQC says none are present in my file, right?
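One way I thought of to judge whether a sequence is worth trimming at all is to count how many reads actually contain it. Below is a minimal Python sketch of that idea. The function name `fraction_containing` and the example reads are made up for illustration, and it only looks for exact matches on the given strand, so it is a rough check rather than a replacement for a proper adapter search.

```python
def fraction_containing(reads, query):
    """Return the fraction of reads that contain `query` as an exact substring.

    `reads` is any iterable of read sequences (strings).
    Matching is case-insensitive and does not consider the reverse complement.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    query = query.upper()
    hits = sum(1 for read in reads if query in read.upper())
    return hits / len(reads)


if __name__ == "__main__":
    # Hypothetical reads and query, not taken from my data.
    example_reads = ["ACGTACGT", "TTTTACGT", "GGGGGGGG", "ACGT"]
    print(fraction_containing(example_reads, "acgt"))
```

If the fraction for the index or the overrepresented sequence turns out to be tiny, I assume trimming it would make little difference either way.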
Lastly, the warning that concerns me most is the per-sequence GC content.
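As far as I understand, that module is based on the GC percentage of each individual read, whose distribution FastQC then compares with a theoretical one. A minimal Python sketch of the per-read quantity, so it is clear what I am asking about (the function name `gc_percent` is my own, and ignoring N bases is an assumption on my part, not necessarily what FastQC does):

```python
def gc_percent(seq):
    """Return the GC percentage of one read, counting only A/C/G/T bases.

    Ambiguous bases such as N are ignored (an assumption for this sketch).
    Returns 0.0 for a read with no called bases.
    """
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    gc = sum(1 for base in called if base in "GC")
    return 100.0 * gc / len(called)


if __name__ == "__main__":
    # Hypothetical reads, not taken from my data.
    for read in ["GGCCAATT", "NNGC", "ATATATAT"]:
        print(read, gc_percent(read))
```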
My aim with these data is to identify any (novel?) transcripts expressed in this brain region, focusing specifically on one chromosomal region.
I thought of using ONLY Trimmomatic's SLIDINGWINDOW:4:20, which would cut off low-quality bases, and that's all. And maybe also cutting the sequence of the index and the overrepresented sequences.
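To check my understanding of SLIDINGWINDOW:4:20, here is a minimal Python sketch of the idea: scan the read from the 5' end with a 4-base window and cut once the mean quality in the window drops below 20. This is a simplified illustration and not Trimmomatic's actual implementation, which I believe differs in details such as exactly where the cut is placed. The function name `sliding_window_trim` and the example read are made up.

```python
def sliding_window_trim(seq, quals, window=4, min_mean_q=20):
    """Simplified sliding-window quality trimming.

    Scans from the 5' end and cuts the read at the start of the first
    window whose mean Phred quality is below `min_mean_q`.
    Reads shorter than `window` are returned unchanged in this sketch.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    for start in range(len(quals) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals


if __name__ == "__main__":
    # Hypothetical read: good quality at the start, poor at the 3' end.
    read = "ACGTACGTAC"
    phred = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
    print(sliding_window_trim(read, phred))
```

In this toy example the window starting at the sixth base is the first one with a mean below 20, so only the first five bases are kept.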
So, thank you in advance for your attention!
Many thanks and best regards,