After running FastQC on a sample, I got an error in the 'Overrepresented sequences' module. The overrepresented sequence is an adapter: TruSeq Adapter, Index 15 (97% over 40bp).
a) If the sample contains so many TruSeq Adapter, Index 15 sequences, why does FastQC not throw an error in the 'Adapter content' module?
b) In the 'Overrepresented sequences' module there are several rows (each with a sequence) that list the possible source as TruSeq Adapter, Index 15 (97% over 40bp). Some of the sequences in those rows differ slightly from each other. If they are the same adapter, how can they be different?
c) Should I trim all of those slightly different sequences or just one of them?
See: C: fastqc adapter content
Check the rest of the thread for other useful information.
Thank you. So, from reading the thread you've shown we can conclude:
a) when FastQC finds overrepresented adapters, it sometimes gives an error in 'Overrepresented sequences' and other times in 'Adapter content', for no particular reason.
b) the slightly different sequences are produced when we have short inserts: the sequencer starts to read into the adapter once the insert runs out.
c) I should trim all sequences
Is that it?
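To make point b) concrete, here is a minimal sketch of adapter read-through. It is an illustration under assumptions, not FastQC's own logic: the adapter string is the commonly published TruSeq adapter prefix, and the inserts, the 40 bp read length, and the helper names (simulate_read, adapter_start, trim_adapter) are made up for the example.

```python
# Commonly published TruSeq adapter prefix, used here only for illustration.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
READ_LENGTH = 40  # assumed read length for this toy example


def simulate_read(insert, adapter=ADAPTER, read_length=READ_LENGTH):
    """Return the first read_length bases of insert followed by adapter."""
    return (insert + adapter)[:read_length]


def adapter_start(read, adapter=ADAPTER, min_overlap=12):
    """Return the index where the adapter prefix begins in read, or -1."""
    return read.find(adapter[:min_overlap])


def trim_adapter(read, adapter=ADAPTER, min_overlap=12):
    """Cut the read at the adapter start; return it unchanged if not found."""
    pos = adapter_start(read, adapter, min_overlap)
    return read[:pos] if pos >= 0 else read


# Hypothetical short inserts of 6, 8 and 10 bases.
inserts = ["ACGTTG", "ACGTTGCA", "ACGTTGCAGG"]
reads = [simulate_read(insert) for insert in inserts]

for insert, read in zip(inserts, reads):
    # Same adapter in every read, but at a different offset, so the
    # reads themselves are slightly different sequences.
    print(len(insert), read, adapter_start(read), trim_adapter(read))
```

Each simulated read contains the same adapter, shifted by the insert length, which is one way the rows in the report can differ while sharing the same possible source. It also suggests that trimming against the adapter sequence itself, rather than against each reported row, would cover all of these variants.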
Failing a test (or more) in FastQC is not a block that prevents you from going further. Every dataset has unique characteristics and depending on the type of experiment "acceptable" values for these tests can change (and you are able to adjust those limits in FastQC config).
Based on his experience, @Devon recommends no trimming if you are planning to use STAR for alignment. You can decide whether you are comfortable with not trimming the data for re-sequencing applications.
You should scan/trim your data if you are planning to do any de novo work. Quality-based trimming is only needed in that case, and even then only if your data has significant amounts of bad (< Q20) sequence.
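As a rough way to judge whether there is a "significant amount" of sub-Q20 sequence, the sketch below counts the fraction of bases under a quality threshold. It assumes Phred+33 encoded quality strings (the usual encoding in current Illumina FASTQ files); the function name and the example strings are hypothetical, and what counts as "significant" is left to you.

```python
def fraction_below_q(quality_strings, threshold=20, offset=33):
    """Return the fraction of bases whose Phred score is below threshold.

    Assumes each string is a FASTQ quality line in Phred+offset encoding.
    """
    total = 0
    low = 0
    for quality in quality_strings:
        for char in quality:
            total += 1
            if ord(char) - offset < threshold:
                low += 1
    return low / total if total else 0.0


# Toy example: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
example_qualities = ["IIIIIIIIII", "IIIII#####"]
low_fraction = fraction_below_q(example_qualities)
print(f"{low_fraction:.2%} of bases are below Q20")
```

In practice you would feed it the fourth line of every FASTQ record, or simply read the same information off the 'Per base sequence quality' plot in the FastQC report.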