I evaluated the quality of my RNA-seq data with FastQC and found that the sequence quality was not good enough for downstream analysis. However, the quality report lists no over-represented sequences. Because the data had already been processed by others, I was told to remove the adapter sequences and low-quality sequences first, and then evaluate the quality again. I am wondering whether adapter clipping will improve the FastQC report when FastQC detected no over-represented sequences. In other words, are the over-represented sequences detected by FastQC the adapters?
Here is my FastQC command line:
fastqc -o ST_read1_fastqc --contaminants TruSeq2-PE.txt -noextract ST_read1.fastq
Many sequences other than adapters can be over-represented. But if you didn't find any over-represented sequences, this most likely means that the adapters have already been trimmed off.
So adapters are a kind of over-represented sequence and should be detected by FastQC if they are present in the sequence data? The interesting thing is that I ran fastx_clipper with each adapter in TruSeq2-PE.txt, and plenty of sequences were discarded because they were too short after trimming. This inconsistency between FastQC and the FASTX-Toolkit leaves me very confused about the concept of over-represented sequences.
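The discrepancy described above can be checked directly by counting how many reads actually contain the start of an adapter, independently of either tool. As far as I understand, FastQC's over-represented sequences module counts whole reads that are identical and flags those above roughly 0.1% of the total, so adapter fragments that begin at different positions in different reads may not show up there even though a clipper still finds them. Below is a minimal sketch, assuming a plain FASTQ with 4 lines per record. The function name count_adapter_reads and the prefix AGATCGGAAGAGC are illustrative assumptions, not taken from TruSeq2-PE.txt, so substitute the first dozen or so bases of each adapter in your own file.

```shell
# Count reads whose sequence line contains a given adapter prefix.
# Assumption: the FASTQ has exactly 4 lines per record (no wrapped sequences).
# count_adapter_reads is a hypothetical helper name, not part of FastQC or FASTX.
count_adapter_reads() {
    fastq="$1"     # path to the FASTQ file
    prefix="$2"    # first bases of the adapter to look for
    # Line 2 of every 4-line record is the read sequence.
    awk -v pat="$prefix" '
        NR % 4 == 2 { total++; if (index($0, pat) > 0) hits++ }
        END { printf "%d\t%d\n", hits, total }
    ' "$fastq"
}

# Example (file name from the question; the prefix is only an illustration):
# count_adapter_reads ST_read1.fastq AGATCGGAAGAGC
```

The function prints the number of reads containing the prefix, a tab, and the total number of reads. If a noticeable fraction of reads contains the prefix while FastQC reports nothing, that would be consistent with adapters sitting at variable positions rather than with a real contradiction between the two tools. An exact-match count misses adapters that contain sequencing errors, so treat it as a lower bound.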