I am downloading public data and running FastQC on a number of the FASTQ files. I get reports like this:
PASS  Basic Statistics              SRR2637682_1.fastq.bz2
PASS  Per base sequence quality     SRR2637682_1.fastq.bz2
PASS  Per tile sequence quality     SRR2637682_1.fastq.bz2
PASS  Per sequence quality scores   SRR2637682_1.fastq.bz2
FAIL  Per base sequence content     SRR2637682_1.fastq.bz2
FAIL  Per sequence GC content       SRR2637682_1.fastq.bz2
PASS  Per base N content            SRR2637682_1.fastq.bz2
PASS  Sequence Length Distribution  SRR2637682_1.fastq.bz2
FAIL  Sequence Duplication Levels   SRR2637682_1.fastq.bz2
WARN  Overrepresented sequences     SRR2637682_1.fastq.bz2
PASS  Adapter Content               SRR2637682_1.fastq.bz2
FAIL  Kmer Content                  SRR2637682_1.fastq.bz2
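To keep track of which modules fail across many files, a report like this can be tallied by status. The following is only a minimal sketch: it assumes the tab-separated status/module/filename layout of FastQC's summary.txt, and `parse_summary` is a name made up here for illustration, not part of FastQC.

```python
from collections import defaultdict


def parse_summary(text):
    """Group FastQC module names by status (PASS / WARN / FAIL).

    Assumes each non-empty line has three tab-separated fields:
    status, module name, file name.
    """
    by_status = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, _filename = line.split("\t", 2)
        by_status[status].append(module)
    return dict(by_status)


# The report from this question, rebuilt as tab-separated lines.
fname = "SRR2637682_1.fastq.bz2"
rows = [
    ("PASS", "Basic Statistics"),
    ("PASS", "Per base sequence quality"),
    ("PASS", "Per tile sequence quality"),
    ("PASS", "Per sequence quality scores"),
    ("FAIL", "Per base sequence content"),
    ("FAIL", "Per sequence GC content"),
    ("PASS", "Per base N content"),
    ("PASS", "Sequence Length Distribution"),
    ("FAIL", "Sequence Duplication Levels"),
    ("WARN", "Overrepresented sequences"),
    ("PASS", "Adapter Content"),
    ("FAIL", "Kmer Content"),
]
sample = "\n".join(f"{status}\t{module}\t{fname}" for status, module in rows)

result = parse_summary(sample)
for status in ("FAIL", "WARN", "PASS"):
    print(status, result.get(status, []))
```

Running the same function over the summary.txt of every downloaded file would show whether the same modules fail everywhere or only in particular runs.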
I've read about lots of quality control tools that can fix some of these problems. However, I cannot find one that works properly and generates a "PASS" for all of these.
For example, I have absolutely no idea how to fix the "Kmer Content" module. All I know is that it has shown a FAIL in every real example I've seen.
All I can find are trimmers and adapter removers, which don't improve most of the modules here. "Per base sequence content" is another one I have no idea how to fix; in my experience it always shows a FAIL as well.
FastQC doesn't actually fix anything itself, so how can I go about fixing all of these modules? Are there some that are okay to fail?