Question: How to repair *all* problems identified by FastQC?
2.6 years ago, dec986 (United States) wrote:


I am downloading public data and running FastQC on a number of the FASTQ files. I get reports like this:

PASS    Basic Statistics    SRR2637682_1.fastq.bz2
PASS    Per base sequence quality   SRR2637682_1.fastq.bz2
PASS    Per tile sequence quality   SRR2637682_1.fastq.bz2
PASS    Per sequence quality scores SRR2637682_1.fastq.bz2
FAIL    Per base sequence content   SRR2637682_1.fastq.bz2
FAIL    Per sequence GC content SRR2637682_1.fastq.bz2
PASS    Per base N content  SRR2637682_1.fastq.bz2
PASS    Sequence Length Distribution    SRR2637682_1.fastq.bz2
FAIL    Sequence Duplication Levels SRR2637682_1.fastq.bz2
WARN    Overrepresented sequences   SRR2637682_1.fastq.bz2
PASS    Adapter Content SRR2637682_1.fastq.bz2
FAIL    Kmer Content    SRR2637682_1.fastq.bz2

I've read about lots of quality control tools that can fix some of these problems. However, I cannot find one that works properly and generates a "PASS" for all of these.

For example, I have absolutely no idea how to fix the "Kmer Content" module. All I know is that it has shown a FAIL in every real example I've seen.

All I can find are trimmers and adapter removers, which don't improve most of the modules here. "Per base sequence content" is another one I have no idea how to fix. All I know is that it always shows FAIL.

FastQC doesn't actually fix anything, so how can I go about fixing all of these modules? Are there some that are okay to fail?

Tags: fastqc, rna-seq

Some "problems" are not problems. For example:

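  • FastQC will flag FAIL on several modules for most RNA-seq libraries, because its thresholds assume a genomic DNA library.
  • An Illumina TruSeq RNA-seq library will always fail "Per base sequence content", because random hexamer priming biases the base composition at the start of each read.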

You have to take FastQC warnings and failures with a grain of salt and interpret them in light of the kind of samples being analysed.

P.S.: added link for post discussing TruSeq hexamer priming problem.
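One way to apply the "grain of salt" advice across many downloaded files is to triage FastQC's summary.txt rather than chase every FAIL. The following is a minimal sketch, assuming the usual tab-separated summary.txt layout (status, module, file name). The set of modules treated as "often expected for RNA-seq" is an assumption for illustration based on this thread, not an official list, and the function names are hypothetical.

```python
# Modules that commonly FAIL on RNA-seq data for biological or library-prep
# reasons. ASSUMPTION: this set is illustrative only; adjust it to your data.
OFTEN_EXPECTED_FOR_RNASEQ = {
    "Per base sequence content",
    "Sequence Duplication Levels",
    "Kmer Content",
}


def parse_summary(text):
    """Parse summary.txt content into (status, module, filename) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t", 2)
        rows.append((status, module, filename))
    return rows


def triage(rows, expected=OFTEN_EXPECTED_FOR_RNASEQ):
    """Split non-PASS modules into 'likely expected' and 'worth a look'."""
    likely_expected, worth_a_look = [], []
    for status, module, _filename in rows:
        if status == "PASS":
            continue
        target = likely_expected if module in expected else worth_a_look
        target.append((status, module))
    return likely_expected, worth_a_look


if __name__ == "__main__":
    # A few lines from the report in the question, tab-separated.
    report = (
        "PASS\tBasic Statistics\tSRR2637682_1.fastq.bz2\n"
        "FAIL\tPer base sequence content\tSRR2637682_1.fastq.bz2\n"
        "FAIL\tPer sequence GC content\tSRR2637682_1.fastq.bz2\n"
        "FAIL\tKmer Content\tSRR2637682_1.fastq.bz2\n"
    )
    likely_expected, worth_a_look = triage(parse_summary(report))
    print("Likely expected for RNA-seq:", likely_expected)
    print("Worth a closer look:", worth_a_look)
```

A module landing in the "worth a closer look" list is not necessarily a problem either; it only means this thread gives no ready explanation for it.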

written 2.6 years ago by h.mon

Nextera genomic libraries also fail the "Per base sequence content" module, or at least they did a few years back.

I believe that was because of some residual transposase bias in the first 10-15 bp.
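The kind of positional bias described here can be checked directly by tabulating base fractions over the first few read positions. This is a minimal sketch, assuming plain 4-line FASTQ records that have already been decompressed; the function names and the default of 15 positions are illustrative choices, not part of FastQC.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def per_base_content(reads, max_pos=15):
    """Return one dict per position with the fraction of A, C, G and T.

    Fractions are relative to all bases seen at that position, so they sum
    to less than 1 when N or other symbols are present.
    """
    counts = [Counter() for _ in range(max_pos)]
    for seq in reads:
        for i, base in enumerate(seq[:max_pos].upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return table


if __name__ == "__main__":
    # Toy FASTQ content; real input would come from an open file handle.
    toy = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGA", "+", "IIII",
        "@r3", "ACTT", "+", "IIII",
        "@r4", "AGGT", "+", "IIII",
    ]
    for pos, fractions in enumerate(per_base_content(fastq_sequences(toy), max_pos=4), 1):
        print(pos, fractions)
```

In an unbiased library the four fractions stay roughly flat along the read; a skew confined to the first 10-15 positions points to priming or transposase bias rather than poor sequencing.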

written 2.6 years ago by Cliff Beall

There are a lot of posts on Biostars about FastQC. For example:

Questions regarding proprocess for raw data and usage of FastQC

What's wrong with this sample? (kmers found by FastQC of RNA-Seq)

Understanding Fastqc Output- Please Help

GC content and Kmer


written 2.6 years ago by natasha.sernova
2.6 years ago, novice (United States) wrote:

Easy: You download the tool and select option --no-more-worries.

Just kidding!

Yes, there are usually some warnings that you can ignore. Quality control is entirely based on your knowledge of the sequences and your purposes. In my opinion, people more often than not unnecessarily filter/trim and lose information.

2.6 years ago, Ian (University of Manchester, UK) wrote:

A good way to address the failures (taking into account what the others said about their relevance) is to run the reads through a trimming tool such as Trimmomatic or cutadapt. This removes adapters as well as poor-quality reads and bases. Rerunning FastQC afterwards will often show a vast improvement.

Also, take a read of the excellent QCfail.
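The trimming step suggested in this answer could be scripted roughly as follows. This is only a sketch: the adapter prefix, the quality cutoff of 20, the minimum length of 20 and the file names are assumptions to be replaced with values that suit your library, and the helper names are hypothetical. It uses cutadapt's `-a`, `-q`, `-m` and `-o` options.

```python
import shutil
import subprocess

# ASSUMPTION: common prefix of Illumina TruSeq adapters; check your own kit.
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def build_cutadapt_cmd(fastq_in, fastq_out, adapter=ILLUMINA_ADAPTER,
                       min_quality=20, min_length=20):
    """Build a single-end cutadapt command line as a list of arguments."""
    return [
        "cutadapt",
        "-a", adapter,           # 3' adapter to remove
        "-q", str(min_quality),  # trim low-quality bases from the 3' end
        "-m", str(min_length),   # discard reads shorter than this after trimming
        "-o", fastq_out,         # trimmed output file
        fastq_in,
    ]


def trim(fastq_in, fastq_out, **kwargs):
    """Run cutadapt if it is installed; raise otherwise."""
    if shutil.which("cutadapt") is None:
        raise RuntimeError("cutadapt not found on PATH")
    cmd = build_cutadapt_cmd(fastq_in, fastq_out, **kwargs)
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(build_cutadapt_cmd("reads.fastq", "reads.trimmed.fastq")))
```

Trimming of this kind mainly affects the quality and adapter modules; as the other replies note, modules such as "Per base sequence content" may still fail afterwards for reasons unrelated to read quality.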
