Question: How to repair *all* problems identified by FastQC?
dec98630 wrote:

Hello,

I am downloading public data and running FastQC on a number of the FASTQ files. I get reports like this:

PASS    Basic Statistics    SRR2637682_1.fastq.bz2
PASS    Per base sequence quality   SRR2637682_1.fastq.bz2
PASS    Per tile sequence quality   SRR2637682_1.fastq.bz2
PASS    Per sequence quality scores SRR2637682_1.fastq.bz2
FAIL    Per base sequence content   SRR2637682_1.fastq.bz2
FAIL    Per sequence GC content SRR2637682_1.fastq.bz2
PASS    Per base N content  SRR2637682_1.fastq.bz2
PASS    Sequence Length Distribution    SRR2637682_1.fastq.bz2
FAIL    Sequence Duplication Levels SRR2637682_1.fastq.bz2
WARN    Overrepresented sequences   SRR2637682_1.fastq.bz2
PASS    Adapter Content SRR2637682_1.fastq.bz2
FAIL    Kmer Content    SRR2637682_1.fastq.bz2

I've read about lots of quality control tools that can fix some of these problems. However, I cannot find one that works properly and produces a "PASS" for all of these modules.

For example, I have absolutely no idea how to fix the "Kmer Content" module. All I know is that it has shown a FAIL in every real example I've seen.

All I can find are trimmers and adapter removers, which don't improve most of the modules here. "Per base sequence content" is another one I have no idea how to fix. All I know is that it always shows FAIL.

FastQC doesn't actually fix anything, so how can I go about fixing all of these modules? Are there some that are okay to fail?

fastqc rna-seq • 297 views
modified 5 weeks ago by Ian5.1k • written 6 weeks ago by dec98630

Some "problems" are not problems. For example:

  • FastQC will flag failures for most RNA-seq libraries, because its pass/fail thresholds assume a genomic library.
  • Illumina TruSeq RNA-seq libraries will always fail "Per base sequence content".

You have to take FastQC warnings and fails with a grain of salt, taking into account the nature of the samples being analysed.
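To make the "grain of salt" idea concrete, here is a minimal Python sketch that sorts the non-PASS lines of a FastQC summary into "probably expected for RNA-seq" and "worth a closer look". It assumes the summary is tab-separated (status, module, file name). The set of "expected" modules is only an illustrative assumption drawn from this thread, not something FastQC defines, and `triage_summary` is a made-up helper name.

```python
# Modules this sketch treats as "often expected to fail" for RNA-seq.
# This set is an illustrative assumption, not a FastQC setting.
EXPECTED_FOR_RNASEQ = {
    "Per base sequence content",
    "Sequence Duplication Levels",
    "Kmer Content",
}


def triage_summary(summary_text, expected=EXPECTED_FOR_RNASEQ):
    """Split non-PASS modules into 'expected' and 'review' lists."""
    result = {"expected": [], "review": []}
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, module = fields[0].strip(), fields[1].strip()
        if status == "PASS":
            continue
        bucket = "expected" if module in expected else "review"
        result[bucket].append((status, module))
    return result


# A few lines from the report in the question, written with explicit tabs.
example = "\n".join([
    "PASS\tBasic Statistics\tSRR2637682_1.fastq.bz2",
    "FAIL\tPer base sequence content\tSRR2637682_1.fastq.bz2",
    "FAIL\tPer sequence GC content\tSRR2637682_1.fastq.bz2",
    "FAIL\tSequence Duplication Levels\tSRR2637682_1.fastq.bz2",
    "WARN\tOverrepresented sequences\tSRR2637682_1.fastq.bz2",
    "FAIL\tKmer Content\tSRR2637682_1.fastq.bz2",
])

triage = triage_summary(example)
for bucket, items in triage.items():
    for status, module in items:
        print(f"{bucket}\t{status}\t{module}")
```

Which modules belong in the "expected" set depends on the library type and the experiment, so treat the set as something to edit per project rather than a rule.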

P.S.: added link for post discussing TruSeq hexamer priming problem.

modified 5 weeks ago • written 5 weeks ago by h.mon10k

Nextera genomic libraries also fail the "per base sequence content", at least they did a few years back.

I believe that was because of some residual transposase bias in the first 10-15 bp.
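A positional bias like this can be checked by hand. The sketch below computes the fraction of each base at every read position, which is roughly what the "Per base sequence content" plot summarises. The reads are made-up toy data and `per_base_content` is a hypothetical helper, so this only illustrates the idea and is not how FastQC itself is implemented.

```python
from collections import Counter


def per_base_content(reads, max_pos=None):
    """Return one {base: fraction} dict per read position (0-based list)."""
    length = max(len(r) for r in reads)
    if max_pos is not None:
        length = min(length, max_pos)
    profile = []
    for pos in range(length):
        # Count bases at this position across all reads long enough to have it.
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in "ACGT"})
    return profile


# Toy reads (invented for illustration) with a skewed start.
toy_reads = ["GGTACGTA", "GGAACCTT", "GCTAGGTA", "GGTTCATG"]
profile = per_base_content(toy_reads)

for pos, fractions in enumerate(profile, start=1):
    print(pos, fractions)
```

If the fractions are skewed over the first 10-15 positions and then level out, that matches the kind of library-preparation bias described above rather than a problem with the sequencing run.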

written 5 weeks ago by Cliff Beall450

There are a lot of posts on Biostars about FastQC. For example:

  • Questions regarding proprocess for raw data and usage of FastQC
  • What's wrong with this sample? (kmers found by FastQC of RNA-Seq)
  • Understanding Fastqc Output- Please Help
  • GC content and Kmer

etc.

modified 5 weeks ago • written 6 weeks ago by natasha.sernova2.8k
novice790 wrote:

Easy: You download the tool FixReadsForGood.pl and select option --no-more-worries.

Just kidding!

Yes, there are usually some warnings that you can ignore. Quality control depends entirely on what you know about the sequences and what you intend to do with them. In my opinion, people more often than not filter or trim unnecessarily and lose information.

written 6 weeks ago by novice790
Ian5.1k wrote:

A good way to address the failures (bearing in mind what the others said about their relevance) is to run the reads through a trimming tool such as Trimmomatic or cutadapt. This removes not only poor-quality reads and bases but also adapters. Re-running FastQC afterwards often shows a vast improvement.

Also, have a read of the excellent QCfail.
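As a rough sketch of that trim-then-recheck loop, the Python below only builds the two command lines (cutadapt, then FastQC on the trimmed file) so you can inspect them before running anything. The file names, the thresholds, and the adapter sequence are placeholder assumptions, and `build_trim_commands` is a hypothetical helper. Use the adapter that matches your own library kit.

```python
import shlex


def build_trim_commands(fastq_in, fastq_out, adapter,
                        min_quality=20, min_length=20,
                        qc_dir="fastqc_trimmed"):
    """Return (cutadapt_cmd, fastqc_cmd) as argument lists."""
    cutadapt_cmd = [
        "cutadapt",
        "-a", adapter,            # 3' adapter sequence to remove
        "-q", str(min_quality),   # quality-trim the 3' end of each read
        "-m", str(min_length),    # discard reads shorter than this after trimming
        "-o", fastq_out,          # trimmed output file
        fastq_in,
    ]
    # FastQC writes its report into qc_dir, which must already exist.
    fastqc_cmd = ["fastqc", "-o", qc_dir, fastq_out]
    return cutadapt_cmd, fastqc_cmd


# Placeholder inputs: hypothetical file names and an assumed adapter prefix.
cutadapt_cmd, fastqc_cmd = build_trim_commands(
    "reads.fastq.gz", "reads.trimmed.fastq.gz", adapter="AGATCGGAAGAGC"
)
print(shlex.join(cutadapt_cmd))
print(shlex.join(fastqc_cmd))

# To actually run them (requires cutadapt and FastQC to be installed):
#   import subprocess
#   subprocess.run(cutadapt_cmd, check=True)
#   subprocess.run(fastqc_cmd, check=True)
```

Trimming will mainly help the quality and adapter modules. It is not expected to change modules such as "Per base sequence content" or "Sequence Duplication Levels", which reflect the library itself.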

written 5 weeks ago by Ian5.1k