Question: Do I have to check every FastQC report to ensure good quality?
khaeuk wrote (4.3 years ago):

Hello everyone,

I'm pretty new to the field, so forgive me if this is a dumb question. I am mapping single-end reads to the mouse reference genome and have been given 260 FASTQ files. I'd like all 260 files to contain good-quality reads. Right now it's a total mess: some files have bad 'per base sequence quality' and 'per base sequence content', others have bad 'per base GC content', and so on.

My original thought was to run a program that fetches what passed and what failed from the summary.txt in each FastQC output directory. That would give me a list of files that failed 'per base sequence content', failed 'per base GC content', and so on. However, because each file is different, I would still have to open each HTML report and decide individually how to improve the quality (e.g. one file might need 10 bp trimmed, while another needs more than that).
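(The summary-gathering part of this idea is simple to script. A minimal sketch, assuming FastQC was run with `--extract` so each `*_fastqc/` output directory contains a tab-separated `summary.txt` with PASS/WARN/FAIL lines:)

```shell
# List every failed FastQC module across all reports.
# summary.txt lines look like: "FAIL<TAB>Per base sequence quality<TAB>sample1.fastq"
awk -F'\t' '$1 == "FAIL" { print FILENAME ": " $2 }' */summary.txt | sort
```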

Is there a way to program this so that I can bring all the files up to good quality at once, rather than checking each fastqc.html myself and making different adjustments to each file?


genomax commented (4.3 years ago):

+1 for MultiQC, mentioned below. It can handle logs from several other applications besides FastQC (aligners etc.) and consolidate them in one location.
James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine) wrote (4.3 years ago):

Try MultiQC; it collects all the FastQC reports into one overall report, which is extremely handy if you have lots of files.
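A typical invocation looks like this (a sketch; the directory names are placeholders I've chosen, and MultiQC needs to be installed, e.g. via pip):

```shell
# Scan a directory of FastQC outputs and write one consolidated report.
# fastqc_out/ and multiqc_report/ are assumed names, not fixed by the tool.
multiqc fastqc_out/ -o multiqc_report/
```

MultiQC searches the given directory for recognized log files, so pointing it at the folder holding all 260 FastQC outputs is enough.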

ivivek_ngs (Seattle, WA, USA) wrote (4.3 years ago):

I would suggest taking a look at AfterQC. It can walk through all the FASTQ files in one folder together and produce all sorts of QC plots, organized into sub-category folders.

Rather than a separate report per file, it produces a multi-sample report with summary plots covering all the samples, which gives a much better overview. This should be helpful.
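Usage is meant to be minimal (a sketch based on AfterQC's documented default behavior; the path is a placeholder): run the script inside the folder holding the FASTQ files and it picks them up automatically.

```shell
# AfterQC is a Python script; run it in the directory containing the FASTQ files.
cd /path/to/fastq_folder    # placeholder path
python after.py
# By default it sorts output into subfolders (good/, bad/, and QC/ in the
# versions I have seen) with filtered reads and the QC figures.
```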

WouterDeCoster wrote (4.3 years ago):

Don't worry, I don't think it's a dumb question. QC is important, but screening a bunch of HTML reports one by one will drive you crazy. I'm aware of a tool that performs QC in tabular format, but I can't remember its name. Someone else will come along and help us both, I guess.

But with regard to your proposed solution, trimming a different number of bases per sample, I really have to disagree: that is not the way to go. Treat all your samples equally and process them in the same fashion; making arbitrary per-sample judgments will create a huge bias and invalid results. Stick to one strategy, perhaps optimized on a few samples. Also, don't look only at the effect on the (possibly flawed) FastQC plots; test the effect of your trimming on the mapping metrics. (Exaggeration: trimming your reads down to high-quality 5-nucleotide fragments might look great in the quality plots, but mapping will be a disaster.)
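A single uniform strategy could look like the following (a sketch; cutadapt and the specific thresholds are my own illustrative choices, not something from this thread):

```shell
# Apply the SAME trimming parameters to every file, then compare mapping rates.
mkdir -p trimmed
for fq in raw/*.fastq.gz; do          # raw/ is a placeholder input directory
    # -q 20: trim low-quality bases from the 3' end (Phred cutoff 20)
    # -m 30: discard reads shorter than 30 bp after trimming
    cutadapt -q 20 -m 30 -o trimmed/"$(basename "$fq")" "$fq"
done
# After mapping the trimmed reads, compare e.g. `samtools flagstat` output
# against the untrimmed run to judge the effect on mapping metrics.
```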

Hopefully your samples are not so heterogeneous that a common trimming strategy fails for some of them, because that would already be bad news for any downstream analysis; it would look like a serious batch effect.

Powered by Biostar version 2.3.0