Question: Do I have to check every FastQC report to get good-quality reads?
15 months ago by
Hello everyone,

I'm pretty new to the field, so forgive me if this is a dumb question. I am mapping single reads to the mouse reference genome, and I have been given 260 FASTQ files. What I'd like is for all 260 files to contain good-quality reads. Right now it's a total mess: some files fail 'per base sequence quality' and 'per base sequence content', others fail 'per base GC content', and so on.

My original thought was to run a program that reads summary.txt in each FastQC output directory and records which modules passed and which failed. That would give me a list of the files that failed 'per base sequence content', 'per base GC content', and so on. However, because the reads in each file are different, I would still have to open each HTML report and inspect it individually to decide how to improve the quality (e.g. I might have to trim 10 bp from one file and more than that from another).
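A minimal sketch of that idea, assuming each unzipped FastQC report sits in a directory ending in `_fastqc` and that summary.txt holds tab-separated lines of status, module name and file name. The function names `parse_summary` and `collect_failures` are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def parse_summary(text):
    """Parse the contents of one summary.txt into {module name: status}."""
    statuses = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            status, module = parts[0].strip(), parts[1].strip()
            statuses[module] = status
    return statuses


def collect_failures(root):
    """Map each module name to the report directories that FAILed it."""
    failed = defaultdict(list)
    # Assumed layout: <root>/<sample>_fastqc/summary.txt
    for summary in sorted(Path(root).glob("*_fastqc/summary.txt")):
        for module, status in parse_summary(summary.read_text()).items():
            if status == "FAIL":
                failed[module].append(summary.parent.name)
    return dict(failed)


if __name__ == "__main__":
    # Print, per module, how many reports failed and which ones.
    for module, samples in collect_failures(".").items():
        print(f"{module}: {len(samples)} failed")
        for name in samples:
            print(f"  {name}")
```

This only tells me which files failed which module; it does not tell me how to fix them.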

Is there a way to automate this so that I can bring all the files up to good quality at once, rather than checking each fastqc.html file myself and making different adjustments to each set of reads?

+1 for MultiQC mentioned below. Besides FastQC, it can handle logs from several other applications (aligners, etc.) and consolidate them in one location.

15 months ago by
James Ashmore (2.0k)
UK/Edinburgh/MRC Centre for Regenerative Medicine
Try MultiQC. It collects all the FastQC reports into one overall report, which is extremely handy if you have lots of files.
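As a rough sketch, MultiQC can be launched from a script as well as from the shell. The directory names `fastqc_results` and `multiqc_out` and the helper names are placeholders, and this assumes the `multiqc` executable is installed and on your PATH.

```python
import shutil
import subprocess


def multiqc_command(fastqc_dir, outdir="multiqc_out"):
    """Build the command line: scan fastqc_dir, write the report to outdir."""
    return ["multiqc", str(fastqc_dir), "-o", str(outdir)]


def run_multiqc(fastqc_dir, outdir="multiqc_out"):
    """Run MultiQC, failing early if the executable is not available."""
    if shutil.which("multiqc") is None:
        raise RuntimeError("multiqc not found on PATH")
    subprocess.run(multiqc_command(fastqc_dir, outdir), check=True)
```

Calling `run_multiqc("fastqc_results")` is then equivalent to typing `multiqc fastqc_results -o multiqc_out` in a terminal.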

Thank you so muchhhh!!!

15 months ago by
Don't worry, I don't think it's a dumb question. QC is important, but just screening a bunch of HTML files will drive you crazy. I'm aware of a tool that lets you perform QC in tabular format, but I can't remember the name. Someone else will come along and help us both, I guess.

But with regard to your solution, in which you propose to improve the quality per sample by trimming 10 bp, or more, or less, I really have to disagree. This is not the way to go. You have to treat all your samples equally and process them in the same way; arbitrarily deciding how to treat each sample will create a huge bias and invalid results. Stick to one strategy. Perhaps optimize it on a few samples. You also shouldn't look only at the effect on the (possibly flawed) FastQC plots, but test the effect of your trimming on the mapping metrics. (Exaggeration: trimming your reads down to high-quality reads of 5 nucleotides might have a nice effect on quality, but mapping will be a disaster.)
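To make "one strategy for all samples" concrete, here is a small sketch that builds the same trimming command for every file. cutadapt is used only as an example trimmer (nobody in this thread has named one), the thresholds are placeholders and not recommendations, and the helper names are invented.

```python
from pathlib import Path

# Placeholder settings: pick them once (e.g. after testing on a few
# samples) and then apply them unchanged to every file.
TRIM_PARAMS = {"quality_cutoff": 20, "min_length": 30}


def trim_command(fastq, outdir, params=TRIM_PARAMS):
    """Build one cutadapt command using the shared parameters."""
    fastq = Path(fastq)
    out = Path(outdir) / f"trimmed_{fastq.name}"
    return [
        "cutadapt",
        "-q", str(params["quality_cutoff"]),  # quality-trim read ends
        "-m", str(params["min_length"]),      # drop reads shorter than this
        "-o", str(out),
        str(fastq),
    ]


def trim_commands(fastqs, outdir):
    """Same parameters for every sample; only input and output differ."""
    return [trim_command(f, outdir) for f in sorted(fastqs)]
```

Each command can be passed to `subprocess.run`. Whatever settings you end up with, judge them by the mapping metrics on a few samples, not only by the FastQC plots.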

Hopefully your samples are not so heterogeneous that a common trimming strategy fails for some of them, because that would already be bad news for any downstream analysis. It would look like a serious batch effect.

15 months ago by
Milan, Italy
I would suggest taking a look at AfterQC. It can walk through all the FASTQ files in one folder together and produce all sorts of QC plots in separate sub-category folders.

Rather than one report per file, a multi-sample report produces a combined plot, along with other subsidiary plots, covering all the samples, which makes them easier to understand. This should be helpful.

