Question: Do I have to check every FastQC report to get good-quality reads?
15 months ago by
khaeuk60 wrote:

Hello everyone,

I'm pretty new to the field, so forgive me if this is a dumb question. I am working on mapping single reads to the mouse reference genome, and I have been given 260 FASTQ files. What I'd like to do is get good-quality reads out of all 260 files. Right now it's a total mess: some files fail 'per base sequence quality' and 'per base sequence content', while others fail 'per base GC content', and so on.

My original thought was to run a program that fetches what passed and what failed from summary.txt in each FastQC directory. Then I'd have a list of files that failed 'per base sequence content', a list that failed 'per base GC content', and so on. However, because the reads in each listed file are different, I would still have to open each HTML file and look at it individually to improve the quality (e.g. I might have to trim 10 bp from one file, while other files need more trimming than that).
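The summary-collecting idea described above could be sketched roughly as follows. This is only a minimal sketch, assuming each FastQC output directory has been extracted so that it contains a summary.txt with three tab-separated fields per line (status, module name, file name). The function names and the directory name in the usage note are hypothetical, not part of FastQC.

```python
from collections import defaultdict
from pathlib import Path


def parse_summary(text):
    """Parse the contents of one FastQC summary.txt.

    Assumes each line holds three tab-separated fields:
    status (PASS/WARN/FAIL), module name, FASTQ file name.
    Returns a list of (status, module, filename) tuples.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:  # skip blank or malformed lines
            rows.append(tuple(fields))
    return rows


def failures_by_module(summaries):
    """Group file names by the FastQC module they failed.

    `summaries` is an iterable of summary.txt contents (strings).
    """
    failed = defaultdict(list)
    for text in summaries:
        for status, module, filename in parse_summary(text):
            if status == "FAIL":
                failed[module].append(filename)
    return dict(failed)


def collect_summaries(root):
    """Read every summary.txt found anywhere below `root`."""
    return [p.read_text() for p in sorted(Path(root).rglob("summary.txt"))]
```

For example, `failures_by_module(collect_summaries("fastqc_out"))` would return a dictionary mapping each failed module to the files that failed it, where "fastqc_out" is a placeholder for wherever the FastQC directories live.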

Is there a way to automate this so that I can bring all the files up to good quality at once, rather than checking each fastqc.html file myself and making different adjustments to each set of reads?

modified 15 months ago by vchris_ngs4.0k • written 15 months ago by khaeuk60

+1 for MultiQC mentioned below. It can handle logs from several other applications besides FastQC (aligners, etc.) and consolidate them in one location.

written 15 months ago by genomax33k
15 months ago by
James Ashmore2.0k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.0k wrote:

Try MultiQC. It collects all the FastQC reports into one overall report, which is extremely handy if you have lots of files.


Thank you so muchhhh!!!

written 15 months ago by khaeuk60
15 months ago by
WouterDeCoster21k wrote:

Don't worry, I don't think it's a dumb question. QC is important, but just screening a bunch of HTML reports will drive you crazy. I'm aware of a tool which lets you perform QC in tabular format, but I can't remember the name. Someone else will come along and help us both, I guess.

But with regard to your solution, in which you propose to improve the quality per sample by trimming 10 bp, or more, or less, I really have to disagree. This is not the way to go. You have to treat all your samples equally and process them in the same fashion; arbitrarily deciding on a different treatment for each sample will create a huge bias and invalid results. Stick to one strategy. Perhaps optimize it on a few samples. You also shouldn't look only at the effect on the (possibly flawed) FastQC plots, but test the effect of your trimming on the mapping metrics. (Exaggeration: trimming your reads down to high-quality reads of 5 nucleotides might have a nice effect on quality, but mapping will be a disaster.)
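To illustrate what "one strategy for all samples" means in practice, here is a minimal sketch in which a single set of trimming parameters is applied to every sample. In real work a dedicated trimming tool would normally do this; the function names and the parameters `head`, `tail` and `min_len` are hypothetical and only serve the illustration.

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from FASTQ lines."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(itertools.islice(stripped, 4))
        if len(record) < 4:  # end of input (or a truncated final record)
            return
        yield record


def trim_record(record, head=0, tail=0):
    """Cut `head` bases from the 5' end and `tail` bases from the 3' end.

    Sequence and quality strings are cut identically so they stay in sync.
    """
    header, seq, plus, qual = record
    end = max(head, len(seq) - tail)
    return header, seq[head:end], plus, qual[head:end]


def trim_all(samples, head, tail, min_len):
    """Apply the SAME trimming settings to every sample.

    `samples` maps a sample name to an iterable of FASTQ lines.
    Reads shorter than `min_len` after trimming are dropped.
    """
    trimmed = {}
    for name, lines in samples.items():
        kept = []
        for record in read_fastq(lines):
            new = trim_record(record, head=head, tail=tail)
            if len(new[1]) >= min_len:
                kept.append(new)
        trimmed[name] = kept
    return trimmed
```

The point of the design is that `head`, `tail` and `min_len` are chosen once (perhaps after trying them on a few samples and checking the mapping metrics) and then passed unchanged for all 260 files, rather than being tuned per file.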

Hopefully your samples are not so heterogeneous that a common trimming strategy fails for some of them, because that would already be bad news for any downstream analysis. It would look like a serious batch effect.

15 months ago by
Milan, Italy
vchris_ngs4.0k wrote:

I would suggest taking a look at AfterQC. It can walk through all the FASTQ files in one folder together and produce all sorts of QC plots in different sub-category folders.

Rather than a report for each file, the multi-report produces a summary plot and other subsidiary plots covering all the samples, which should give you a better understanding. This should be helpful.
