Question: FastQC with multiple FASTQ files
2.3 years ago, m93 (240) wrote:

I have received 384 fastq.gz files. These come from paired-end sequencing so I have 2 files per patient so 192 patients. I am new to NGS data analysis and I wish to start using FastQC. What would be the best way to proceed?

  • I know FastQC can be run graphically, but presumably, with that many samples, it would be best to use the command line.
  • I have read in some places that merging all samples into a single file (or 2 files with paired-end data) might be the solution. Is that recommended? Or should I just use simple bash scripting like below (or something similar)?

for i in *.fastq.gz; do bsub < fastqc_script_with_commands.sh; done

I guess I'm just curious if there is a convention of merging fastq files or keeping them separate (1 or 2 per sample).

Thanks

fastqc multiple ngs • 9.7k views
modified 2.3 years ago by drkennetz (500) • written 2.3 years ago by m93 (240)

Use GNU parallel or Snakemake.

written 2.3 years ago by cpad0112 (14k)

or Nextflow. Examples of using FastQC inside a Nextflow pipeline here, here and here

written 2.3 years ago by steve (2.6k)
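
As a rough illustration of the "run the files in parallel" idea from the comments above, here is a minimal sketch. It swaps in `xargs -P` as a stand-in so that nothing beyond standard tools is needed; the equivalent GNU parallel call is shown in a comment. The function name `run_fastqc_parallel`, the `FASTQC` override variable and the default of 4 jobs are assumptions made for this example, not anything from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run FastQC on every *.fastq.gz in a directory,
# several files at a time. FASTQC can be overridden, e.g. for a dry run.
FASTQC=${FASTQC:-fastqc}

run_fastqc_parallel() {
    local in_dir="$1" out_dir="$2" jobs="${3:-4}"
    mkdir -p "$out_dir"    # create the output directory up front
    # NUL-delimited names keep unusual file names intact.
    # -r: do not run the command at all if no files were found.
    # Rough GNU parallel equivalent:
    #   parallel -j "$jobs" fastqc -o "$out_dir" {} ::: "$in_dir"/*.fastq.gz
    find "$in_dir" -maxdepth 1 -name '*.fastq.gz' -print0 |
        xargs -0 -r -n 1 -P "$jobs" $FASTQC -o "$out_dir"
}
```

A call might look like `run_fastqc_parallel /path/to/fastqs /path/to/outputs 8`. FastQC's own `-t` option also processes several files at once when it is given more than one file, so this wrapper is only one of several ways to get the same effect.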
2.3 years ago, genomax (90k, United States) wrote:

You would want to do the QC for each file individually. When run on the command line with the -o option, FastQC writes its result files to the directory you specify. A bash loop would work. You can then look into MultiQC to aggregate all the results into a single report.

modified 2.3 years ago • written 2.3 years ago by genomax (90k)
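
The workflow described in this answer (FastQC per file with -o, then MultiQC over the results) could be sketched roughly as follows. The function name `run_qc` and the `FASTQC` / `MULTIQC` override variables are made up for this example, and it assumes both tools are on the PATH and that the files end in `.fastq.gz`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: FastQC on each file separately, then one MultiQC report.
# FASTQC and MULTIQC can be overridden, e.g. to echo the commands as a dry run.
FASTQC=${FASTQC:-fastqc}
MULTIQC=${MULTIQC:-multiqc}

run_qc() {
    local in_dir="$1" out_dir="$2"
    mkdir -p "$out_dir"              # create the output directory up front
    local f
    for f in "$in_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        $FASTQC -o "$out_dir" "$f"   # one report per FASTQ file
    done
    # Summarise every FastQC result found in out_dir into a single report.
    $MULTIQC -o "$out_dir" "$out_dir"
}
```

Something like `run_qc /path/to/fastqs /path/to/outputs` would then leave the per-file FastQC reports and the MultiQC summary in the same output directory. The R1 and R2 files of a pair are simply treated as two separate inputs.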

+1 for MultiQC. I don't even bother to look at the individual output metrics anymore.

written 2.3 years ago by steve (2.6k)
2.3 years ago, drkennetz (500) wrote:

fastqc.sh:

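#!/usr/bin/env bash
# Usage: ./fastqc.sh /path/to/fastqs/
RUN_PATH=$1
OUT_DIR=/path/to/where/you/want/outputs
mkdir -p "$OUT_DIR"              # make sure the output directory exists
cd "$RUN_PATH" || exit 1         # stop if the input directory is missing
# Use a glob rather than parsing ls, and only pick up FASTQ files
for file in *.fastq.gz
do
    SAMPLE=$(basename "$file")
    fastqc -t 5 -o "$OUT_DIR" "${SAMPLE}"
done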

$ ./fastqc.sh /path/to/fastqs/

If you are running this on a cluster, just add qsub or bsub before the fastqc line (we use IBM LSF, so bsub):

That line would become:

bsub -P project -q queuename -n 1 -R "rusage[mem=2000]" fastqc -t 5 ${SAMPLE} -o /path/to/where/you/want/outputs

Change queuename to the actual name of the queue you submit jobs to.

written 2.3 years ago by drkennetz (500)
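
Putting the cluster variant together, a one-job-per-file submission loop might look like the sketch below. It reuses only the bsub options shown in the answer above (-P, -q, -n, -R). The function name `submit_fastqc_jobs` and the `BSUB` override variable are made up for this example, and the project name, queue name and memory request all depend on the site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit one LSF job per FASTQ file.
# BSUB can be overridden, e.g. BSUB="echo bsub" to preview the commands.
BSUB=${BSUB:-bsub}

submit_fastqc_jobs() {
    local in_dir="$1" out_dir="$2" project="$3" queue="$4"
    mkdir -p "$out_dir"              # create the output directory up front
    local f
    for f in "$in_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        # Each job handles a single file, so one core is requested.
        $BSUB -P "$project" -q "$queue" -n 1 -R "rusage[mem=2000]" \
            fastqc -o "$out_dir" "$f"
    done
}
```

Running it first with `BSUB="echo bsub"` prints the commands without submitting anything, which is a cheap way to check paths and queue names before sending 384 jobs to the scheduler.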