Question

FastQC with multiple FASTQ files

0

7.1 years ago

m98 ▴ 440

I have received 384 fastq.gz files. These come from paired-end sequencing, so I have 2 files per patient, i.e. 192 patients. I am new to NGS data analysis and I wish to start using FastQC. What would be the best way to proceed?

I know FastQC can be run graphically, but presumably, with that many samples, it would be best to use the command line.
I have read in some places that merging all samples into a single file (or 2 files for paired-end data) might be the solution. Is that recommended? Or should I just use simple bash scripting like below (or something similar)?

for i in *.fastq.gz
do
    bsub < fastqc_script_with_commands.sh
done

I guess I'm just curious if there is a convention of merging fastq files or keeping them separate (1 or 2 per sample).

Thanks

ngs fastqc multiple • 33k views

ADD COMMENT • link updated 22 months ago by Ram 45k • written 7.1 years ago by m98 ▴ 440

2

Use GNU Parallel or Snakemake.
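A minimal sketch of the GNU Parallel route, assuming `parallel` and `fastqc` are on the PATH and the inputs end in `.fastq.gz`. The names `IN_DIR`, `OUT_DIR`, `JOBS` and `fastqc_parallel` are placeholders chosen for illustration, not anything from the thread.

```shell
#!/usr/bin/env bash
# Sketch: run one FastQC process per file, several at a time, via GNU Parallel.
# IN_DIR, OUT_DIR and JOBS are hypothetical settings; adjust them to your setup.
IN_DIR="${IN_DIR:-.}"
OUT_DIR="${OUT_DIR:-fastqc_out}"
JOBS="${JOBS:-4}"

fastqc_parallel() {
    # FastQC does not create the output directory itself
    mkdir -p "$OUT_DIR" || return 1
    # {} is replaced by each input file; -j caps how many jobs run at once
    parallel -j "$JOBS" fastqc -o "$OUT_DIR" {} ::: "$IN_DIR"/*.fastq.gz
}
```

After sourcing the script, something like `IN_DIR=/path/to/fastqs JOBS=8 fastqc_parallel` would start the run.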

ADD REPLY • link 7.1 years ago by cpad0112 21k

0

or Nextflow. Examples of using FastQC inside a Nextflow pipeline here, here and here

ADD REPLY • link 7.1 years ago by steve ★ 3.5k

Ram · Answer 1 · 2018-06-25

fastqc.sh:

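#!/usr/bin/env bash
# Usage: ./fastqc.sh /path/to/fastqs/
RUN_PATH=$1
# Stop if the directory cannot be entered
cd "$RUN_PATH" || exit 1
# Glob the files directly instead of parsing ls output
# (assumes the inputs end in .fastq.gz, as in the question)
for file in *.fastq.gz
do
    SAMPLE=$(basename "$file")
    # The output directory must already exist; FastQC will not create it
    fastqc -t 5 "${SAMPLE}" -o /path/to/where/you/want/outputs
done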

$ ./fastqc.sh /path/to/fastqs/

If you are running this on a cluster, just add qsub or bsub before the fastqc line (we use IBM LSF, hence bsub). That line would become:

bsub -P project -q queuename -n 1 -R "rusage[mem=2000]" fastqc -t 5 ${SAMPLE} -o /path/to/where/you/want/outputs

Change the queuename to the actual name of the queue you submit jobs to.

score 3 · Answer 2 · 2018-06-25

3

7.1 years ago

GenoMax 152k

You would want to run the QC on each file individually. When run on the command line with the -o option, FastQC writes the result files to that directory. A bash loop would work. You can look into MultiQC to aggregate all the results.
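The loop-then-aggregate approach could look roughly like the sketch below, assuming `fastqc` and `multiqc` are installed and the inputs end in `.fastq.gz`. The names `IN_DIR`, `OUT_DIR`, `FASTQC`, `MULTIQC` and `run_qc` are placeholders for illustration; setting `FASTQC=echo MULTIQC=echo` previews the commands without running anything.

```shell
#!/usr/bin/env bash
# Sketch: run FastQC on each FASTQ file separately, then summarise with MultiQC.
# IN_DIR and OUT_DIR are hypothetical paths; FASTQC and MULTIQC name the
# executables and can be overridden (for example with "echo") for a dry run.
IN_DIR="${IN_DIR:-.}"
OUT_DIR="${OUT_DIR:-fastqc_out}"
FASTQC="${FASTQC:-fastqc}"
MULTIQC="${MULTIQC:-multiqc}"

run_qc() {
    # FastQC does not create the output directory itself
    mkdir -p "$OUT_DIR" || return 1
    for f in "$IN_DIR"/*.fastq.gz; do
        # Skip the literal pattern when no file matches
        [ -e "$f" ] || continue
        "$FASTQC" -o "$OUT_DIR" "$f"
    done
    # MultiQC scans OUT_DIR for FastQC results and writes one report there
    "$MULTIQC" "$OUT_DIR" -o "$OUT_DIR"
}
```

After sourcing the script, `IN_DIR=/path/to/fastqs run_qc` would process every file and leave a single MultiQC report alongside the per-file FastQC outputs.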

ADD COMMENT • link 7.1 years ago by GenoMax 152k

2

+1 for MultiQC. I don't even bother to look at the individual output metrics anymore.

ADD REPLY • link 7.1 years ago by steve ★ 3.5k