Question

FastQC on multiple paired end files

4.2 years ago

evelyn ▴ 230

Hello,

I want to run FastQC on multiple FastQ files using an array.

fastqc -o ${OUT_DIR}/${SAMPLE}.fastqc.out -f ${INPUT_DIR}/${SAMPLE}.fastq

However, I could not find out how to run fastqc on paired-end files. For example, can one fastqc report be created for file_1.fastq and file_2.fastq together? If not, what are the options? I have 10 paired-end files. I was planning to run fastqc on each of them and combine the reports into one file using MultiQC. Is there a better way to do this? Thank you!

RNA-Seq • 6.7k views

4.2 years ago by evelyn ▴ 230

score 2 · Answer 1 · 2020-03-05

4.2 years ago

Ram 43k

AFAIK fastqc does not have a PE mode as the metrics it calculates are file-specific. You can read this post for a previous discussion on this topic.

I'd go the route you're thinking of, where you run fastqc on each individual FASTQ file and then multiqc the reports. BTW, fastqc -t 8 will process 8 files in parallel, so you may wish to use the -t option to get the job done quicker.
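A minimal sketch of that route, assuming all FASTQ files sit in one directory. The `./fastq` and `./fastqc_out` defaults are hypothetical placeholders, and the guards simply skip a step when a tool or the input files are missing:

```shell
#!/bin/sh
# Hypothetical locations; replace them with your real paths.
INPUT_DIR=${INPUT_DIR:-./fastq}
OUT_DIR=${OUT_DIR:-./fastqc_out}

# fastqc expects the output directory to exist already.
mkdir -p "$OUT_DIR"

if command -v fastqc >/dev/null 2>&1 && ls "$INPUT_DIR"/*.fastq >/dev/null 2>&1; then
    # One report per FASTQ file (so two per paired-end sample),
    # processing up to 8 files at a time.
    fastqc -t 8 -o "$OUT_DIR" "$INPUT_DIR"/*.fastq

    if command -v multiqc >/dev/null 2>&1; then
        # Aggregate every fastqc report found under OUT_DIR into one report.
        multiqc -o "$OUT_DIR" "$OUT_DIR"
    fi
else
    echo "fastqc or input FASTQ files not found; nothing to do" >&2
fi
```

With 10 paired-end samples this should leave 20 fastqc reports and a single MultiQC report in the output directory.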

4.2 years ago by Ram 43k

Thank you for the information. I ran fastqc on a single file and it went well, but I get an error when I use this array script:

INPUT_DIR=/path/
OUT_DIR=/path/
RUN=${SLURM_ARRAY_TASK_ID}
INPUT=$(ls -1 $INPUT_DIR/*.fastq | sed -n ${RUN}p)
SAMPLE=$(basename ${INPUT} | sed 's/.fastq//')
fastqc -o ${OUT_DIR}/${SAMPLE}.fastqc.out -f ${INPUT_DIR}/${SAMPLE}.fastq

Error is: Specified output directory '' does not exist

4.2 years ago by evelyn ▴ 230

-o should be an existing directory, I think. Add an mkdir -p ${OUT_DIR}/${SAMPLE}.fastqc.out and you'll be all set.

I'd recommend against per-sample output directories though, as fastqc outputs an HTML file and a zip file per FASTQ file, and it is simplest to point multiqc at a single directory that holds all the output zips (multiqc does search subdirectories, but one flat directory is easier to manage). So, unless you have a reason to keep the reports apart, go with -o $OUT_DIR

Also, you can just basename $INPUT .fastq and skip the sed.
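To illustrate the basename suggestion, here is a small sketch using a hypothetical file name picked to expose the difference. In the sed expression the unescaped `.` matches any character, so it can remove text from the middle of a name, whereas basename only strips the literal suffix:

```shell
#!/bin/sh
# Hypothetical file name, chosen so the two approaches disagree.
INPUT=/data/my_fastq_run.fastq

# The unescaped "." matches any character, so sed removes the
# first "<any char>fastq" it finds, here "_fastq".
with_sed=$(basename "$INPUT" | sed 's/.fastq//')

# basename strips the directory and only the literal ".fastq" suffix.
with_basename=$(basename "$INPUT" .fastq)

echo "sed:      $with_sed"       # sed:      my_run.fastq
echo "basename: $with_basename"  # basename: my_fastq_run
```

For names such as file1_2.fastq both forms give the same result, so this only matters when "fastq" also appears earlier in the file name.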

4.2 years ago by Ram 43k

Thank you for the help. I made the changes but I still get an error:

Unrecognised sequence format 'file1_2.fastq', acceptable formats are bam,sam,bam_mapped,sam_mapped and fastq

These are FASTQ sequences for RNA-seq from SRA. I am not sure why it complains about the format.

4.2 years ago by evelyn ▴ 230

Please read the manual. Your command line is wrong. Usage is as follows:

fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN

4.2 years ago by Ram 43k

I found a very simple way: instead of an array I just used fastqc -t 8 *.fastq -o /path/ and it worked. Thank you for the help!

4.2 years ago by evelyn ▴ 230

Glad it worked. Can you see the error in your previous command line? You were using -f (the parameter that accepts data format) for the input files. fastqc does not have named parameters for input files, just positional parameters.
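For anyone who still wants the array approach, here is a hedged sketch of one array task with the fixes from this thread applied: a shared output directory, `-f` given the format name, and the input file passed positionally. The default paths and the `nth_fastq` helper are hypothetical, and the fallback to task 1 is only there so the script can be tried outside SLURM:

```shell
#!/bin/sh
# Sketch of one SLURM array task; the default paths are placeholders.
INPUT_DIR=${INPUT_DIR:-./fastq}
OUT_DIR=${OUT_DIR:-./fastqc_out}
RUN=${SLURM_ARRAY_TASK_ID:-1}    # falls back to 1 outside SLURM

# Print the N-th FASTQ file (1-based) in a directory, or nothing if absent.
# Parsing ls mirrors the original script and assumes no newlines in names.
nth_fastq() {
    ls -1 "$1"/*.fastq 2>/dev/null | sed -n "${2}p"
}

INPUT=$(nth_fastq "$INPUT_DIR" "$RUN")

if [ -n "$INPUT" ] && command -v fastqc >/dev/null 2>&1; then
    mkdir -p "$OUT_DIR"          # -o must point at an existing directory
    # -f names the format; the input file is a positional argument.
    fastqc -o "$OUT_DIR" -f fastq "$INPUT"
else
    echo "nothing to run for task $RUN" >&2
fi
```

Each array task then handles exactly one FASTQ file, so the array range should match the number of files (20 for 10 paired-end samples).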