Hello,
I want to run FastQC on multiple FastQ files using an array.
fastqc -o ${OUT_DIR}/${SAMPLE}.fastqc.out -f ${INPUT_DIR}/${SAMPLE}.fastq
However, I could not find out how to run fastqc on paired-end files. For example, can one fastqc report be created for file_1.fastq and file_2.fastq? If not, what are the options? I have 10 paired-end files. I was planning to run fastqc on each of them and combine the results into one report using MultiQC. Is there a better way to do this? Thank you!
Thank you for the information. I ran fastqc on a single file and it went well, but it gives an error when I use this array:
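As far as I know, FastQC has no paired-end mode: it analyses every file on its own, so file_1.fastq and file_2.fastq each get a separate report, and combining the reports with MultiQC is the usual route. Below is a minimal sketch of that plan, not a definitive recipe. INPUT_DIR and OUT_DIR are placeholder directories, and qc_pair and summarise are made-up helper names; the _1.fastq / _2.fastq suffixes are taken from the file names in the question.

```shell
# Hypothetical directories; adjust to your own layout.
INPUT_DIR="${INPUT_DIR:-./fastq}"
OUT_DIR="${OUT_DIR:-./fastqc_out}"

# Run FastQC on both mates of one pair. Assumes the <sample>_1.fastq /
# <sample>_2.fastq naming from the question; each mate gets its own report.
qc_pair() {
    # -o must point at an existing directory, so create it first.
    mkdir -p "$OUT_DIR"
    fastqc -o "$OUT_DIR" "${INPUT_DIR}/${1}_1.fastq" "${INPUT_DIR}/${1}_2.fastq"
}

# Summarise every FastQC result found in OUT_DIR into one MultiQC report.
summarise() {
    multiqc -o "$OUT_DIR" "$OUT_DIR"
}
```

Calling qc_pair file once per sample and summarise once at the end should give one FastQC report per mate plus a single combined MultiQC report.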
Error is:
Specified output directory '' does not exist
-o
should be an existing directory, I think. Add a
mkdir -p ${OUT_DIR}/${SAMPLE}.fastqc.out
and you'll be all set.
I'd recommend against per-sample output directories though, as fastqc outputs an HTML file and a zip file per FASTQ file, and multiqc needs all the output zips to be in the same directory. So, unless you wish to create a new dir and move/soft-link all zip files, go with
-o $OUT_DIR
Also, you can just
basename $INPUT .fastq
and skip the sed.
Thank you for the help. I made the changes, but it still shows an error:
These are FASTQ sequences for RNA-seq from SRA. I am not sure why it complains about the format.
Please read the manual. Your command line is wrong. Usage is as follows:
I found a very simple way: instead of an array I just used
fastqc -t 8 *.fastq -o /path/
and it worked. Thank you for the help!
Glad it worked. Can you see the error in your previous command line? You were using
-f
(the parameter that accepts the data format) for the input files. fastqc does not have named parameters for input files, just positional parameters.
Yes, I got that. Thank you for the catch!
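For anyone landing here later, the fixes from this thread can be sketched as a plain loop: create the output directory before calling fastqc, derive the sample name with basename, and pass the input file as a positional argument rather than after -f. This is only a sketch under assumptions. INPUT_DIR and OUT_DIR are placeholder directories, run_fastqc_all is a made-up helper name, and the loop stands in for the original array script, which is not shown in the thread.

```shell
# Hypothetical directories; adjust to your own layout.
INPUT_DIR="${INPUT_DIR:-./fastq}"
OUT_DIR="${OUT_DIR:-./fastqc_out}"

# Run FastQC on every .fastq file in INPUT_DIR, one file per call,
# writing all reports to the single shared OUT_DIR.
run_fastqc_all() {
    # -o must point at an existing directory, so create it up front.
    mkdir -p "$OUT_DIR"
    for INPUT in "$INPUT_DIR"/*.fastq; do
        # Skip the unexpanded pattern when no .fastq files are present.
        [ -e "$INPUT" ] || continue
        # Strip the directory and the .fastq suffix to get the sample name.
        SAMPLE=$(basename "$INPUT" .fastq)
        echo "FastQC: $SAMPLE"
        # Input files are positional; -f only names the format (e.g. -f fastq).
        fastqc -o "$OUT_DIR" "$INPUT"
    done
}
```

The one-liner fastqc -t 8 *.fastq -o /path/ from above does the same job in a single call; a loop like this is mainly useful when each file needs its own log line or its own job.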