FastQC on multiple paired end files
4.2 years ago
evelyn ▴ 230

Hello,

I want to run FastQC on multiple FastQ files using an array.

fastqc -o ${OUT_DIR}/${SAMPLE}.fastqc.out -f ${INPUT_DIR}/${SAMPLE}.fastq

However, I could not find out how to run fastqc on paired-end files. For example, can a single fastqc report be created for file_1.fastq and file_2.fastq? If not, what are the options? I have 10 paired-end files. I was planning to run fastqc on each of them and combine the reports into one using MultiQC. Is there a better way to do this? Thank you!

RNA-Seq • 6.7k views
4.2 years ago
Ram 43k

AFAIK fastqc does not have a PE mode as the metrics it calculates are file-specific. You can read this post for a previous discussion on this topic.

I'd go the route you're thinking of, where you run fastqc on each individual FASTQ file and then multiqc the reports. BTW, fastqc -t 8 will process 8 files in parallel, so you may wish to use the -t option to get the job done quicker.
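
The route described above (fastqc on every file, then multiqc over the reports) could be sketched as a small shell function. This is only a sketch under assumptions: run_qc and the example paths are hypothetical names, fastqc and multiqc are assumed to be on the PATH, and the inputs are assumed to end in .fastq.

```shell
#!/usr/bin/env bash
# Sketch: run FastQC on every FASTQ file in a directory, then summarise
# all reports with MultiQC. run_qc is a hypothetical helper name.
run_qc() {
    in_dir=$1            # directory holding the *.fastq files
    out_dir=$2           # single directory for all FastQC outputs
    threads=${3:-8}      # number of files FastQC processes at once (-t)

    # FastQC expects the output directory to exist already.
    mkdir -p "$out_dir"

    # Both mates of each pair are simply listed as separate input files.
    fastqc -t "$threads" -o "$out_dir" "$in_dir"/*.fastq

    # Collect every FastQC report found in out_dir into one MultiQC report.
    multiqc -o "$out_dir" "$out_dir"
}

# Example call (placeholder paths):
# run_qc /path/to/fastq /path/to/qc 8
```

Each mate of a pair still gets its own report; MultiQC only gathers them into one summary.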


Thank you for the information. I ran fastqc on a single file and it went well, but it gives an error when I use this array job:

INPUT_DIR=/path/
OUT_DIR=/path/
RUN=${SLURM_ARRAY_TASK_ID}
INPUT=$(ls -1 $INPUT_DIR/*.fastq | sed -n ${RUN}p)
SAMPLE=$(basename ${INPUT} | sed 's/.fastq//')
fastqc -o ${OUT_DIR}/${SAMPLE}.fastqc.out -f ${INPUT_DIR}/${SAMPLE}.fastq

Error is: Specified output directory '' does not exist


-o should be an existing directory, I think. Add an mkdir -p ${OUT_DIR}/${SAMPLE}.fastqc.out and you'll be all set.

I'd recommend against per-sample output directories though: fastqc writes an HTML file and a zip file per FASTQ file, and it is simplest to run multiqc when all the output zips sit in one directory. So, unless you want to manage a separate directory per sample, go with -o $OUT_DIR

Also, you can just basename $INPUT .fastq and skip the sed.
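
Putting these suggestions together, one array task could look like the sketch below. pick_input and run_task are hypothetical helper names, the directories are placeholders as in the post above, and the array indices are assumed to run from 1 to the number of .fastq files.

```shell
#!/usr/bin/env bash
# Sketch of a single array task: task N processes the Nth FASTQ file.

# Print the Nth (1-based) .fastq file in a directory, in sorted glob order.
pick_input() {
    dir=$1
    n=$2
    set -- "$dir"/*.fastq          # positional parameters now hold the files
    [ -e "$1" ] || return 1        # glob matched nothing
    [ "$n" -ge 1 ] && [ "$n" -le "$#" ] || return 1
    shift $((n - 1))
    printf '%s\n' "$1"
}

# Run FastQC for one task, writing into one shared output directory.
run_task() {
    input=$(pick_input "$1" "$3") || {
        echo "no FASTQ file for task $3" >&2
        return 1
    }
    sample=$(basename "$input" .fastq)   # basename strips the suffix, no sed needed
    mkdir -p "$2"                        # the -o directory must already exist
    echo "task $3: $sample"
    fastqc -o "$2" "$input"              # input file is a positional argument
}

# In the job script (placeholder variables):
# run_task "$INPUT_DIR" "$OUT_DIR" "$SLURM_ARRAY_TASK_ID"
```

Using the shell glob instead of parsing ls output keeps file names with unusual characters intact.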


Thank you for the help. I made the changes, but there is still an error:

Unrecognised sequence format 'file1_2.fastq', acceptable formats are bam,sam,bam_mapped,sam_mapped and fastq

These are FASTQ sequences for RNA-seq from SRA. I am not sure why it complains about the format.


Please read the manual. Your command line is wrong. Usage is as follows:

fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN

I found a very simple way: instead of an array I just used fastqc -t 8 *.fastq -o /path/ and it worked. Thank you for the help!


Glad it worked. Can you see the error in your previous command line? You were using -f (the parameter that accepts data format) for the input files. fastqc does not have named parameters for input files, just positional parameters.
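
To make the distinction concrete, here is a minimal sketch in which -f receives the format name and both mates of a pair are passed as positional arguments. qc_pair is a hypothetical helper, and the <prefix>_1.fastq / <prefix>_2.fastq naming is an assumption based on the file names mentioned in the question.

```shell
#!/usr/bin/env bash
# Sketch: -f names the data format; the input files come last, unnamed.
# qc_pair is a hypothetical helper; the naming scheme is assumed.
qc_pair() {
    out_dir=$1     # existing or to-be-created output directory
    prefix=$2      # path prefix shared by both mates, e.g. /path/file

    mkdir -p "$out_dir"

    # "-f fastq" states the format explicitly (normally it is auto-detected);
    # the two mates are two separate inputs and yield two separate reports.
    fastqc -f fastq -o "$out_dir" "${prefix}_1.fastq" "${prefix}_2.fastq"
}

# Example call (placeholder paths):
# qc_pair /path/to/qc /path/to/file
```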


Yes, I got that. Thank you for the catch!
