Running a large number of fastq files through FastQC and other downstream analysis programmes for RNA-Seq
9.2 years ago
sangita_b ▴ 80

I have recently carried out RNA-Seq (human airway epithelial cells) and have now received the data. Does anyone know whether I can run or queue multiple fastq files when using FastQC?

To put things into context I have 20 samples that each have 24 fastq files.

Can anyone advise on how best to manage these data?

Thanks
Sangita

RNA-Seq • 9.6k views

If you have access to a cluster, you could use GNU parallel in combination with FastQC's multithreading option to gain an additional speedup.
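One way this combination might look is sketched below: GNU parallel starts one FastQC job per sample directory, and each job uses FastQC's `--threads` option for the files of that sample. The directory layout (`<indir>/<sample>/*.fastq.gz`), the function name `fastqc_per_sample` and the `FASTQC` override variable are assumptions made for illustration, and the sketch expects paths without spaces.

```shell
# Sketch only: run one FastQC job per sample directory, several samples at a time.
# Assumed (hypothetical) layout: <indir>/<sample>/*.fastq.gz
# Paths are assumed to contain no spaces.
fastqc_per_sample() {
    indir=$1      # directory holding one sub-directory per sample
    outdir=$2     # reports go to <outdir>/<sample>/
    jobs=$3       # number of samples processed at the same time
    threads=$4    # FastQC threads per sample (files handled simultaneously)

    mkdir -p "$outdir"

    # Each line fed to GNU parallel is one sample directory.
    # {} is the directory path, {/} is its basename (the sample name).
    # The *.fastq.gz glob is expanded by the shell that parallel starts.
    # FASTQC can be set to another executable, e.g. echo for a dry run.
    find "$indir" -mindepth 1 -maxdepth 1 -type d |
        parallel -j "$jobs" \
            "mkdir -p $outdir/{/} && ${FASTQC:-fastqc} --threads $threads --outdir $outdir/{/} {}/*.fastq.gz"
}
```

For the 20 samples with 24 files each described in the question, a call such as `fastqc_per_sample raw_data fastqc_out 4 6` would process four samples at once with six files each in flight, so `jobs` times `threads` should not exceed the cores available to you.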

9.2 years ago

From the documentation:

fastqc seqfile1 seqfile2 .. seqfileN

So you can run it on all of your files in a single call.


As Geek_y already mentioned, you can use multiple files as input. Note that FastQC processes as many files simultaneously as the number of threads you provide (--threads).
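The two points above (several input files in one call, one file per thread) could be combined roughly as in the following sketch. The function name `run_fastqc_all`, the `.fastq.gz` suffix and the `FASTQC` override variable are assumptions for illustration and are not part of FastQC itself.

```shell
# Sketch only: pass every FASTQ file under a directory to a single FastQC call.
# Assumes the files end in .fastq.gz; adjust the pattern to your naming scheme.
run_fastqc_all() {
    indir=$1      # directory searched recursively for FASTQ files
    outdir=$2     # directory that receives the FastQC reports
    threads=$3    # number of files FastQC processes at the same time

    # Create the output directory up front.
    mkdir -p "$outdir"

    # "-exec ... {} +" appends many file names to one command line,
    # so FastQC is started once (or a few times if the list is very long).
    # FASTQC can be set to another executable, e.g. echo for a dry run.
    find "$indir" -name '*.fastq.gz' \
        -exec "${FASTQC:-fastqc}" --threads "$threads" --outdir "$outdir" {} +
}
```

A call such as `run_fastqc_all raw_data fastqc_out 8` would then work through all files eight at a time. Running it first with `FASTQC=echo` prints the command line instead of executing it, which is a cheap way to check which files would be picked up.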


Thanks for that clarification! Saved me a heap of time.

9.2 years ago

My demo project ngsxml contains 6 fastq files. https://github.com/lindenb/ngsxml

A FastQC report can be generated for each sample in parallel using make -j N, where N is the number of jobs to run at once.

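# $(1) : target (the FastQC zip archive to produce)
# $(2) : input fastq files (concatenated into one temporary file before FastQC runs)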
define run_fastqc

$(1) : $(2) ${fastqc.exe}
    mkdir -p $$(dir $$@) && \
    cat $(2) > $$(addsuffix .tmp.gz,$$@) && \
    ${fastqc.exe}   \
        -o $$(dir $$@) \
        -j ${java.exe} \
         --format fastq  --noextract \
         $$(addsuffix .tmp.gz,$$@)  && \
    rm $$(addsuffix .tmp.gz,$$@) && \
    mv $$(addsuffix .tmp_fastqc.zip,$$@) $$@

endef

(...)
all_fastqc :  \
    $(call project_dir,Proj1)/Samples/NA12878/FASTQC/NA12878.for_fastq.zip \
    $(call project_dir,Proj1)/Samples/NA12878/FASTQC/NA12878.rev_fastq.zip \
    $(call project_dir,Proj1)/Samples/NA12891/FASTQC/NA12891.for_fastq.zip \
    $(call project_dir,Proj1)/Samples/NA12891/FASTQC/NA12891.rev_fastq.zip \
    $(call project_dir,Proj1)/Samples/NA12892/FASTQC/NA12892.for_fastq.zip \
    $(call project_dir,Proj1)/Samples/NA12892/FASTQC/NA12892.rev_fastq.zip
(...)

$(eval $(call run_fastqc,$(call project_dir,Proj1)/Samples/NA12878/FASTQC/NA12878.for_fastq.zip, test/fastq/NA12878_01_R1.fastq.gz test/fastq/NA12878_02_R1.fastq.gz))
$(eval $(call run_fastqc,$(call project_dir,Proj1)/Samples/NA12878/FASTQC/NA12878.rev_fastq.zip, test/fastq/NA12878_01_R2.fastq.gz test/fastq/NA12878_02_R2.fastq.gz))