How do you use MultiQC on FastQC files taken from SRA files?
4.0 years ago
beneopp • 0

Hi. I am new to Galaxy and bioinformatics, so feel free to let me know if there is a better way of asking my question. MultiQC does not recognize my FastQC output files. I downloaded SRA files to Galaxy. These files were transferred to a paired-end data folder, with one entry per accession (sample) containing the forward and reverse FASTQ files. This general layout remains after running the files through FastQC. Now I am trying to run the FastQC results through MultiQC, but the data is not recognized. Does anyone have suggestions on how to fix this?

galaxy MultiQC FastQC SRA Fastq-dump • 6.1k views

This question may be best asked on the Galaxy support page: https://help.galaxyproject.org/

That said, are you pointing MultiQC to the FastQC result file(s)? Here is what the MultiQC help says:

The FastQC MultiQC module looks for files called fastqc_data.txt or ending in _fastqc.zip. If the zip files are found, they are read in memory and fastqc_data.txt parsed.
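
To check whether a results folder contains anything the FastQC module could pick up by name, a quick filename scan can help. This is only a sketch based on the two names quoted above; the function name `find_fastqc_outputs` is made up for illustration, and it checks filenames only, not file contents.

```shell
#!/usr/bin/env bash
# List files under a directory whose names match the two patterns quoted
# above: "fastqc_data.txt" or anything ending in "_fastqc.zip".
# find_fastqc_outputs is a hypothetical helper, not part of MultiQC.
find_fastqc_outputs() {
    local dir="${1:-.}"
    find "$dir" -type f \( -name 'fastqc_data.txt' -o -name '*_fastqc.zip' \) | sort
}
```

Running `find_fastqc_outputs /path/to/results` and getting no output would suggest the files have been renamed somewhere along the way, which is one possible reason for MultiQC not recognizing them.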


Is there any workaround for this? I am trying to do the same.

Basically, I want to translate the following script lines to CWL:

$ fastqc *.fastq.gz -o ${WORKDIR}/fastqc_analysis
$ cd ${WORKDIR}/fastqc_analysis
$ multiqc .


For that, I have written two tools (fastqc.cwl and multiqc.cwl) and a workflow that uses them. The problem is the fastqc step: it reads all the files in a directory, processes them using scatter, and outputs a list of files.

How can I pass the directory that contains all the outputs to the next step (multiqc)? I have tried exposing the working directory (outputBinding: {glob: "."}) in the fastqc tool, but because the step is scattered it returns an array of directories rather than a single directory.

Is there another way to run fastqc over a group of files that doesn't involve scatter?

fastqc.cwl

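#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: fastqc tool

requirements:
  - class: DockerRequirement
    dockerPull: fastqc:0.11.8

# Write results into the job's working directory.
baseCommand: [fastqc, --outdir, .]

inputs:
  fastqcFile:
    type: File
    inputBinding:
      position: 3
  # The name of this input is missing from the post; "threads" is a placeholder.
  threads:
    type: int
    inputBinding:
      position: 4
      prefix: "-t"
      separate: true

outputs:
  # Every file FastQC wrote for this input.
  resultFiles:
    type:
      type: array
      items: File
    outputBinding:
      glob: "*"

  # The working directory itself; under scatter this becomes one Directory per input.
  resultDir:
    type: Directory
    outputBinding:
      glob: "."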


multiqc.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: multiqc tool

requirements:
  - class: DockerRequirement
    dockerPull: ewels/multiqc:1.7

baseCommand: multiqc

inputs:
  # Directory that MultiQC searches for FastQC results.
  multiqcInputDir:
    type: Directory
    inputBinding:
      position: 1

outputs:
  report:
    type: File
    outputBinding:
      glob: "multiqc_report.html"
  # The name of this output is missing from the post; "reportData" is a placeholder.
  reportData:
    type: Directory
    outputBinding:
      glob: "multiqc_data"


workflow.cwl (this is not finished, just a draft)

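#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  - class: InlineJavascriptRequirement
  - class: ScatterFeatureRequirement
  - class: StepInputExpressionRequirement

inputs:
  fastqcSourceDir: Directory

outputs:
  report:
    type: File
    outputSource: multiqc/report
  # The name and outputSource of this second output are missing from the post.
  # <name>:
  #   type: Directory

steps:
  # The name and run: line of this first step are missing from the post.
  # <name>:
  #   in:
  #     dir: fastqcSourceDir
  #   out: [files]
  fastqc:
    run: fastqc.cwl
    scatter: fastqcFile
    in: {}  # input mappings are missing from the post
    out: [resultFiles, resultDir]
  multiqc:
    run: multiqc.cwl
    in:
      # Under scatter this is an array of directories, not a single Directory.
      multiqcInputDir: fastqc/resultDir
    out: [report]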
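
One way around the array-of-directories problem, sketched in shell rather than CWL, is to gather the per-sample FastQC zips into a single flat directory before calling MultiQC. The helper name `gather_fastqc` and the directory names in the usage line are made up for illustration, and files with the same name would overwrite each other. The same gathering step would still have to be expressed in the workflow itself.

```shell
#!/usr/bin/env bash
# Copy every *_fastqc.zip found under the given source directories into one
# flat destination directory, so a single path can be handed to MultiQC.
# gather_fastqc is a hypothetical helper; usage: gather_fastqc DEST SRC...
gather_fastqc() {
    local dest="$1"; shift
    mkdir -p "$dest"
    local src
    for src in "$@"; do
        # Later files silently overwrite earlier ones with the same name.
        find "$src" -type f -name '*_fastqc.zip' -exec cp {} "$dest"/ \;
    done
}
```

For example, `gather_fastqc fastqc_analysis out_sample1 out_sample2` followed by `multiqc fastqc_analysis`. MultiQC also accepts several directories on the command line (`multiqc out_sample1 out_sample2`), so copying may not be needed when the tool can be given a list of paths.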


Please do not ask questions in existing threads; open a new question instead. You may also consider posting this in the new support forum for CWL, which moved away from Biostars recently (see "CWL user support moving to https://cwl.discourse.group/; many thanks to Biostars for over 4 years of support!").

4.0 years ago
Phil Ewels ▴ 990

See the MultiQC documentation about how MultiQC finds input files here: https://multiqc.info/docs/#module-search-patterns

In short, as @genomax says in the comment above, MultiQC uses the default FastQC zip and data filenames as a search pattern, but these can be customised if required. As to how best to do this within Galaxy, I'm afraid I'm not sure.
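
As a rough illustration of customising the pattern, the sketch below writes a small MultiQC config file that overrides the FastQC zip filename pattern. It assumes the `sp:` config key and the `fastqc/zip` pattern name described in the documentation linked above; the helper name `write_multiqc_sp_config` and the example pattern `*_qc.zip` are made up for illustration.

```shell
#!/usr/bin/env bash
# Write a MultiQC config file that overrides the filename pattern used to
# find FastQC zip files. Assumes the "sp:" key and the "fastqc/zip" pattern
# name from the MultiQC docs; write_multiqc_sp_config is a hypothetical helper.
write_multiqc_sp_config() {
    local pattern="$1"                      # e.g. '*_qc.zip' (made-up example)
    local out="${2:-multiqc_config.yaml}"   # config file to write
    cat > "$out" <<EOF
sp:
  fastqc/zip:
    fn: "$pattern"
EOF
}
```

For example, `write_multiqc_sp_config '*_qc.zip' my_config.yaml` followed by `multiqc --config my_config.yaml .` would ask MultiQC to treat files ending in `_qc.zip` as FastQC archives.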