How do you use MultiQC on FastQC files taken from SRA files?
4.0 years ago
beneopp • 0

Hi. I am new to Galaxy and bioinformatics, so feel free to let me know if there is a better way to ask this. I downloaded SRA files into Galaxy. They were transferred to a paired-end data folder, with each accession (sample) inside it holding its forward and reverse FASTQ files. This structure is kept after running the files through FastQC. Now I am trying to run the FastQC results through MultiQC, but MultiQC does not recognize that the files exist. Does anyone have suggestions on how to fix this?

Tags: galaxy, MultiQC, FastQC, SRA, Fastq-dump

This question may be better asked on the Galaxy support site: https://help.galaxyproject.org/

That said, are you pointing MultiQC at the FastQC result file(s)? Here is what the MultiQC help says:

The FastQC MultiQC module looks for files called fastqc_data.txt or ending in _fastqc.zip. If the zip files are found, they are read in memory and fastqc_data.txt parsed.


Is there any workaround for this? I am trying to do the same thing.

Basically, I want to translate the following script to CWL:

$ mkdir -p ${WORKDIR}/fastqc_analysis
$ fastqc *.fastq.gz -o ${WORKDIR}/fastqc_analysis
$ cd ${WORKDIR}/fastqc_analysis
$ multiqc .

For that, I have written two tools (fastqc.cwl and multiqc.cwl) and a workflow that uses them. The problem is the FastQC step: it reads all files inside a directory, processes them using scatter, and outputs a list of files.

How can I pass a single directory containing all the outputs to the next step (multiqc)? I have tried adding the working directory as an output (outputBinding: {glob: "."}) in the fastqc tool, but because the step is scattered it returns an array of directories rather than a single directory.

Is there another way to run fastqc over a group of files that doesn't involve scatter?

fastqc.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: fastqc tool

requirements:
  - class: DockerRequirement
    dockerPull: fastqc:0.11.8

baseCommand: [fastqc, --outdir, .]

inputs:
  fastqcFile:
    type: File
    inputBinding:
      position: 3
  threads:
    type: int
    inputBinding:
      position: 4
      prefix: "-t"
      separate: true

outputs:
    resultFiles:
      type:
        type: array
        items: File
      outputBinding:
        glob: "*"

    resultDir:
      type: Directory
      outputBinding:
        glob: "."
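One possible way around the array-of-directories problem, sketched here and not tested against this exact setup: FastQC itself accepts several input files in a single call, so the tool can take a File[] and skip scatter entirely. The input name fastqcFiles and the output names zipFiles/htmlFiles are hypothetical; the Docker image is the one from the original tool.

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: fastqc tool (all FASTQ files in one call, no scatter)

requirements:
  - class: DockerRequirement
    dockerPull: fastqc:0.11.8

baseCommand: [fastqc, --outdir, .]

inputs:
  fastqcFiles:              # hypothetical name: a list instead of a single File
    type: File[]
    inputBinding:
      position: 2           # each file is added as its own command-line argument
  threads:
    type: int
    inputBinding:
      position: 1
      prefix: "-t"          # FastQC processes up to this many files in parallel

outputs:
  zipFiles:
    type: File[]
    outputBinding:
      glob: "*_fastqc.zip"  # the names MultiQC's FastQC module searches for
  htmlFiles:
    type: File[]
    outputBinding:
      glob: "*_fastqc.html"
```

With this version the workflow step would drop `scatter`, take `fastqcFiles: readDir/files`, and hand `fastqc/zipFiles` (already a flat File[]) to MultiQC, which then needs a File[] input rather than a Directory. The trade-off is that parallelism comes from `-t` inside one job instead of from separate scattered jobs.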

multiqc.cwl

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: multiqc tool

requirements:
  - class: DockerRequirement
    dockerPull: ewels/multiqc:1.7

baseCommand: multiqc

inputs:
  multiqcInputDir:
    type: Directory
    inputBinding:
      position: 1

outputs:
  report:
    type: File
    outputBinding:
      glob: "multiqc_report.html"
  metadata:
    type: Directory
    outputBinding:
      glob: "multiqc_data"
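On the MultiQC side, a Directory input is not strictly needed. A sketch of one alternative: accept the FastQC outputs as a File[] (multiqcInputFiles is a hypothetical name), stage them into the working directory with InitialWorkDirRequirement, and run `multiqc .` as in the original shell script.

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
label: multiqc tool (File[] input)

requirements:
  - class: DockerRequirement
    dockerPull: ewels/multiqc:1.7
  - class: InitialWorkDirRequirement
    # Place every input file in the working directory before the command runs.
    listing: $(inputs.multiqcInputFiles)

baseCommand: [multiqc, .]

inputs:
  multiqcInputFiles:        # hypothetical name; e.g. the *_fastqc.zip files
    type: File[]

outputs:
  report:
    type: File
    outputBinding:
      glob: "multiqc_report.html"
  metadata:
    type: Directory
    outputBinding:
      glob: "multiqc_data"
```

Because the FastQC zips keep their `*_fastqc.zip` names when staged, MultiQC's default search patterns should pick them up without extra configuration.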

workflow.cwl (not finished, just a draft):

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  - class: InlineJavascriptRequirement
  - class: ScatterFeatureRequirement
  - class: StepInputExpressionRequirement

inputs:
  fastqcSourceDir: Directory
  fastqcThreads: int

outputs:
  report:
    type: File
    outputSource: [multiqc/report]
  metadata:
    type: Directory
    outputSource: [multiqc/metadata]

steps:
  readDir:
    run: readDirFiles.cwl
    in:
      dir: fastqcSourceDir
    out: [files]
  fastqc:
    run: fastqc.cwl
    scatter: fastqcFile
    in:
      fastqcFile: readDir/files
      threads: fastqcThreads
    out: [resultFiles, resultDir]
  multiqc:
    run: multiqc.cwl
    in:
      multiqcInputDir: fastqc/resultDir
    out: [report, metadata]
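If the scatter is kept, each scattered run returns a File[], so fastqc/resultFiles reaches the next step as a File[][] (one list per input FASTQ). A sketch of a steps section that flattens it with a JavaScript valueFrom, relying on the InlineJavascriptRequirement and StepInputExpressionRequirement the draft already declares. It assumes multiqc.cwl is changed to accept a File[] input named multiqcInputFiles (a hypothetical name) instead of a Directory.

```yaml
steps:
  readDir:
    run: readDirFiles.cwl
    in:
      dir: fastqcSourceDir
    out: [files]
  fastqc:
    run: fastqc.cwl
    scatter: fastqcFile
    in:
      fastqcFile: readDir/files
      threads: fastqcThreads
    out: [resultFiles]        # the glob "." directory output is no longer used
  multiqc:
    run: multiqc.cwl
    in:
      multiqcInputFiles:      # hypothetical File[] input on multiqc.cwl
        source: fastqc/resultFiles
        # After the scatter this value is File[][]; concatenate the inner
        # lists into a single flat File[] for MultiQC.
        valueFrom: "$(self.reduce(function (acc, files) { return acc.concat(files); }, []))"
    out: [report, metadata]
```

`reduce` and `concat` are plain ES5, which is what CWL v1.0 expressions are expected to support.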

Please do not ask questions in existing threads; open a new question instead. You might also consider posting this on the new CWL support forum, which recently moved away from Biostars (see the announcement "CWL user support moving to https://cwl.discourse.group/; many thanks to Biostars for over 4 years of support!").

4.0 years ago
Phil Ewels ▴ 990

See the MultiQC documentation on how MultiQC finds input files: https://multiqc.info/docs/#module-search-patterns

In short, as @genomax says in the comment above, MultiQC uses the default FastQC zip and data filenames as search patterns, but these can be customised if required. As for how best to do this within Galaxy, I'm afraid I'm not sure.
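As a sketch of such a customisation: the `sp` key and the `fastqc/data` pattern name follow MultiQC's search-pattern config, while the file name pattern below is a hypothetical example. Whether custom patterns replace or extend the defaults may depend on the MultiQC version, so check the linked docs for yours.

```yaml
# multiqc_config.yaml
sp:
  fastqc/data:
    # Hypothetical naming scheme: FastQC text reports saved as
    # <sample>_fastqc_data.txt instead of <sample>_fastqc/fastqc_data.txt.
    fn: '*_fastqc_data.txt'
```

The config file is passed with `multiqc -c multiqc_config.yaml .`. When names are unpredictable, matching on file contents (the `contents` key, for example the `##FastQC` header line that starts each FastQC data file) is another option.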
