Running MultiQC on more than one Log.final.out file from STAR
5.8 years ago
cristian ▴ 320

Hello,

I have a folder called ~/pooled that contains 12 folders with 12 different names, e.g. bristol, altadena, hermanville, taunton. Each of these 12 folders contains a Log.final.out file, output by STAR, with the alignment statistics. I would like to run MultiQC on all 12 log files, either like this (1):

multiqc output/alignment/reference/star/pooled/*/Log.final.out

or like this (2):

multiqc --file-list output/alignment/reference/star/pooled/pathsToLogFinalOut.txt

The file pathsToLogFinalOut.txt contains the path to each of the log files. However, in both cases only one of the log files is analysed: command (1) only outputs the results for hermanville, and command (2) only outputs the results for taunton.

It's odd because the HTML file shows that the report was generated using data in:

Report generated on 2016-10-14, 14:10 based on data in:

~/pooled/bristol/Log.final.out
~/pooled/taunton/Log.final.out
~/pooled/hermanville/Log.final.out


Thanks.

multiqc star • 4.5k views

Change to the folder that stores your Log.final.out files, and simply run ls | grep "Log.final.out" | multiqc .


Did you test that? It still searches . and therefore the entire $(pwd), no?

5.8 years ago
cristian ▴ 320

The problem has been resolved by my colleague Konrad. The following command works, but the files must all be in the same directory (TEST) and their names must all end with Log.final.out, e.g. OneLog.final.out and TwoLog.final.out:

multiqc --outdir output/alignment/reference/star/pooled/ -n multiqc output/alignment/reference/star/pooled/TEST/ -f

Thanks for trying to help me.
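A minimal sketch of how the per-sample logs could be gathered into one directory under distinct names, as the workaround above requires. The function name collect_star_logs and the `<folder>.Log.final.out` naming are assumptions for illustration, not part of MultiQC or STAR:

```shell
# Hypothetical helper: copy every <src>/<sample>/Log.final.out to
# <dest>/<sample>.Log.final.out so that each file name is unique.
collect_star_logs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for log in "$src"/*/Log.final.out; do
        # Skip the unexpanded pattern when no log file matches.
        [ -e "$log" ] || continue
        # Use the name of the containing folder as the sample prefix.
        sample=$(basename "$(dirname "$log")")
        cp "$log" "$dest/$sample.Log.final.out"
    done
}

# Example call (paths taken from the question):
# collect_star_logs output/alignment/reference/star/pooled output/alignment/reference/star/pooled/TEST
```

After such a call, the multiqc command above can be pointed at the destination directory.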

5.7 years ago
Phil Ewels ▴ 920

Hi Cristian,

Author of MultiQC here. The problem here is that STAR has an option to prefix log files with a sample name, giving sample.Log.final.out. By default, MultiQC takes this prefix as the sample name. When I wrote the module I didn't realise that if this prefix isn't supplied, you just get Log.final.out. This means that these samples then have an empty sample name. When MultiQC finds multiple samples with the same name, they are overwritten - as all of your samples have an empty sample name, you'll only get a single result in the report. If you run in verbose mode (-v) you'll see warnings about this happening.
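The naming rule described above can be illustrated with a small shell sketch. This only mimics the described behaviour (strip the fixed suffix, keep whatever precedes it); it is not MultiQC's actual code, and the function name sample_name is made up for the example:

```shell
# Illustration only: derive a sample name the way the rule above describes.
sample_name() {
    # Drop the directory part of the path.
    base=${1##*/}
    # Drop the fixed STAR suffix, leaving only the optional prefix.
    name=${base%Log.final.out}
    # Drop a trailing dot left over from "sample.Log.final.out".
    printf '%s\n' "${name%.}"
}

sample_name pooled/bristol/bristol.Log.final.out   # prints: bristol
sample_name pooled/bristol/Log.final.out           # prints an empty line
sample_name pooled/taunton/Log.final.out           # prints an empty line
```

Both bare Log.final.out files map to the same empty name, which is why only one of them survives in the report.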

Someone else has also reported this issue (see here) - in the next release of MultiQC (next couple of weeks), the directory name will be taken if the filename is just Log.final.out, so your problem will be fixed.

In the meantime, this is a very easy problem to fix: use the -d option to prefix sample names with their directory names, so that the samples are no longer overwritten. You can find more about this in the documentation here: troubleshooting.
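As a sketch, the workaround could look like the following, using the directory from the question. The variable name star_dir and the guard around the call are assumptions added for the example, so that the snippet does nothing where MultiQC or the directory is missing:

```shell
# Directory that holds one sub-folder per sample (path taken from the question).
star_dir=output/alignment/reference/star/pooled

# -d prefixes sample names with their directory names;
# -v (verbose) prints warnings if samples are still being overwritten.
if command -v multiqc >/dev/null 2>&1 && [ -d "$star_dir" ]; then
    multiqc -d -v "$star_dir"
fi
```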

Also, note that MultiQC has been designed to run on directories rather than files. You can give it a list of files, but it's usually easier to just give it a directory to search: multiqc output/

I hope this helps! Sorry that you've been having problems.

Phil

5.8 years ago
GZ1995 ▴ 410
find output/alignment/reference/star/pooled/ -name "Log.final.out" | xargs multiqc


Thanks eldronzhou. Unfortunately, I have the same problem. The multiqc_sources.txt file contains this:

Module	Section	Sample Name	Source
STAR	all_sections		/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/TEST/Log.final.out

Yet the HTML file lists all of these:

/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/altadena/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/amares/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/auckland/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/bristol/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/dundonald/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/hermanville/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/lakeforestpark/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/montevideo/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/paloalto/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/roxel/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/saltlakecity/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/taunton/Log.final.out
/Users/Documents/rotations/X/project/output/alignment/reference/star/pooled/TEST/Log.final.out


Do you mean that the output in the HTML report is correct (all 13 log files analysed) but the sources file is not? And what does the MultiQC run log say?


The output in the HTML report is not correct: it lists all the files analysed, but it only shows a single graph. This is the output of the command:

[INFO ] multiqc : This is MultiQC v0.8
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//altadena/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//amares/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//auckland/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//bristol/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//dundonald/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//hermanville/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//lakeforestpark/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//montevideo/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//paloalto/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//roxel/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//saltlakecity/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//taunton/Log.final.out'
[INFO ] multiqc : Searching 'output/alignment/reference/star/pooled//TEST/Log.final.out'
[INFO ] star : Found 1 reports
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete


It seems that MultiQC only recognizes TEST/Log.final.out as valid STAR output. Can you check whether there is any discrepancy between TEST/Log.final.out and the others, to make sure the log files that failed aren't corrupted? This script shows how MultiQC parses STAR output.


These logs are showing the search paths that you've given, not the files that it has found / parsed. The problem is described in my other reply below - MultiQC is finding your files with no problem, but overwriting them each time as they have the same filename.
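A quick way to check for this situation before running MultiQC is to list the log file names that occur more than once under the search directory. This is a sketch using standard find, sed, sort and uniq; the function name duplicate_log_names is hypothetical:

```shell
# Hypothetical helper: print every STAR log file name that appears
# more than once below the given directory.
duplicate_log_names() {
    find "$1" -type f -name '*Log.final.out' |
        sed 's|.*/||' |   # keep only the file name
        sort |
        uniq -d           # print names that are repeated
}

# Example call (path taken from the question):
# duplicate_log_names output/alignment/reference/star/pooled
```

If this prints Log.final.out, several samples share the bare file name and, with the behaviour described above, will end up under the same sample name.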