Render Rmarkdown in Nextflow
6 months ago
ATpoint

Can someone enlighten me? I cannot get my head around rendering Rmarkdown scripts via Nextflow:

#! /usr/bin/env nextflow

nextflow.enable.dsl=2

process renderRMD {

    input:
    path(rmd)

    output:
    path("script.html")
    path("out.txt")

    script:
    """

    ## this works fine and gets emitted to the work dir
    Rscript -e 'write.table(x=data.frame(A=1), file="out.txt")' 

    ## this causes an error
    Rscript -e 'rmarkdown::render("${rmd}", output_file="script.html")'

    ## and also this:
    Rscript -e 'rmarkdown::render("${rmd}", output_file="$workDir/script.html")'

    """

} 

workflow { renderRMD( Channel.fromPath("${baseDir}/script.rmd") ) }

This always exits with the error:

Caused by:
  Missing output file(s) `script.html` expected by process `renderRMD (1)`

The out.txt file is written to the work directory as expected, but the Rmarkdown HTML causes an error. The HTML is in fact created, but in the directory where script.rmd is located. For a reason I do not understand it is not written to the work dir, so Nextflow "does not know about it" and raises the error about the missing file.

Let script.rmd just be:

---
title: "nf-render"
output:
  html_document: default
---

```{r,eval=TRUE,echo=TRUE}

library(ggplot2)
ggplot(cars, aes(speed,dist)) + geom_point()

```

Any ideas?


Do you get the file when running bash .command.sh by hand in the work (cache) directory of this process? Any error message?


Ah yes, forgot to add that: the HTML is created both when running the Nextflow command (which then fails with that error) and when running .command.sh by hand, but only in baseDir, where the Rmd script sits. The work cache folder, where I would like the HTML to end up, stays empty. The .command.sh itself runs without errors. I guess this is related to how the rendering engine writes the HTML, which I do not understand.


The html is being properly created,

so why do you get Missing output file(s)?


That is the crux of the question :) The file is not created in the work cache dir, so Nextflow does not "recognize" it as created. If I add optional: true to the output: declaration the run succeeds, and removing the output declaration altogether works as well, but I still want the HTML report in the cache.

6 months ago

Add something like output_dir=getwd() to the rmarkdown::render() call?


Thanks, this now works as expected. I got a similar suggestion over at the nf-core Slack; sorry, I had missed your comment.

(...)
script:
"""
Rscript -e 'rmarkdown::render("${rmd}", output_file="script.html", output_dir = getwd())'
"""

This then emits the report to the work directory as intended.

Edit: I was told, though, that getwd() might not be the best choice because it is not platform-agnostic in how it resolves the file path; see https://github.com/nf-core/rnaseq/pull/614

I will update the answer if I find a different solution; for now this works fine :)

4 months ago
Gregor Sturm

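This is a known issue with symbolic links: by default Nextflow stages input files into the task work directory as symlinks, and rmarkdown then writes its output next to the link target instead of next to the link. See https://github.com/rstudio/rmarkdown/issues/1508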
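A minimal sketch of why the HTML lands next to the original script.rmd, assuming only a POSIX-like shell with mktemp, ln and readlink. It mimics Nextflow's default symlink staging with two hypothetical temporary directories (standing in for baseDir and the task work directory) and shows that following the staged link leads back to the directory holding the real file. It does not run R; the step from "the link resolves to baseDir" to "the HTML is written to baseDir" is the behaviour described in the linked rmarkdown issue.

```shell
#!/usr/bin/env bash
# Sketch: mimic Nextflow's default input staging (an absolute symlink in the
# task work directory) and show where the staged file really lives.
# base_dir and work_dir are hypothetical stand-ins, not real Nextflow paths.
set -euo pipefail

base_dir=$(mktemp -d)   # stands in for ${baseDir}, where script.rmd lives
work_dir=$(mktemp -d)   # stands in for the task work directory

echo 'placeholder Rmarkdown file' > "$base_dir/script.rmd"

# Default stageInMode 'symlink': the input shows up in the work directory
# as a link pointing at the original absolute path.
ln -s "$base_dir/script.rmd" "$work_dir/script.rmd"

cd "$work_dir"
staged_target=$(readlink script.rmd)      # absolute path of the real file
resolved_dir=$(dirname "$staged_target")  # directory that holds the real file

# Assumption (per rstudio/rmarkdown issue 1508): render() follows this link,
# so its default output directory is $resolved_dir, not the work directory.
echo "task runs in:          $work_dir"
echo "script.rmd resolves to: $resolved_dir"
```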

You could either use the process directive stageInMode 'copy', or manually copy the file into the work directory before rendering the notebook:

script:
"""
cp -L ${rmd} notebook.Rmd
Rscript -e "rmarkdown::render('notebook.Rmd')"
"""
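A similar sketch, again with hypothetical temporary directories standing in for baseDir and the task work directory, shows what the cp -L step in the answer above changes: the copied notebook.Rmd is an ordinary file inside the work directory rather than a link into baseDir. Since rmarkdown::render() writes to the input file's directory by default, rendering the copy should put the HTML in the work directory, where Nextflow looks for declared outputs. R itself is not invoked here.

```shell
#!/usr/bin/env bash
# Sketch of the copy workaround: dereference the staged symlink so the
# notebook becomes a regular file inside the task work directory.
# base_dir and work_dir are hypothetical stand-ins, not real Nextflow paths.
set -euo pipefail

base_dir=$(mktemp -d)
work_dir=$(mktemp -d)
echo 'placeholder Rmarkdown file' > "$base_dir/script.rmd"
ln -s "$base_dir/script.rmd" "$work_dir/script.rmd"   # what Nextflow stages

cd "$work_dir"
cp -L script.rmd notebook.Rmd   # -L: copy the contents of the link target

# script.rmd is listed as a link (->), notebook.Rmd as a regular file, so
# Rscript -e "rmarkdown::render('notebook.Rmd')" would write its HTML here.
ls -l script.rmd notebook.Rmd
```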

I use that approach too; it avoids the potential getwd() problem.
