absolute path for symbolic links in Snakefile
4 weeks ago
yifangt86 ▴ 60

Hello,

I have a problem with the relative paths of the input/output files in a Snakefile rule whose job is to create softlinks to files in the same directory.

The workdir is defined in the configuration and the directory structure is like:

/store/proj/hapmap/John/bowtie2/
                                 |
                                 |-configs/config.yaml
                                 |-rules/
                                 |       |-rule01.smk
                                 |       |-...
                                 |       |-rule04.smk
                                 |-data/seqs_trimmed/
                                                    |-CN295_R1_val_1.fq.gz       
                                                    |-CN295_R2_val_2.fq.gz       
                                                    |-......
 #Snakefile is rule04.smk:                 #/store/proj/hapmap/John/bowtie2/rules

configfile: "../configs/config.yaml"      #/store/proj/hapmap/John/bowtie2/configs
workdir: config['dir_project']            #/store/proj/hapmap/John/bowtie2
DIR_OUT = "data/seqs_trimmed/"

rule softlink_PE:
    input:
        R1= DIR_OUT + '{sample}_R1_val_1.fq.gz',
        R2= DIR_OUT + '{sample}_R2_val_2.fq.gz'
    output:
        R1= DIR_OUT + '{sample}_trimmed_PE_R1.fq.gz',
        R2= DIR_OUT + '{sample}_trimmed_PE_R2.fq.gz'
    shell:
        """ 
        ln -s {input.R1} {output.R1}
        ln -s {input.R2} {output.R2}
        """

The softlinks were created successfully, but they point to the wrong source files because of the relative path:

CN295_trimmed_PE_R1.fq.gz -> data/seqs_trimmed/CN295_R1_val_1.fq.gz
CN295_trimmed_PE_R2.fq.gz -> data/seqs_trimmed/CN295_R2_val_2.fq.gz

They should instead point to the files located in the same folder, i.e.

CN295_trimmed_PE_R1.fq.gz -> CN295_R1_val_1.fq.gz
CN295_trimmed_PE_R2.fq.gz -> CN295_R2_val_2.fq.gz

The Snakemake manual FAQ reads:

Relative paths in Snakemake are interpreted depending on their context.

Input, output, log, and benchmark files are considered to be relative to the working directory (either the directory in which you have invoked Snakemake or whatever was specified for --directory or the workdir: directive).

But it is still unclear to me how to resolve the issue. I am confused about relative paths in Snakemake, although I am aware that creating softlinks with relative paths in a shell command can be tricky. Any idea for correcting this Snakefile is appreciated.

23 days ago
Jesse ▴ 770

It's nothing to do with Snakemake, just the ordinary confusion of making relative symlinks when your working directory is somewhere else. A relative symlink target is stored as-is and resolved relative to the directory containing the link, not relative to the directory you ran ln from. If you know in your situation that the source will always be in the same directory as the link, you could just use ln -s $(basename {input.R1}) {output.R1} (and likewise for R2) so the symlinks point to targets in the same directory as the symlinks.
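
To see the difference outside Snakemake, here is a minimal sketch in a throwaway directory. The scratch layout and the link name CN295_broken_R1.fq.gz are made up for illustration; only the file names from the question are reused.

```shell
# Hypothetical scratch layout mirroring the question; all paths are illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/data/seqs_trimmed"
touch "$demo/data/seqs_trimmed/CN295_R1_val_1.fq.gz"
cd "$demo"

src=data/seqs_trimmed/CN295_R1_val_1.fq.gz
bad=data/seqs_trimmed/CN295_broken_R1.fq.gz
good=data/seqs_trimmed/CN295_trimmed_PE_R1.fq.gz

# The stored target is the literal string in $src, which is then looked up
# starting from the link's own directory, so this link dangles.
ln -s "$src" "$bad"

# The stored target is only the file name, which is looked up next to the
# link, so this one resolves.
ln -s "$(basename "$src")" "$good"

readlink "$bad"     # data/seqs_trimmed/CN295_R1_val_1.fq.gz
readlink "$good"    # CN295_R1_val_1.fq.gz
```

The first link is what the rule in the question produces; the second is what the basename variant produces.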


Further to this answer: if you are using GNU coreutils (i.e. any modern Linux), ln has a "-r" flag that fixes this problem by rewriting the target relative to the link's location. I typically use "ln -snrf" when using "ln" in scripts (including snakefiles).

If you are on Mac you have the more basic BSD version of "ln", but you can install the GNU version via Homebrew. Or probably (I've not checked) you can install it via conda, which would make life simple if you are already installing Snakemake as a conda package.
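
A rough sketch of what "ln -snrf" does, assuming GNU ln is the ln on your PATH (the scratch directory is hypothetical, the file names are taken from the question):

```shell
# Assumes GNU coreutils ln; the -r flag is not available in BSD/macOS ln.
demo=$(mktemp -d)
mkdir -p "$demo/data/seqs_trimmed"
touch "$demo/data/seqs_trimmed/CN295_R2_val_2.fq.gz"
cd "$demo"

# -s  make a symbolic link
# -n  treat an existing symlink-to-directory at the destination as a file
# -r  rewrite the target so it is relative to the link's location
# -f  replace the destination if it already exists
ln -snrf data/seqs_trimmed/CN295_R2_val_2.fq.gz data/seqs_trimmed/CN295_trimmed_PE_R2.fq.gz

# Running it a second time does not fail, because -f replaces the old link.
ln -snrf data/seqs_trimmed/CN295_R2_val_2.fq.gz data/seqs_trimmed/CN295_trimmed_PE_R2.fq.gz

readlink data/seqs_trimmed/CN295_trimmed_PE_R2.fq.gz    # CN295_R2_val_2.fq.gz
```

Both paths are given relative to the working directory, exactly as Snakemake passes them, and ln works out the relative target itself.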


That's really handy, and much more of a general solution than stripping off the directory names! So yifangt86, if you're on Linux you can probably just ln -sr {input.R1} {output.R1} and call it a day.
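
As a small illustration of why this is more general than stripping directory names, here is a sketch where the link lives in a different directory from its source. The results/ directory and the link name are hypothetical, and GNU ln is assumed again.

```shell
# Assumes GNU coreutils ln (for -r); results/ is a made-up output directory.
demo=$(mktemp -d)
mkdir -p "$demo/data/seqs_trimmed" "$demo/results"
touch "$demo/data/seqs_trimmed/CN295_R1_val_1.fq.gz"
cd "$demo"

# The basename trick would store a bare file name here and leave the link
# dangling; -r computes the path from results/ back to the source instead.
ln -sr data/seqs_trimmed/CN295_R1_val_1.fq.gz results/CN295_R1.fq.gz

readlink results/CN295_R1.fq.gz    # ../data/seqs_trimmed/CN295_R1_val_1.fq.gz
```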
