Question: Snakemake: Force execution of rules in a specific order - ignoring wildcards
0
gravatar for camerond
10 days ago by
camerond150
camerond150 wrote:

I am running an ATAC-seq pipeline that include rules for LD score regression and Partitioned heritability and I need the execution order of these rules to be sequential.

The ldsr rule runs fine, but I'm having issue with the partitioned_heritability rule as I need it to run after all the files in the ldsr rule have been generated.

In normal circumstances, I use the rules.ldsr.output function (labelled #1 in the Snakefile) to inform snakemake about rule execution order, as described here, but this is not possible here as the {chr} wildcard associated with the lsdr output is not included in the output of the partitioned_heritability rule, so the {chr} wildcard cannot be backfilled in the partitioned_heritability rule input . This means I get the error:

Wildcards in input files cannot be determined from output files: 'chr'

So I tried to use a flag file in the output of the ldsr rule and then call this in the input of the partitioned_heritability rule , see here for info on flag files, but this threw this error:

positional argument follows keyword argument

I thought this was probs because I was using named and unnamed inputs in the input/output calls, so also tried applying variable names to the ldsr outputs, i.e. touch_file = touch("mytask.done") and calling rules.ldsr.output.touch_file in the partitioned_heritability input but got the same error.

Also tried using check_call with the flag file, see #3, but that didn't work either. Tbh, I have never used the flag_file or the check_call options before so may not have understood, or be using, them properly.

Here are the files:

Snakefile:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("LD_annotation_files/Bulk.{chr}.l2.ldscore.gz", chr=range(1,23)),
        expand("partHerit_files2/PrtHrt_Test.{GWASsumstat}", GWASsumstat = config['SUMSTATS'])

rule ldsr:
    input:
        annot = "LD_annotation_files/Bulk.{chr}.annot.gz",
        bfile_folder = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles",
        snps_folder = "ldsc/reference_files/hapmap3_snps"
    output:
        "LD_annotation_files/Bulk.{chr}.l2.ldscore.gz",
#2      touch("mytask.done") # Required to enforce rule execution order i.e. PH after lsdr?
    conda:
        "envs/environment.yml"
    params:
        bfile = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles/1000G.EUR.QC.{chr}",
        ldscores = "LD_annotation_files/ManuBulk.{chr}",
        snps = "ldsc/reference_files/w_hm3.snplist_rsIds"
    log:
        "logs/LDSC/Bulk.{chr}_ldsc.txt"
    shell:
        "export MKL_NUM_THREADS=2;" # Export arguments are  workaround as ldsr uses all available cores
        "export NUMEXPR_NUM_THREADS=2;" # Numbers must match cores parameter in cluster config
        "Reference/ldsc/ldsc.py --l2 --bfile {params.bfile} --ld-wind-cm 1 "
        "--annot {input.annot} --out {params.ldscores} --print-snps {params.snps} 2> {log}"

#3 check_call(["snakemake", "--snakefile", "mytask.done"])

rule partitioned_heritability:
    input:
        GWASsumstat = lambda wildcards: config['SUMSTATS'][wildcards.GWASsumstat],
#1        LDSR = rules.ldsr.output
#2        "mytask.done"
    output:
        "partHerit_files2/PrtHrt_Test.{GWASsumstat}"
    conda:
        "envs/environment.yml"
    params:
        weights = "ldsc/reference_files/weights_hm3_no_hla/weights.",
        baseline = "ldsc/reference_files/baselineLD_v2.2_1000G_Phase3/baselineLD.",
        frqfile = "ldsc/reference_files/1000G_Phase3_frq/1000G.EUR.QC.",
        LD_anns = "LD_annotation_files/Bulk."
    log:
        "logs/PrtHerit/Bulk.PrtHrt.{GWASsumstat}.txt"
    shell:
        "python ldsc/ldsc.py --h2 {input.GWASsumstat} --w-ld-chr {params.weights}"
        "--ref-ld-chr {params.baseline},{params.LD_anns} --overlap-annot"
        "--frqfile-chr {params.frqfile} --out {output} --print-coefficients"
        -coefficients"

Config File:

SUMSTATS:
    ADHD: GWAS_sumstats/munged_sumstats/adhd_LDSC.sumstats.gz
    SCZ: GWAS_sumstats/munged_sumstats/scz_LDSC.sumstats.gz
    MDD: GWAS_sumstats/munged_sumstats/mdd_LDSC.sumstats.gz
    ASD: GWAS_sumstats/munged_sumstats/asd_LDSC.sumstats.gz
    BIP: GWAS_sumstats/munged_sumstats/bip_LDSC.sumstats.gz

Any help/advice/solutions with this would be greatly appreciated.

ADD COMMENTlink modified 10 days ago by Eric Lim1.6k • written 10 days ago by camerond150
3
gravatar for Eric Lim
10 days ago by
Eric Lim1.6k
Stoke Therapeutics, Inc
Eric Lim1.6k wrote:

Use LDSR = expand(rules.ldsr.output, chr=range(1,23)) in your partitioned_heritability rule to resolve wildcard {chr} and to ensure the rule can only be started after ldsr processes all chromosomes.

ADD COMMENTlink modified 10 days ago • written 10 days ago by Eric Lim1.6k

@EricLim - Many Thanks.

ADD REPLYlink written 7 days ago by camerond150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1508 users visited in the last hour