Snakemake: Force execution of rules in a specific order - ignoring wildcards
1
0
Entering edit mode
2.5 years ago
camerond ▴ 170

I am running an ATAC-seq pipeline that include rules for LD score regression and Partitioned heritability and I need the execution order of these rules to be sequential.

The ldsr rule runs fine, but I'm having issue with the partitioned_heritability rule as I need it to run after all the files in the ldsr rule have been generated.

In normal circumstances, I use the rules.ldsr.output function (labelled #1 in the Snakefile) to inform snakemake about rule execution order, as described here, but this is not possible here as the {chr} wildcard associated with the lsdr output is not included in the output of the partitioned_heritability rule, so the {chr} wildcard cannot be backfilled in the partitioned_heritability rule input . This means I get the error:

Wildcards in input files cannot be determined from output files: 'chr'


So I tried to use a flag file in the output of the ldsr rule and then call this in the input of the partitioned_heritability rule , see here for info on flag files, but this threw this error:

positional argument follows keyword argument


I thought this was probs because I was using named and unnamed inputs in the input/output calls, so also tried applying variable names to the ldsr outputs, i.e. touch_file = touch("mytask.done") and calling rules.ldsr.output.touch_file in the partitioned_heritability input but got the same error.

Also tried using check_call with the flag file, see #3, but that didn't work either. Tbh, I have never used the flag_file or the check_call options before so may not have understood, or be using, them properly.

Here are the files:

Snakefile:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
input:
expand("LD_annotation_files/Bulk.{chr}.l2.ldscore.gz", chr=range(1,23)),
expand("partHerit_files2/PrtHrt_Test.{GWASsumstat}", GWASsumstat = config['SUMSTATS'])

rule ldsr:
input:
annot = "LD_annotation_files/Bulk.{chr}.annot.gz",
snps_folder = "ldsc/reference_files/hapmap3_snps"
output:
"LD_annotation_files/Bulk.{chr}.l2.ldscore.gz",
#2      touch("mytask.done") # Required to enforce rule execution order i.e. PH after lsdr?
conda:
"envs/environment.yml"
params:
ldscores = "LD_annotation_files/ManuBulk.{chr}",
snps = "ldsc/reference_files/w_hm3.snplist_rsIds"
log:
"logs/LDSC/Bulk.{chr}_ldsc.txt"
shell:
"export MKL_NUM_THREADS=2;" # Export arguments are  workaround as ldsr uses all available cores
"export NUMEXPR_NUM_THREADS=2;" # Numbers must match cores parameter in cluster config
"Reference/ldsc/ldsc.py --l2 --bfile {params.bfile} --ld-wind-cm 1 "
"--annot {input.annot} --out {params.ldscores} --print-snps {params.snps} 2> {log}"

rule partitioned_heritability:
input:
GWASsumstat = lambda wildcards: config['SUMSTATS'][wildcards.GWASsumstat],
#1        LDSR = rules.ldsr.output
output:
"partHerit_files2/PrtHrt_Test.{GWASsumstat}"
conda:
"envs/environment.yml"
params:
weights = "ldsc/reference_files/weights_hm3_no_hla/weights.",
baseline = "ldsc/reference_files/baselineLD_v2.2_1000G_Phase3/baselineLD.",
frqfile = "ldsc/reference_files/1000G_Phase3_frq/1000G.EUR.QC.",
LD_anns = "LD_annotation_files/Bulk."
log:
"logs/PrtHerit/Bulk.PrtHrt.{GWASsumstat}.txt"
shell:
"python ldsc/ldsc.py --h2 {input.GWASsumstat} --w-ld-chr {params.weights}"
"--ref-ld-chr {params.baseline},{params.LD_anns} --overlap-annot"
"--frqfile-chr {params.frqfile} --out {output} --print-coefficients"
-coefficients"


Config File:

SUMSTATS:
SCZ: GWAS_sumstats/munged_sumstats/scz_LDSC.sumstats.gz
MDD: GWAS_sumstats/munged_sumstats/mdd_LDSC.sumstats.gz
ASD: GWAS_sumstats/munged_sumstats/asd_LDSC.sumstats.gz
BIP: GWAS_sumstats/munged_sumstats/bip_LDSC.sumstats.gz


Any help/advice/solutions with this would be greatly appreciated.

snakemake python rule execution • 3.1k views
3
Entering edit mode
2.5 years ago
Eric Lim ★ 1.9k

Use LDSR = expand(rules.ldsr.output, chr=range(1,23)) in your partitioned_heritability rule to resolve wildcard {chr} and to ensure the rule can only be started after ldsr processes all chromosomes.

0
Entering edit mode

@EricLim - Many Thanks.