I am practicing Snakemake. Wanted to make a MACS2 callpeak code, and ran into problems when I wanted to add a level of complexity (from making a simple code for the basic, narrow peak case, to trying to make one that is flexible to both broad and narrow scenarios). First, I did the following (basic narrow peak case, which works fine). (see first code below)
The problem is: when I want to add an if statement that changes the "narrowPeak" file for a "broadPeak" one if a configfile parameter says so ("peakcaller" is set to either narrow or broad), I run into the error "WildcardError in line (the one of the rule all). Wildcards in input files cannot be determined from output files: 'sample' ". Here I leave the second code: (see second code below).
I defined pkFile and pkFileAll because, as in the first code, the syntax is a bit different (in rule all you have to expand and put sample=samples, whereas in the rest you can just write {sample} without expanding). I thought that with defining such variables I would make it, but clearly I'm messing up somewhere. Any thoughts? I cannot seem to spot it. Thanks!
First code:
import pandas as pd
import numpy as np
# Read in config file parameters
configfile: 'config.yaml'
sampleFile = config['samples'] # 1st column: ChIP-treatment file basename; 2nd: input files with path
data = pd.read_csv(sampleFile, header=0, names=['sample', 'inputs']).set_index('sample', drop=False)
samples = data['sample'].unique().tolist() # sample names
inplist= data['inputs'].tolist() # the ChIP-input files
# Define output folders
testDir = "/data/Test" #dir with test data
logDir = "/myexperiment/Logs"
pcDir = "/myexperiment/Peaks"
# Target rule
rule all:
input:
expand(f'{pcDir}' + '/{sample}_peaks.xls', sample=samples),
expand(f'{pcDir}' + '/{sample}_peaks.narrowPeak',sample=samples)
# Peak calling with MACS2
rule macs2:
input: trt = f'{testDir}'+'/{sample}.bam', inp = expand('{inpfile}',inpfile=inplist)
output: xls = f'{pcDir}' + '/{sample}_peaks.xls' ,pk = f'{pcDir}' + '/{sample}_peaks.narrowPeak'
params: nm = '{sample}' ,
pcout = pcDir ,
gs = "hs" ,
th = 0.05
threads: 40
conda: "macs2.yaml"
log: f'{logDir}' + '/macs2_{sample}.log'
shell: "macs2 callpeak -t {input.trt} -c {input.inp} -f BAM -g {params.gs} -q {params.th} --outdir {params.pcout} -n {params.nm} 2> {log}"
Second code:
import pandas as pd
import numpy as np
# Read in config file parameters
configfile: 'config.yaml'
sampleFile = config['samples'] # 1st column: ChIP-treatment file basename; 2nd: input files with path
peakCaller = config['peakcaller']
data = pd.read_csv(sampleFile, header=0, names=['sample', 'inputs']).set_index('sample', drop=False)
samples = data['sample'].unique().tolist() # sample names
inplist= data['inputs'].tolist() # the ChIP-input files
# Define output folders
testDir = "/data/Test" #dir with test data
logDir = "/myexperiment/Logs"
pcDir = "/myexperiment/Peaks"
# Define peak calling files
if( peakCaller == "narrow" ):
pkFile = "'" + pcDir + "/{sample}_peaks.narrowPeak'"
pkFileAll = "expand('" + pcDir + "/{sample}_peaks.narrowPeak',sample=samples)"
if( peakCaller == "broad" ):
pkFile = "'" + pcDir + "/{sample}_peaks.broadPeak'"
pkFileAll = "expand('" + pcDir + "/{sample}_peaks.broadPeak',sample=samples)"
# Target rule
rule all:
input:
expand(f'{pcDir}' + '/{sample}_peaks.xls', sample=samples),
f'{pkFileAll}'
# Peak calling with MACS2
rule macs2:
input: trt = f'{testDir}'+'/{sample}.bam', inp = expand('{inpfile}',inpfile=inplist)
output: xls = f'{pcDir}' + '/{sample}_peaks.xls' ,pk = f'{pkFile}'
params: nm = '{sample}' ,
pcout = pcDir ,
gs = "hs" ,
th = 0.05
threads: 40
conda: "macs2.yaml"
log: f'{logDir}' + '/macs2_{sample}.log'
shell: "macs2 callpeak -t {input.trt} -c {input.inp} -f BAM -g {params.gs} -q {params.th} --outdir {params.pcout} -n {params.nm} 2> {log}"