Question: Generalizing snakemake MACS2 code - WildcardError Wildcards in input files cannot be determined from output files: 'sample'
0
gravatar for msimmer92
7 weeks ago by
msimmer92250
Uruguay
msimmer92250 wrote:

I am practicing Snakemake. Wanted to make a MACS2 callpeak code, and ran into problems when I wanted to add a level of complexity (from making a simple code for the basic, narrow peak case, to trying to make one that is flexible to both broad and narrow scenarios). First, I did the following (basic narrow peak case, which works fine). (see first code below)

The problem is: when I want to add an if statement that changes the "narrowPeak" file for a "broadPeak" one if a configfile parameter says so ("peakcaller" is set to either narrow or broad), I run into the error "WildcardError in line (the one of the rule all). Wildcards in input files cannot be determined from output files: 'sample' ". Here I leave the second code: (see second code below).

I defined pkFile and pkFileAll because, as in the first code, the syntax is a bit different (in rule all you have to expand and put sample=samples, whereas in the rest you can just write {sample} without expanding). I thought that with defining such variables I would make it, but clearly I'm messing up somewhere. Any thoughts? I cannot seem to spot it. Thanks!


First code:

import pandas as pd
import numpy as np

# Read in config file parameters
configfile: 'config.yaml'
sampleFile = config['samples'] # 1st column: ChIP-treatment file basename; 2nd: input files with path

data = pd.read_csv(sampleFile, header=0, names=['sample', 'inputs']).set_index('sample', drop=False)
samples = data['sample'].unique().tolist() # sample names
inplist= data['inputs'].tolist() # the ChIP-input files

# Define output folders
testDir = "/data/Test" #dir with test data
logDir = "/myexperiment/Logs"
pcDir = "/myexperiment/Peaks"

# Target rule
rule all:
    input:
        expand(f'{pcDir}' + '/{sample}_peaks.xls', sample=samples),
        expand(f'{pcDir}' + '/{sample}_peaks.narrowPeak',sample=samples) 

# Peak calling with MACS2
rule macs2:
    input: trt = f'{testDir}'+'/{sample}.bam', inp = expand('{inpfile}',inpfile=inplist)
    output: xls = f'{pcDir}' + '/{sample}_peaks.xls' ,pk = f'{pcDir}' + '/{sample}_peaks.narrowPeak'
    params: nm = '{sample}' ,
        pcout = pcDir , 
        gs = "hs" ,
        th = 0.05
    threads: 40
    conda: "macs2.yaml"
    log: f'{logDir}' + '/macs2_{sample}.log'
    shell: "macs2 callpeak -t {input.trt} -c {input.inp} -f BAM -g {params.gs} -q {params.th} --outdir {params.pcout} -n {params.nm} 2> {log}"

Second code:

import pandas as pd
import numpy as np

# Read in config file parameters
configfile: 'config.yaml'
sampleFile = config['samples'] # 1st column: ChIP-treatment file basename; 2nd: input files with path
peakCaller = config['peakcaller'] 

data = pd.read_csv(sampleFile, header=0, names=['sample', 'inputs']).set_index('sample', drop=False)
samples = data['sample'].unique().tolist() # sample names   
inplist= data['inputs'].tolist() # the ChIP-input files

# Define output folders
testDir = "/data/Test" #dir with test data
logDir =  "/myexperiment/Logs"
pcDir =  "/myexperiment/Peaks"

# Define peak calling files
if( peakCaller == "narrow" ):
    pkFile = "'" + pcDir + "/{sample}_peaks.narrowPeak'"
    pkFileAll = "expand('" + pcDir + "/{sample}_peaks.narrowPeak',sample=samples)"
if( peakCaller == "broad" ):
    pkFile = "'" + pcDir + "/{sample}_peaks.broadPeak'"
    pkFileAll = "expand('" + pcDir + "/{sample}_peaks.broadPeak',sample=samples)"

# Target rule
rule all:
    input:
        expand(f'{pcDir}' + '/{sample}_peaks.xls', sample=samples),
        f'{pkFileAll}'

# Peak calling with MACS2
rule macs2:
    input: trt = f'{testDir}'+'/{sample}.bam', inp = expand('{inpfile}',inpfile=inplist)
    output: xls = f'{pcDir}' + '/{sample}_peaks.xls' ,pk = f'{pkFile}'
    params: nm = '{sample}' ,
        pcout = pcDir , 
        gs = "hs" ,
        th = 0.05
    threads: 40
    conda: "macs2.yaml"
    log: f'{logDir}' + '/macs2_{sample}.log'
    shell: "macs2 callpeak -t {input.trt} -c {input.inp} -f BAM -g {params.gs} -q {params.th} --outdir {params.pcout} -n {params.nm} 2> {log}"
snakemake • 180 views
ADD COMMENTlink modified 10 days ago by Jeremy Leipzig19k • written 7 weeks ago by msimmer92250
0
gravatar for Jeremy Leipzig
10 days ago by
Philadelphia, PA
Jeremy Leipzig19k wrote:

First, the "{sample}" wildcards don't exist outside of rules. You can pass wildcards to functions (def myfunction(wildcardsGoHereReadTheDocs)) but that is for functions where the input depends on the wildcards.

Second, you're making this more complicated than it has to be. peakcaller does exist within rules so just use it there - pk = pcDir + "/{sample}_peaks."+peakCaller+"Peak'"

ADD COMMENTlink modified 10 days ago • written 10 days ago by Jeremy Leipzig19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 835 users visited in the last hour