Question: problem with missing input file in snakemake rule
0
gravatar for Assa Yeroslaviz
8 weeks ago by
Assa Yeroslaviz1.2k
Munich
Assa Yeroslaviz1.2k wrote:

I'm trying to convert the SJ.tab.out file from STAR into a bed file within a snakemake script.

the rule for that is this:

 rule map_star_SE:
    input:
        R1='data/samples/single-end/{sample}_R1.fastq',
        index=directory("data/starIndex/")
    output:
        bam='Analysis/star/bamFiles/{sample}.bam',
    params:
        prefix = 'Analysis/star/bamFiles/{sample}',
        starlogs = 'Analysis/starlogs',
        gz_support=gz_command
    threads: 16
    shell:
        r'''
        mkdir -p {params.prefix} && \
        STAR --runThreadN {threads} \
             --genomeDir {input.index} \
             --outFileNamePrefix {params.prefix} --readFilesIn {input.R1} {params.gz_support} \
             --outSAMtype BAM SortedByCoordinate \
             --limitBAMsortRAM {config[RAM]} \
             --quantMode GeneCounts \
             --outReadsUnmapped Fastx &&\
        mv {params.prefix}Aligned.sortedByCoord.out.bam {output.bam} &&\
        mkdir -p {params.starlogs} &&\
        mv {params.prefix}Log.final.out {params.prefix}Log.out {params.prefix}Log.progress.out {params.starlogs}
        '''

rule index:
    input:
        'Analysis/star/bamFiles/{sample}.bam'
    output:
        'Analysis/star/bamFiles/{sample}.bam.bai'
    message:
        "Indexing {wildcards.sample}.sorted.bam"
    benchmark:
        "Analysis/benchmarks/{sample}.index_bam.txt"
    shell:
        'samtools index {input}'

   rule align_SJ2Bed:
        input:
            "Analysis/star/bamFiles/{sample}SJ.out.tab"
        output:
            "Analysis/star/bamFiles/{sample}.junctions.bed"
         shell:
            "scripts/STAR_SJtab2JunctionsBed.py -f {input} > {output}"

The global rule_all is

rule all:
   input:
      directory("data/starIndex/"),
      expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.bam.bai", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.junctions.bed", sample=SAMPLES)

when I run this analysis from scratch I get the error message with the last part of it.

$ snakemake -ps workflow.Snakefile -j 30 --force

Building DAG of jobs...

MissingInputException in line 46 of /local/Assa/projects/automation/01.Mapping/Star.Mapping_SE.Snakefile:

Missing input files for rule align_SJ2Bed:

Analysis/star/bamFiles/1_S1SJ.out.tab

But When I run the rule_all with the last output command commented out, it runs all the way with no errors. When I than activate the last row of the rule_all part and re-run the script it works for this part as well. Why can't snakemake see that the input file I would like to create in the rule_align_SJ2Bed is created in the a previous step, namely in rule_map_star_PE.

thanks

Assa

mapping star rule snakemake • 202 views
ADD COMMENTlink written 8 weeks ago by Assa Yeroslaviz1.2k
1
gravatar for Sej Modha
8 weeks ago by
Sej Modha4.5k
Glasgow, UK
Sej Modha4.5k wrote:

My guess is that the only output that can be found from rule map_star_SE is a bam file as you have specified. If you are saving a *.out.tab as part of that rule then you'd have to explicitly specify that in the rule map_star_SE output section.

ADD COMMENTlink written 8 weeks ago by Sej Modha4.5k

This make sense. I'll try it. thanks

ADD REPLYlink written 8 weeks ago by Assa Yeroslaviz1.2k

yes, this worked fine. Thanks for that. I still don't understand this kind of error though.

I have added another rule now for the quantification using featureCounts.

rule quant:
    input:
        gtf = "data/genome.gtf",
        bam = expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES)
    output:
        Gene = "Analysis/quantification/GeneCounts.txt",
    message:
        "running featureCounts"
    threads: 15
    log:
        log  = "Analysis/quantification/featureCounts.stat"
    shell:
        " featureCounts -T {threads} -a {input.gtf} -t exon -g gene_id -M\
        -o {output.Gene} {input.bam} >>  {log.log} 2>&1"

When I'm running the rule by its own with snakemake -nps Quantification.Snakefile -j 30 it runs perfectly well.

But when I try to add it to my rule all

rule all:
   input:
      directory("data/starIndex/"),
      expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.counts.tab", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.bam.bai", sample=SAMPLES),
      expand("Analysis/star/bamFiles/{sample}.junctions.bed", sample=SAMPLES),
      "Analysis/quantification/GeneCounts.txt"

it throws back the same error as above

$ snakemake -nps workflow.Snakefile -j 30 --forceall
Building DAG of jobs...
MissingInputException in line 12 of /local/Assa/projects/automation/01.Mapping/workflow.Snakefile:
Missing input files for rule all:
Analysis/quantification/GeneCounts.txt

But I do have the input file for this using the bam parameters. What is the difference, if it is asked for on the rule all or within the rule quant itself?

ADD REPLYlink written 8 weeks ago by Assa Yeroslaviz1.2k

I am not sure why you need rule all for in this case as you are saving the output from each bam file to a single file.

ADD REPLYlink written 8 weeks ago by Sej Modha4.5k

I'm not sure how else I can do that.

How can I call for a rule with wildcards in the output?

e.g. How can I run the rule index by itself?

ADD REPLYlink written 8 weeks ago by Assa Yeroslaviz1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1759 users visited in the last hour