I'm trying to convert the SJ.tab.out file from STAR into a bed file within a snakemake script.
the rule for that is this:
rule map_star_SE:
input:
R1='data/samples/single-end/{sample}_R1.fastq',
index=directory("data/starIndex/")
output:
bam='Analysis/star/bamFiles/{sample}.bam',
params:
prefix = 'Analysis/star/bamFiles/{sample}',
starlogs = 'Analysis/starlogs',
gz_support=gz_command
threads: 16
shell:
r'''
mkdir -p {params.prefix} && \
STAR --runThreadN {threads} \
--genomeDir {input.index} \
--outFileNamePrefix {params.prefix} --readFilesIn {input.R1} {params.gz_support} \
--outSAMtype BAM SortedByCoordinate \
--limitBAMsortRAM {config[RAM]} \
--quantMode GeneCounts \
--outReadsUnmapped Fastx &&\
mv {params.prefix}Aligned.sortedByCoord.out.bam {output.bam} &&\
mkdir -p {params.starlogs} &&\
mv {params.prefix}Log.final.out {params.prefix}Log.out {params.prefix}Log.progress.out {params.starlogs}
'''
rule index:
input:
'Analysis/star/bamFiles/{sample}.bam'
output:
'Analysis/star/bamFiles/{sample}.bam.bai'
message:
"Indexing {wildcards.sample}.sorted.bam"
benchmark:
"Analysis/benchmarks/{sample}.index_bam.txt"
shell:
'samtools index {input}'
rule align_SJ2Bed:
input:
"Analysis/star/bamFiles/{sample}SJ.out.tab"
output:
"Analysis/star/bamFiles/{sample}.junctions.bed"
shell:
"scripts/STAR_SJtab2JunctionsBed.py -f {input} > {output}"
The global rule_all is
rule all:
input:
directory("data/starIndex/"),
expand("Analysis/star/bamFiles/{sample}.bam", sample=SAMPLES),
expand("Analysis/star/bamFiles/{sample}.bam.bai", sample=SAMPLES),
expand("Analysis/star/bamFiles/{sample}.junctions.bed", sample=SAMPLES)
when I run this analysis from scratch I get the error message with the last part of it.
$ snakemake -ps workflow.Snakefile -j 30 --force
Building DAG of jobs...
MissingInputException in line 46 of /local/Assa/projects/automation/01.Mapping/Star.Mapping_SE.Snakefile:
Missing input files for rule align_SJ2Bed:
Analysis/star/bamFiles/1_S1SJ.out.tab
But When I run the rule_all with the last output command commented out, it runs all the way with no errors. When I than activate the last row of the rule_all part and re-run the script it works for this part as well.
Why can't snakemake see that the input file I would like to create in the rule_align_SJ2Bed is created in the a previous step, namely in rule_map_star_PE.
thanks
Assa
This make sense. I'll try it. thanks
yes, this worked fine. Thanks for that. I still don't understand this kind of error though.
I have added another rule now for the quantification using featureCounts.
When I'm running the rule by its own with
snakemake -nps Quantification.Snakefile -j 30it runs perfectly well.But when I try to add it to my
rule allit throws back the same error as above
But I do have the input file for this using the
bamparameters. What is the difference, if it is asked for on therule allor within therule quantitself?I am not sure why you need
rule allfor in this case as you are saving the output from eachbamfile to a single file.I'm not sure how else I can do that.
How can I call for a rule with wildcards in the output?
e.g. How can I run the
rule indexby itself?