I would like to use snakemake to analyze my data sets. As I am going to work with different organisms, I would like snakemake to create a folder for each of them when indexing the genome.
let's say I would like to work with human and mouse data.
In my config.yaml
file I have the following snippet:
organism:
"Dmel"
# "Dpse"
Each time I am working with an organism I can comment out the others in the list.
Below are the rules for getting the fastA files as well as the annotation and the indexing step (as an example) with STAR.
#### get reference genomic data for the mapping (fasta and gtf files). Links are added in the config file
rule get_genome:
output:
fastA="genome/genome.fa",
gtf="genome/genome.gtf"
shell:
"""
wget -nc -O - {config[fastA]} | gunzip -c - > {output.fastA}
wget -nc -O - {config[gtf]} | gunzip -c - > {output.gtf}
"""
### Indexing the reference genome
rule star_index:
input:
fasta="genome/genome.fa",
gtf="genome/genome.gtf"
output:
directory("genome/starIndex/")
threads: 16
params:
prefix = {config["organism"]}
shell:
"mkdir -p {output} && "
"STAR --runThreadN {threads} "
"--outFileNamePrefix {output}{params.prefix} "
"--runMode genomeGenerate "
"--genomeDir {output} "
"--limitGenomeGenerateRAM {config[RAM]} "
"--genomeSAindexNbases {config[SAindex]} "
"--genomeFastaFiles {input.fasta} "
"--sjdbGTFfile {input.gtf} "
"--sjdbOverhang 100"
I would like to know how to add the value from the config file in the organism, e.g. config[organism]
into the rules, so that, when I download the fastA and gtf they will be renamed accordingly.
I have tried to add config[organism]
to the path, but it gives me an error. I have also tried this for example:
rule get_genome:
output:
fastA="genome/{config[organism]}.fa",
gtf="genome/{config[organism]}.gtf"
But it just creates the files {config[organism]}.gtf
and {config[organism]}.fa
.
Can someone please help me understand how I can add the parameter from the config file to the different paths in the various rules?
thanks