How can I load numerous files from a config file in Snakemake? Is it worth it?
10 weeks ago

Hello there!

A few days ago I started using Snakemake for the first time.

Mainly, I want to use fasterq-dump to download a large number of files from NCBI, and I do it like this:

sra = []

with open("run_ids") as f:
    for line in f:
        sra.append(line.strip())

rule all:
    input:

    output:
    params:
        "--split-spot --skip-technical"
    log:
        "logs/fasterq-dump/{sample}.log"
    shell:
        """
        """


This is working, but:

1. How can I load the samples from a configure.yaml file instead? Right now I have an external txt file with a list of samples, and I read it with Python.
2. Is it worth it? Will it make my script faster if I load the samples from a configure.yaml?

10 weeks ago
seidel 9.0k

Presumably you would put your sample names in the config.yml file:

SAMPLES:
- "sample1"
- "sample2"
- "sample3"


and then reference it in your input:

configfile: "config.yml"

rule all:
    input:

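Once `configfile:` has been read, `config["SAMPLES"]` is an ordinary Python list, so the targets for `rule all` can be built from it. The plain-Python sketch below mimics that step outside Snakemake; the `config` dict is a stand-in for what Snakemake would load, and the output pattern `fastq/{sample}.fastq` is a hypothetical example, not something taken from the original post.

```python
# Stand-in for the dict Snakemake builds from config.yml via `configfile:`.
config = {"SAMPLES": ["sample1", "sample2", "sample3"]}

# Hypothetical output pattern; adjust it to whatever your download rule produces.
PATTERN = "fastq/{sample}.fastq"


def build_targets(pattern, samples):
    """Fill the {sample} placeholder once per sample name."""
    return [pattern.format(sample=s) for s in samples]


targets = build_targets(PATTERN, config["SAMPLES"])
print(targets)

# Inside a Snakefile, the equivalent would typically be written as:
#   rule all:
#       input:
#           expand("fastq/{sample}.fastq", sample=config["SAMPLES"])
```

The list comprehension and Snakemake's `expand` give the same list here; `expand` is just the more conventional spelling inside a Snakefile.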

But I can't imagine it would have any effect on the speed of your workflow: Python reading a txt file or a config file is certainly not the slow part of the script. If you already have a text file method that works, that seems simpler than formatting your sample names as YAML in a config file. On the other hand, a config.yml file is more formally tied to a Snakemake file - so I suppose it's up to how you like to organize things.
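If you do decide to move to a config file, the existing text list does not have to be retyped by hand. Here is a minimal sketch, assuming one ID per line and IDs made of plain letters and digits (no quoting or escaping is handled). The helper names `read_ids` and `ids_to_yaml` are made up for illustration.

```python
def read_ids(path):
    """Return the non-empty, stripped lines of a plain-text ID list."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def ids_to_yaml(ids, key="SAMPLES"):
    """Render the IDs as a YAML list under `key`.

    Assumes simple IDs; values containing quotes would need real YAML escaping.
    """
    lines = [f"{key}:"] + [f'  - "{i}"' for i in ids]
    return "\n".join(lines) + "\n"


def convert(src, dst):
    """Read IDs from `src` and write them to `dst` as a YAML config fragment."""
    with open(dst, "w") as out:
        out.write(ids_to_yaml(read_ids(src)))
```

Calling `convert("run_ids", "config.yml")` once would produce a file in the same shape as the SAMPLES example above. Either way, both approaches end up with the same Python list, which is why the choice does not affect run time.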