Question

Snakemake parallelize

21 months ago

Fadwa ▴ 10

Hi,

I am working with Snakemake to process a CSV file containing SRR IDs for downloading. In the initial rule, I use the SRA ID as a wildcard to fetch SRR files from NCBI. However, when I attempt to parallelize the job using the -j 2 option, the downloading step does not parallelize as expected. Can you please assist me with this issue?


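import os
import re

home = os.path.expanduser("~")
fichier_csv = os.path.join(home, 'sra_list.csv')

# Collect run accessions (SRR/ERR/DRR) from the first whitespace-separated field of each line
SRA_LIST = []
with open(fichier_csv, 'rt') as f:
    for line in f:
        fields = line.split()
        # Skip blank lines; keep only valid run accessions
        if fields and re.match(r'[SED]RR\d+$', fields[0]):
            SRA_LIST.append(fields[0])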

rule fetch_fastq:
    output:
        config["RESULTS"] + "Fastq_Files/{sra}.fastq.gz"
    log:
        config["RESULTS"] + "Supplementary_Data/Logs/{sra}.sratoolkit.log"
    benchmark:
        config["RESULTS"] + "Supplementary_Data/Benchmark/{sra}.sratoolkit.txt"
    message:
        "fetch fastq from NCBI"
    params:
        conda = "sratoolkit",
        outdir = config["RESULTS"] + "Fastq_Files"
    threads: 8
    shell:
        """
        set +eu &&
        . $(conda info --base)/etc/profile.d/conda.sh &&
        conda activate {params.conda}
        fastq-dump \
                --split-spot \
                --skip-technical {wildcards.sra} \
                --stdout 2>{log} \
        | gzip -c > {output}
        """

Can you please help me parallelize this?

snakemake order • 980 views

updated 21 months ago by Eric Lim ★ 2.2k • written 21 months ago by Fadwa ▴ 10

Do you have enough resources on the machine? You're requesting 8 threads for a single-threaded process.

21 months ago by Shred ★ 1.6k

Yes, I have enough resources. It's just a test.

21 months ago by Fadwa ▴ 10

Try using -j 16

21 months ago by Eric Lim ★ 2.2k
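
For anyone reading later, here is a rough sketch of the arithmetic behind the two suggestions in this thread. It assumes a local run where threads are the only limiting resource. As far as I understand the scheduler, Snakemake reduces a rule's threads to the number of cores given with -j, and keeps the summed threads of running jobs within that number. The helper concurrent_jobs is hypothetical and only for illustration; it is not part of Snakemake.

```python
def concurrent_jobs(cores: int, threads_per_job: int) -> int:
    """Estimate how many jobs of one rule can run at once (simplified model)."""
    # Assumption: a rule's threads are reduced to the cores given via -j/--cores.
    effective_threads = min(threads_per_job, cores)
    # Assumption: the summed threads of running jobs must fit in the core budget.
    return cores // effective_threads


# Compare the original setup (-j 2, threads: 8) with the two suggested changes.
for cores, threads in [(2, 8), (16, 8), (2, 1)]:
    n = concurrent_jobs(cores, threads)
    print(f"-j {cores}, threads: {threads} -> {n} job(s) at a time")
```

Under this model, -j 2 with threads: 8 leaves room for only one fetch_fastq job at a time, which would match the behaviour described in the question. Either raising -j to 16 or lowering threads to 1 should let two downloads run side by side.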