                    21 months ago
        Fadwa
        
    
Hi,
I am working with Snakemake to process a CSV file containing SRR IDs for downloading. In the initial rule, I use the SRA ID as a wildcard to fetch SRR files from NCBI. However, when I attempt to parallelize the job using the -j 2 option, the downloading step does not parallelize as expected. Can you please assist me with this issue?
import os
import re

home = os.path.expanduser("~")
fichier_csv = os.path.join(home, 'sra_list.csv')
SRA_LIST = []
with open(fichier_csv, 'rt') as f:
    for line in f:
        fields = line.split()
        # Skip blank lines; keep only SRR/ERR/DRR accessions found in the first column
        if fields and re.match(r'[SED]RR\d+$', fields[0]):
            SRA_LIST.append(fields[0])
rule fetch_fastq:
    output:
        config["RESULTS"] + "Fastq_Files/{sra}.fastq.gz"
    log:
        config["RESULTS"] + "Supplementary_Data/Logs/{sra}.sratoolkit.log"
    benchmark:
        config["RESULTS"] + "Supplementary_Data/Benchmark/{sra}.sratoolkit.txt"
    message:
        "fetch fastq from NCBI"
    params:
        conda = "sratoolkit",
        outdir = config["RESULTS"] + "Fastq_Files"
    threads: 8
    shell:
        """
        set +eu &&
        . $(conda info --base)/etc/profile.d/conda.sh &&
        conda activate {params.conda}
        fastq-dump \
                --split-spot \
                --skip-technical {wildcards.sra} \
                --stdout 2>{log} \
        | gzip -c > {output}
        """
Can you please help me parallelize this?
Do you have enough resources on the machine? You're requesting 8 threads for a single-threaded process.
Yes, I have enough resources. It's just a test.
Try using -j 16.
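To see why the core count matters here: Snakemake reduces a rule's threads to the number of cores given with -j when that number is smaller, and it only starts a job if enough cores are free. With threads: 8 and -j 2, each fetch_fastq job is scaled down to 2 threads and occupies both cores, so downloads run one at a time. The sketch below is only a simplified model of that arithmetic, not Snakemake's scheduler; the function names are hypothetical and it ignores other resources and other rules.

```python
def effective_threads(rule_threads: int, cores: int) -> int:
    """Threads a job actually reserves: capped at the cores passed via -j."""
    return min(rule_threads, cores)


def max_concurrent_jobs(rule_threads: int, cores: int) -> int:
    """How many jobs of one rule fit side by side in this simplified model."""
    return cores // effective_threads(rule_threads, cores)


if __name__ == "__main__":
    # threads: 8 as in the fetch_fastq rule, with different -j values
    for cores in (2, 8, 16):
        print(
            f"-j {cores}: {effective_threads(8, cores)} threads per job, "
            f"{max_concurrent_jobs(8, cores)} job(s) at once"
        )
    # Alternative: declare threads: 1, since fastq-dump is single-threaded
    print(f"threads: 1 with -j 2: {max_concurrent_jobs(1, 2)} job(s) at once")
```

Under this model, -j 16 lets two 8-thread jobs run together, and setting threads: 1 in the rule would let -j 2 run two downloads at once without reserving cores that fastq-dump does not use.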