Using snakemake to download split-reads by SRA number
2.2 years ago
Ivan ▴ 60

This is inspired by a comment on this post by /u/dariober:

Basically, you are asking snakemake to produce one log file per SRR id and this log file is produced by rule download_files. As a side product, download_files will give you the actual fastq files (things could be done differently in this respect but hopefully this will help...)

I have a list of SRA runs that I need to download as split reads. The goal is to get from [SRR1, .....] the files SRR1_1.fastq, SRR1_2.fastq and so on in a predetermined folder. To that end, I wrote the following Snakefile:

SRA_MAPPING = read_dictionary()
SRAFILES = list(SRA_MAPPING.keys())[1:]

RawSampleFolderName="raw_samples" 
RawSampleFolder=RawSampleFolderName+"/"

rule all:
    input:
        expand("{RawSampleFolder}{srafiles}_1.fastq",srafiles=SRAFILES, RawSampleFolder=RawSampleFolder),
        expand("{RawSampleFolder}{srafiles}_2.fastq",srafiles=SRAFILES, RawSampleFolder=RawSampleFolder)

rule download_srafiles: 
    output:
        expand("{RawSampleFolder}{srafiles}_1.fastq",srafiles=SRAFILES, RawSampleFolder=RawSampleFolder),
        expand("{RawSampleFolder}{srafiles}_2.fastq",srafiles=SRAFILES, RawSampleFolder=RawSampleFolder)
    params:
        download_folder = RawSampleFolderName
    shell:
        "fasterq-dump {wildcards.srafiles} -O {params.download_folder}"

(This is a proof of concept; I'll move the global variables into the config as soon as I can.) In a nutshell, I have a list of SRA IDs and a preset download folder. I use fasterq-dump to get split-read files, i.e. to go from SRR1 to SRR1_1.fastq and SRR1_2.fastq. In Snakemake fashion, I'd like the rule download_srafiles to have the fastq files as its output. My previous solution is given in the linked post, but it produced a .log file as the output and retrieved the fastq files as a side effect. I'd like to skip the part where I need a log file. Since all I have is a Python list, I omit the input. Running the above file does not download the samples. Instead I get the error:

'Wildcards' object has no attribute 'srafiles'

So what is it that I'm doing wrong?

snakemake sratoolkit • 1.5k views
2.2 years ago
liorglic ★ 1.4k

The reason you are getting this error message is that you have no wildcards in your download_srafiles rule. When you use expand(), the function simply replaces the values enclosed in {}'s, so no wildcards remain. This is a bit confusing, but keep in mind that {}'s within expand() mean "variable to replace", whereas in normal inputs/outputs {}'s indicate wildcards.
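To make the difference concrete, here is a minimal sketch in plain Python. The helper expand_like is a hypothetical stand-in written for illustration, not Snakemake's actual expand() implementation, and the two SRR IDs are placeholders. It shows that after expansion every {} has been filled in, while a pattern built with % formatting still carries the {srafiles} placeholder that Snakemake can treat as a wildcard.

```python
from itertools import product


def expand_like(pattern, **values):
    """Hypothetical, simplified stand-in for expand(): fill every {name}
    in the pattern with each combination of the supplied values."""
    names = list(values)
    # Treat a bare string as a single-element list of values.
    lists = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [pattern.format(**dict(zip(names, combo))) for combo in product(*lists)]


SRAFILES = ["SRR1", "SRR2"]  # placeholder IDs, for illustration only
RawSampleFolder = "raw_samples/"

# What the original download_srafiles rule declares as output:
# fully concrete file names, with no {} left for a wildcard.
expanded = expand_like(
    "{RawSampleFolder}{srafiles}_1.fastq",
    srafiles=SRAFILES,
    RawSampleFolder=RawSampleFolder,
)

# What a per-sample rule needs instead: the folder is filled in,
# but {srafiles} is left untouched.
pattern = "%s{srafiles}_1.fastq" % RawSampleFolder

print(expanded)
print(pattern)
```

Because the expanded list contains only concrete file names, a rule that uses it as output has nothing for {wildcards.srafiles} to refer to in its shell command.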
In fact, you shouldn't use expand in your second rule - this is already taken care of by the rule all. So the code for the second rule should look something like:

rule download_srafiles: 
    output:
        "%s{srafiles}_1.fastq" % RawSampleFolder,
        "%s{srafiles}_2.fastq" % RawSampleFolder
    params:
        download_folder = RawSampleFolderName
    shell:
        "fasterq-dump {wildcards.srafiles} -O {params.download_folder}"
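For intuition about where the value of {wildcards.srafiles} comes from, the following is a rough sketch in plain Python of matching a requested file name against an output pattern. The function match_wildcards is a hypothetical illustration and much simpler than what Snakemake really does; the file names are placeholders.

```python
import re


def match_wildcards(pattern, target):
    """Hypothetical helper: match a target file name against an output
    pattern and return the wildcard values, or None if it does not match."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


# rule all requests a concrete file; the rule's pattern supplies the wildcard value.
found = match_wildcards("raw_samples/{srafiles}_1.fastq", "raw_samples/SRR1_1.fastq")
print(found)

# A pattern with no {} (as produced by expand) matches, but yields no wildcards.
no_wildcards = match_wildcards("raw_samples/SRR1_1.fastq", "raw_samples/SRR1_1.fastq")
print(no_wildcards)
```

In this picture, rule all asks for each concrete fastq file, and the download rule runs once per sample with srafiles bound to the matching ID.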

Finally, I am not familiar with fasterq-dump, but I highly recommend Kingfisher, which can also download from ENA, and that is much faster.


This pretty much solved all my problems. I figured that expand does something funky to wildcards but couldn't figure out what. Many thanks.
