Question: concatenating FASTQ forward and reverse reads in Snakemake
4 weeks ago by
Hansen_869 wrote:

I'm not sure this is the right place for this question, so please let me know if it's not.

I'm currently using Snakemake to annotate some genes. I have a config file, in which my fastq filenames are listed. It looks something like this:

  Sample A:
  Sample B:

This is just a simplification of the config file, as it is much longer, but the pattern is the same. Now, I want to concatenate SampleA R1 and SampleA R2 into one fq.gz file. The same goes for Sample B and so forth, if the config was longer. I guess I should be using wildcards, but I'm unsure about how to access the config file and how to use it with wildcards.

Thanks in advance!!

4 weeks ago by
Eric Lim (Stoke Therapeutics, Inc) wrote:

What have you tried? Understanding how wildcards work can be simple for some and time-consuming for others. I hope this simple example will point you in the right direction.

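# Stand-in for the config that would normally come from a config file
config = {
    'samples': {
        'a': {
            'r1': 'a_r1.txt',
            'r2': 'a_r2.txt'
        }
    }
}

# Target rule (conventionally named 'all'): requests one merged file per sample key
rule all:
    input: expand('{samples}_merge.txt', samples=config['samples'].keys())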

rule merge:
    input: r1 = '{sample}_r1.txt',
           r2 = '{sample}_r2.txt'
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'
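
To make the connection between the expand call and the merge rule more concrete, here is a rough plain-Python imitation of what happens. It is only a sketch and not how Snakemake is implemented internally; the second sample 'b' and the names targets, suffix and plan are hypothetical additions for illustration.

```python
# Plain-Python imitation of the expand() call and the wildcard matching above.
# Not Snakemake's implementation; sample 'b' is a hypothetical addition.
config = {
    'samples': {
        'a': {'r1': 'a_r1.txt', 'r2': 'a_r2.txt'},
        'b': {'r1': 'b_r1.txt', 'r2': 'b_r2.txt'},
    }
}

# expand() fills the pattern once per value it is given.
targets = ['{samples}_merge.txt'.format(samples=s)
           for s in config['samples'].keys()]

suffix = '_merge.txt'
plan = {}
for target in targets:
    # Matching the output pattern '{sample}_merge.txt' binds the wildcard.
    sample = target[:-len(suffix)]
    # The same value is then substituted into the input patterns.
    plan[target] = ['{sample}_r1.txt'.format(sample=sample),
                    '{sample}_r2.txt'.format(sample=sample)]

# Print the shell command that the merge rule would run for each target.
for target, inputs in plan.items():
    print('cat {} {} > {}'.format(inputs[0], inputs[1], target))
```

Note that only the keys of config['samples'] are used here; the r1 and r2 values stored in the config are never read, because the input filenames are rebuilt from the wildcard.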

In addition to the documentation, a good place to start learning about wildcards:


Thanks for your response! I haven't really tried anything worth mentioning, as I'm pretty ignorant on the topic. I'll definitely give this a go and look at the link. Will your proposed code also work if my config is in a different file? I use:

configfile: "config.yaml"

at the top of the file.

Reply written 4 weeks ago by Hansen_869

configfile loads the contents of the YAML file into the global config dictionary. As to whether the code will work, there's only one way to find out.
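
As a rough illustration of that point: Snakemake accepts config files in YAML or JSON, and either way the result is an ordinary nested dictionary. The sketch below uses JSON purely so that it runs with the standard library alone; the config text is hypothetical and mirrors the example dictionary from the answer, and json.loads is only a stand-in for what the configfile directive does.

```python
import json

# Hypothetical config content, mirroring the example dictionary in the answer.
config_text = """
{
  "samples": {
    "a": {"r1": "a_r1.txt", "r2": "a_r2.txt"}
  }
}
"""

# Stand-in for the configfile directive: parse the text into a nested dict.
config = json.loads(config_text)

# From here on, config is used exactly like the hand-written dictionary.
sample_names = list(config["samples"].keys())
print(sample_names)
print(config["samples"]["a"]["r1"])
```

Once the dictionary is loaded, the rules do not need to know whether it was written inline or read from a separate file.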

Reply written 4 weeks ago by Eric Lim

So with this approach, the input doesn't actually read r1 and r2 from the config, does it? It uses the key "a", but I still have to state in my input that the filenames end with _r1.txt or _r2.txt. Is there a way to avoid writing this out and fetch the filenames directly from the config file? I failed to mention that in my config file the R1 and R2 keys also hold paths, like:

R1: FASTQ/SampleA_R1.fq.gz

Reply written 4 weeks ago (modified 4 weeks ago) by Hansen_869

I generally dislike specifying the entire file path in the config, but you can use a function or lambda to read those in. These techniques are all clearly described in the documentation, and I'd seriously recommend spending some time with it.

Function as input files:

lambda: (Step 3)

rule merge:
    input: r1 = lambda wildcards: config['samples'][wildcards.sample]['r1'],
           r2 = lambda wildcards: config['samples'][wildcards.sample]['r2']
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'
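
To see what those lambdas return, the lookup can be tried outside Snakemake. This is a minimal sketch under stated assumptions: only the SampleA R1 path appears in this thread, so the R2 path, the key layout, the output name and the helper names get_r1 and get_r2 are hypothetical, and SimpleNamespace merely stands in for the wildcards object that Snakemake passes to an input function.

```python
from types import SimpleNamespace

# Hypothetical config: only the SampleA R1 path is taken from the thread;
# the R2 path and the key layout are assumptions.
config = {
    'samples': {
        'SampleA': {
            'R1': 'FASTQ/SampleA_R1.fq.gz',
            'R2': 'FASTQ/SampleA_R2.fq.gz',
        }
    }
}

# Same shape as the lambdas in the rule: wildcards in, file path out.
get_r1 = lambda wildcards: config['samples'][wildcards.sample]['R1']
get_r2 = lambda wildcards: config['samples'][wildcards.sample]['R2']

# Stand-in for the wildcards object Snakemake would pass to the functions.
wildcards = SimpleNamespace(sample='SampleA')
inputs = [get_r1(wildcards), get_r2(wildcards)]

# Hypothetical output name; the rule above uses '{sample}_merge.txt'.
output = '{sample}_merged.fq.gz'.format(sample=wildcards.sample)
command = 'cat {} {} > {}'.format(inputs[0], inputs[1], output)
print(command)
```

Because concatenated gzip files still form a valid gzip stream, cat can be applied to the .fq.gz inputs directly, without decompressing them first.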

Unless you start sharing your efforts and code in this learning process, I'm afraid I won't be able to help you further.

Reply written 4 weeks ago (modified 4 weeks ago) by Eric Lim