Question: concatenating FASTQ forward and reverse reads in Snakemake
8 months ago, Hansen_869 wrote:

I'm not sure this is the right place for this question, but please let me know if it's not.

I'm currently using Snakemake to annotate some genes. I have a config file, in which my fastq filenames are listed. It looks something like this:

  Sample A:
  Sample B:

This is just a simplification of the config file, as it is much longer, but the pattern is the same. Now, I want to concatenate SampleA R1 and SampleA R2 into one fq.gz file. The same goes for Sample B and so forth, if the config was longer. I guess I should be using wildcards, but I'm unsure about how to access the config file and how to use it with wildcards.

Thanks in advance!!

8 months ago, Eric Lim (Stoke Therapeutics, Inc) wrote:

What have you tried? Understanding how wildcards work can be simple for some and time-consuming for others. I hope this simple example will point you in the right direction.

config = {
    'samples': {
        'a': {
            'r1': 'a_r1.txt',
            'r2': 'a_r2.txt'
        }
    }
}

# Request one merged file per sample key in the config
rule all:
    input: expand('{samples}_merge.txt', samples=config['samples'].keys())

rule merge:
    input: r1 = '{sample}_r1.txt',
           r2 = '{sample}_r2.txt'
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'
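
To make the matching step concrete, here is a small plain-Python emulation (not Snakemake's actual implementation) of how the targets requested through `expand` line up with the `merge` rule's output pattern. The second sample `b` and the helper name `infer_sample` are hypothetical and only there for illustration.

```python
import re

# Same shape as the config above; sample 'b' is an assumption added for illustration.
config = {
    "samples": {
        "a": {"r1": "a_r1.txt", "r2": "a_r2.txt"},
        "b": {"r1": "b_r1.txt", "r2": "b_r2.txt"},
    }
}

# Roughly what expand('{samples}_merge.txt', samples=config['samples'].keys()) yields.
targets = [f"{name}_merge.txt" for name in config["samples"]]


def infer_sample(target):
    """Emulate matching a target against the output pattern '{sample}_merge.txt'."""
    match = re.fullmatch(r"(?P<sample>.+)_merge\.txt", target)
    return match.group("sample") if match else None


for target in targets:
    sample = infer_sample(target)
    # Inputs the merge rule would then look for, with the wildcard filled in.
    print(target, "<-", f"{sample}_r1.txt", f"{sample}_r2.txt")
```

The point is that the sample name is never passed to `merge` explicitly: it is inferred from the requested output filename and then substituted into the input patterns.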

In addition to the documentation, a good place to start learning about wildcards:


Thanks for your response! I haven't really tried anything worth mentioning, as I'm pretty ignorant on the topic. I'll definitely give this a go and look at the link. Will your proposed code also work if my config is in a different file? I use:

configfile: "config.yaml"

at the top of the file.

written 8 months ago by Hansen_869

configfile loads the contents of the YAML file into the config dictionary. As to whether the code will work, there's only one way to find out.

written 8 months ago by Eric Lim

So with this approach, the input doesn't access r1 and r2 from the config, does it? It uses the key "a", but I specify in my input that the filenames should end with _r1.txt or _r2.txt. Is there any way to avoid writing this? Can I just fetch the filenames directly from the config file? I failed to mention that in my config file, the R1 and R2 keys also have paths, like:

R1: FASTQ/SampleA_R1.fq.gz

modified 8 months ago • written 8 months ago by Hansen_869

I generally dislike specifying the entire file path in config, but you can use a function or lambda to read those in. These techniques are all clearly described in the documentation, and I'd seriously recommend spending some time with it.

Function as input files:

lambda: (Step 3)

rule merge:
    input: r1 = lambda wildcards: config['samples'][wildcards.sample]['r1'],
           r2 = lambda wildcards: config['samples'][wildcards.sample]['r2']
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'
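
As a rough illustration of what those lambdas do, here is a plain-Python sketch (not a Snakefile) of the same lookup written as one named function. The config layout, the `SampleB` entry, the R2 paths, and the function name `get_reads` are all assumptions; only `FASTQ/SampleA_R1.fq.gz` appears in this thread. `SimpleNamespace` merely stands in for the wildcards object that Snakemake passes to an input function.

```python
from types import SimpleNamespace

# Hypothetical config, shaped like what `configfile: "config.yaml"` might load.
# Keys and most paths are assumptions, not the asker's real file.
config = {
    "samples": {
        "SampleA": {"R1": "FASTQ/SampleA_R1.fq.gz", "R2": "FASTQ/SampleA_R2.fq.gz"},
        "SampleB": {"R1": "FASTQ/SampleB_R1.fq.gz", "R2": "FASTQ/SampleB_R2.fq.gz"},
    }
}


def get_reads(wildcards):
    """Return the [R1, R2] paths for the sample named by the wildcards object."""
    reads = config["samples"][wildcards.sample]
    return [reads["R1"], reads["R2"]]


# Stand-in for the wildcards Snakemake would infer from a requested output file.
wildcards = SimpleNamespace(sample="SampleA")
print(get_reads(wildcards))
```

In a Snakefile, a function like this could presumably be passed as `input: get_reads`, with `cat {input} > {output}` as the shell command. Plain `cat` should also be fine for `.fq.gz` inputs, since concatenated gzip files still form a valid gzip stream.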

Unless you start sharing your efforts and code in this learning process, I'm afraid I won't be able to help you further.

modified 8 months ago • written 8 months ago by Eric Lim