Question

concatenating FASTQ forward and reverse reads in Snakemake

0

Entering edit mode

4.6 years ago

Hansen_869 ▴ 80

I'm not sure this is the right place for this question, but please let me know if it's not.

I'm currently using Snakemake to annotate some genes. I have a config file, in which my fastq filenames are listed. It looks something like this:

Samples:
  Sample A:
    R1: 
      SampleA_R1.fq.gz
    R2: 
      SampleA_R2.fq.gz
  Sample B:
    R1: 
      SampleB_R1.fq.gz
    R2: 
      SampleB_R2.fq.gz

This is just a simplification of the config file, as it is much longer, but the pattern is the same. Now, I want to concatenate SampleA R1 and SampleA R2 into one fq.gz file. The same goes for Sample B and so forth, if the config was longer. I guess I should be using wildcards, but I'm unsure about how to access the config file and how to use it with wildcards.

Thanks in advance!!

FASTQ Concatenate bash forward reads • 2.5k views

ADD COMMENT • link updated 4.6 years ago by Eric Lim ★ 2.1k • written 4.6 years ago by Hansen_869 ▴ 80

score 0 · Answer 1 · 2019-09-19

0

Entering edit mode

4.6 years ago

Eric Lim ★ 2.1k

What have you tried? Understanding how wildcards work can be simple for some and time-consuming for others. I hope this simple example will put you in the right direction.

config = {
    'samples': {
        'a': {
            'r1': 'a_r1.txt',
            'r2': 'a_r2.txt'
        }
    }
}

rule:
    input: expand('{samples}_merge.txt', samples=config['samples'].keys())

rule merge:
    input: r1 = '{sample}_r1.txt',
           r2 = '{sample}_r2.txt'
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'

In addition to the documentation, a good place to start learning about wildcards: https://github.com/leipzig/SandwichesWithSnakemake

ADD COMMENT • link 4.6 years ago by Eric Lim ★ 2.1k

0

Entering edit mode

Thanks for your reponse! I haven't really tried anything worth mentioning, as I'm pretty ignorant on the topic. I'll definitely give this a go and look at the link. Will your proposed code also work, if my config is in a different file? I use the:

configfile: "config.yaml"

in the top of the file.

ADD REPLY • link 4.6 years ago by Hansen_869 ▴ 80

0

Entering edit mode

configfile will populate what's in the yaml into config. As to whether the code will work, there's only one way to find out.

ADD REPLY • link 4.6 years ago by Eric Lim ★ 2.1k

0

Entering edit mode

So by this approach, the input doesn't access r1 and r2 of the config does it? It uses the key "a", but I specify in my input that it should end with _r1.txt og _r2.txt. Is there any way to not having to write this? Can I just fetch it directly from the config file? I failed to mention, that in my config file, the R1 and R2 key, also have paths like:

R1: FASTQ/SampleA_R1.fq.gz

ADD REPLY • link 4.6 years ago by Hansen_869 ▴ 80

0

Entering edit mode

I generally dislike specifying the entire file path in config, but you can use a function or lambda to read those in. These techniques are all clearly described in the documentation, and I'd seriously recommend you spending some time with it.

Function as input files: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#functions-as-input-files

lambda: https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html (Step 3)

rule merge:
    input: r1 = lambda wildcards: config['samples'][wildcards.sample]['r1'],
             r2 = lambda wildcards: config['samples'][wildcards.sample]['r2']
    output: '{sample}_merge.txt'
    shell: 'cat {input.r1} {input.r2} > {output}'

Unless you start sharing you efforts and code in this learning process, I'm afraid I won't be able to help you further.

ADD REPLY • link 4.6 years ago by Eric Lim ★ 2.1k