Question

snakemake wildcard for fastq files

0

6.1 years ago

arshil • 0

Hi everyone, can anyone help me set up the wildcard for a list of paired-end fastq files (SRR7058331_1.fastq.gz, SRR7058331_2.fastq.gz)? I am trying to access the files through a config.yaml file, which looks like this:

sourcedir: /t6/h7/data/expression
refdir: /AA/Reference_genomes
datadirs:
  fastq: $sourcedir/demo_data
  bam: $sourcedir/bam
  quant: $sourcedir/quant

The code I am using is:
import yaml
configfile: "config.yaml
SAMPLES,=glob_wildcards(config['sourcedir'] + config['datadirs']['fastq']  + "/" +  "{sample}_R1.fastq.gz"))
READS=["1","2"]

It's not working. I am pretty new to this.

RNA-Seq snakemake config.yaml • 4.2k views

updated 6.1 years ago by bari.ballew ▴ 480 • written 6.1 years ago by arshil • 0

2

You need to follow up on your older questions first. You keep posting variations of the same problem without resolving the earlier issues.

ADD REPLY • link 6.1 years ago by Jeremy Leipzig 23k

0

import yaml 
configfile: "config.yaml 
SAMPLES,=glob_wildcards(config['sourcedir'] + config['datadirs']['fastq'] + "/" + "{sample}_R1.fastq.gz")
READS=["1","2"]

ADD REPLY • link updated 6.1 years ago by GenoMax 152k • written 6.1 years ago by arshil • 0

0

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Always edit the original post if you are adding useful information.

Thank you!

ADD REPLY • link 6.1 years ago by GenoMax 152k

score 2 · Answer 1 · 2019-06-07

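It looks like you may have duplicated part of your path. The `fastq` entry under `datadirs` already starts with `$sourcedir`, and your Snakefile prepends `config['sourcedir']` again, so with `$sourcedir` expanded your data path reads: /t6/h7/data/expression/t6/h7/data/expression/demo_data/{sample}_R1.fastq.gz. Also note that plain YAML does not expand `$sourcedir` on its own, so you either have to substitute it yourself or write the full path in config.yaml.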
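To make the path problem concrete, here is a minimal plain-Python sketch using the values from the config.yaml in the question. The `config` dict stands in for what `configfile:` would load, and the `string.Template` substitution is only one possible way to expand `$sourcedir` (an assumption for illustration, not something Snakemake does for you).

```python
from string import Template

# Stand-in for the dict that `configfile: "config.yaml"` would provide
# (values copied from the question's config.yaml)
config = {
    "sourcedir": "/t6/h7/data/expression",
    "datadirs": {"fastq": "$sourcedir/demo_data"},
}

# What the Snakefile in the question builds: sourcedir is prepended to a
# value that already refers to sourcedir, and "$sourcedir" stays literal
naive = config["sourcedir"] + config["datadirs"]["fastq"] + "/" + "{sample}_R1.fastq.gz"

# One possible fix (assumption): expand $sourcedir explicitly, once
fastq_dir = Template(config["datadirs"]["fastq"]).substitute(
    sourcedir=config["sourcedir"]
)
pattern = fastq_dir + "/" + "{sample}_R1.fastq.gz"

print(naive)    # the broken pattern
print(pattern)  # the intended pattern
```

Printing both strings before handing one to `glob_wildcards` is a quick way to check that the pattern points where you expect.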

I'm assuming you need to access the paired fastq files in tandem for alignment or something similar. Try something like this:

import glob
import os

configfile: "config.yaml"
fastqDir = config['datadirs']['fastq'] + '/'

SAMPLES = glob.glob(fastqDir + '*_R1.fastq.gz')  # read in file list
SAMPLES = [os.path.basename(x) for x in SAMPLES]  # remove path from filenames
SAMPLES = [x.replace('_R1.fastq.gz','') for x in SAMPLES]  # isolate sample ID from filename

def get_r1(wildcards):
    return glob.glob(fastqDir + wildcards.sample + '_R1.fastq.gz')

def get_r2(wildcards):
    return glob.glob(fastqDir + wildcards.sample + '_R2.fastq.gz')

rule do_something:
    input: 
        r1 = get_r1,
        r2 = get_r2
...
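One more thing to check: the files named in the question end in `_1.fastq.gz` and `_2.fastq.gz`, while the patterns above look for `_R1`/`_R2`, so the suffix has to match your actual filenames. The following is a minimal plain-Python sketch of pairing mates by sample ID under the `_1`/`_2` convention. The function name `pair_fastqs` and every filename other than the SRR7058331 pair are hypothetical, made up for illustration.

```python
import re

# Matches "<sample>_1.fastq.gz" or "<sample>_2.fastq.gz"
PAIRED = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")

def pair_fastqs(names):
    """Group file names by sample ID and keep samples that have both mates."""
    pairs = {}
    for name in names:
        m = PAIRED.match(name)
        if m is None:
            continue  # not a paired-end fastq under this naming scheme
        pairs.setdefault(m.group("sample"), {})[m.group("read")] = name
    return {s: r for s, r in pairs.items() if set(r) == {"1", "2"}}

# Hypothetical directory listing; in a Snakefile this could come from
# os.listdir() or glob.glob() on the fastq directory
filenames = [
    "SRR7058331_1.fastq.gz",
    "SRR7058331_2.fastq.gz",
    "SRR7058332_1.fastq.gz",  # hypothetical second sample
    "SRR7058332_2.fastq.gz",
    "notes.txt",
]

SAMPLES = sorted(pair_fastqs(filenames))
print(SAMPLES)
```

Inside a Snakefile, `glob_wildcards(fastq_dir + "/{sample}_1.fastq.gz")` should give a similar sample list without the manual regex, provided `fastq_dir` (a name assumed here) holds the correct directory.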