Question: snakemake wildcard for fastq files
7 months ago, arshil wrote:

Hi everyone, can anyone help me set up the wildcard for a list of paired-end fastq files (SRR7058331_1.fastq.gz, SRR7058331_2.fastq.gz)? I am trying to access the files through a config.yaml file, which looks like this:

sourcedir: /t6/h7/data/expression
refdir: /AA/Reference_genomes
datadirs:
  fastq: $sourcedir/demo_data
  bam: $sourcedir/bam
  quant: $sourcedir/quant

The code I am using is:
import yaml
configfile: "config.yaml
SAMPLES,=glob_wildcards(config['sourcedir'] + config['datadirs']['fastq']  + "/" +  "{sample}_R1.fastq.gz"))
READS=["1","2"]

It's not working. I am pretty new to this.

Tags: rna-seq, snakemake, config.yaml

You need to follow up on your older questions first. You keep posting variations of the same problem without resolving earlier issues.

written 7 months ago by Jeremy Leipzig
import yaml 
configfile: "config.yaml 
SAMPLES,=glob_wildcards(config['sourcedir'] + config['datadirs']['fastq'] + "/" + "{sample}_R1.fastq.gz")
READS=["1","2"]
modified 7 months ago by genomax • written 7 months ago by arshil

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Always edit the original post if you are adding useful information.

Thank you!

written 7 months ago by genomax
7 months ago, bari.ballew (USA/NIH) wrote:

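It looks like you may have duplicated part of your path. config['datadirs']['fastq'] already starts with $sourcedir, and you prepend config['sourcedir'] again, so with $sourcedir filled in, the path to your data reads: /t6/h7/data/expression/t6/h7/data/expression/demo_data/{sample}_R1.fastq.gz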
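
To check what that concatenation evaluates to, here is a minimal sketch in plain Python. It assumes the config values from the question, held in an ordinary dict. As far as YAML is concerned, `$sourcedir` is just literal text, so it has to be substituted explicitly; `expand_vars` is a hypothetical helper written for this example (it is not part of Snakemake) and relies on the standard-library `string.Template`.

```python
from string import Template

# Config values copied from the question's config.yaml (as a plain dict).
config = {
    "sourcedir": "/t6/h7/data/expression",
    "datadirs": {"fastq": "$sourcedir/demo_data"},
}

# What the original concatenation yields: "$sourcedir" stays literal.
naive = config["sourcedir"] + config["datadirs"]["fastq"] + "/{sample}_R1.fastq.gz"


def expand_vars(value, variables):
    """Hypothetical helper: replace $name placeholders using `variables`."""
    return Template(value).safe_substitute(variables)


# Substitute $sourcedir once, then append the file name pattern.
fastq_dir = expand_vars(config["datadirs"]["fastq"], {"sourcedir": config["sourcedir"]})
pattern = fastq_dir + "/{sample}_R1.fastq.gz"

print(naive)    # path with the unexpanded "$sourcedir" in the middle
print(pattern)  # path with "$sourcedir" substituted exactly once
```

The expanded `pattern` is the kind of string you could pass to `glob_wildcards`. Alternatively, writing the full path directly in config.yaml avoids the substitution step altogether.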

I'm assuming you need to access the paired fastq files in tandem for alignment or something similar. Try something like this:

import glob
import os

configfile: "config.yaml"
fastqDir = config['datadirs']['fastq'] + '/'

SAMPLES = glob.glob(fastqDir + '*_R1.fastq.gz')  # read in file list
SAMPLES = [os.path.basename(x) for x in SAMPLES]  # remove path from filenames
SAMPLES = [x.replace('_R1.fastq.gz','') for x in SAMPLES]  # isolate sample ID from filename

def get_r1(wildcards):
    return glob.glob(fastqDir + wildcards.sample + '_R1.fastq.gz')

def get_r2(wildcards):
    return glob.glob(fastqDir + wildcards.sample + '_R2.fastq.gz')

rule do_something:
    input: 
        r1 = get_r1,
        r2 = get_r2
...
written 7 months ago by bari.ballew
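
One more thing to watch: the file names in the question end in `_1.fastq.gz` and `_2.fastq.gz`, while the patterns in this thread look for `_R1`/`_R2`, so nothing would match as written. As a rough sketch (plain Python, no Snakemake required), the following collects sample IDs from a directory and pairs up the mates. The function name `find_pairs` and its `suffixes` parameter are made up for illustration; adjust the suffixes to your own naming scheme.

```python
import os
import re
import tempfile


def find_pairs(fastq_dir, suffixes=("_1", "_2")):
    """Return {sample: (mate1_path, mate2_path)} for samples that have both mates."""
    first, second = suffixes
    pattern = re.compile(r"^(?P<sample>.+)" + re.escape(first) + r"\.fastq\.gz$")
    pairs = {}
    for name in sorted(os.listdir(fastq_dir)):
        match = pattern.match(name)
        if not match:
            continue  # not a first-mate file
        sample = match.group("sample")
        mate1 = os.path.join(fastq_dir, name)
        mate2 = os.path.join(fastq_dir, sample + second + ".fastq.gz")
        if os.path.exists(mate2):  # skip samples missing their second mate
            pairs[sample] = (mate1, mate2)
    return pairs


# Small demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("SRR7058331_1.fastq.gz", "SRR7058331_2.fastq.gz", "orphan_1.fastq.gz"):
        open(os.path.join(tmp, fname), "w").close()
    pairs = find_pairs(tmp)
    samples = sorted(pairs)
    print(samples)
```

In a Snakefile, something like `SAMPLES = sorted(find_pairs(fastqDir))` could stand in for the three `glob`-based lines, and the input functions could look up `pairs[wildcards.sample]`. If you prefer Snakemake's own `glob_wildcards`, the same suffix change applies to its pattern string.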