Entering edit mode
7 months ago
Begonia_pavonina
▴
150
I am trying to make a Snakemake pipeline using a configuration file config.yaml.
---
samples:
- Sample_1: reads/raw_reads_1
- Sample_2: reads/raw_reads_2
- Sample_3: reads/raw_reads_3
reference:
- "path/to/reference.fasta"
...
Following the good practices given in the Snakemake wrapper repository, I have used wrappers commands as much as possible. https://snakemake-wrappers.readthedocs.io/en/stable/ However I have observed that the input/outputs of the different wrappers do not match. Is there a way to manage the names of the files between the rules to connect the wrappers without modifying them?
import os
import snakemake.io
import glob
configfile: "config.yaml"
rule all:
input:
expand("mapped/{sample}.bam", sample=config["samples"])
rule fastqc:
input:
"reads/{sample}.fastq"
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params:
extra = "--quiet"
log:
"logs/fastqc/{sample}.log"
threads: 1
resources:
mem_mb = 1024
wrapper:
"v2.6.0/bio/fastqc"
rule bwa_index:
input:
"{genome}.fasta",
output:
idx=multiext("{genome}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
log:
"logs/bwa_index/{genome}.log",
params:
algorithm="bwtsw",
wrapper:
"v2.6.0/bio/bwa/index"
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam",
log:
"logs/bwa_mem/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sorting="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="queryname", # Can be 'queryname' or 'coordinate'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.6.0/bio/bwa/mem"
Can you give us an example please?