I am setting up a Snakemake pipeline for sequencing reads alignment and variants calling. But the samtools index rule is not activated, and the subsequent haplotype caller rule fail.
I think it is because the samtools index rule is not perceived as necessary to execute the output of rule all by Snakemake, but not sure about it.
Has anyone experienced the same issue?
import os
import snakemake.io
import glob
import pandas as pd
###### Config file and sample sheets #####
configfile: "config/config.yaml"
# Load the samples table:
table=pd.read_csv("config/table_reads.tsv", delim_whitespace=True, header=None, index_col=False)
# Use the samples table to make lists of samples names/lists:
SAMPLES=table.loc[:, 0].to_list()
READS=["1","2"]
# The first rule (here rule all) specifies the files that you would like to create during your snakemake workflow.
rule all:
input:
"calls/all.vcf"
#expand("mapped/{sample}.bam", sample=SAMPLES)
# ALIGNMENT:
rule fastqc:
input:
expand("reads/{sample}.{read}.fastq.gz", sample=SAMPLES, read=READS)
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params:
extra = "--quiet"
log:
"logs/fastqc/{sample}.log"
threads: 1
resources:
mem_mb = 1024
wrapper:
"v2.6.0/bio/fastqc"
rule bwa_index:
input:
config["reference"]
#"{genome}.fasta",
output:
idx=multiext("{genome}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
log:
"logs/bwa_index/{genome}.log",
params:
algorithm="bwtsw",
wrapper:
"v2.6.0/bio/bwa/index"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/bwa/mem.html
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam",
log:
"logs/bwa_mem/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sorting="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="queryname", # Can be 'queryname' or 'coordinate'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.9.1/bio/bwa/mem"
# CALLING VARIANTS
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/samtools/sort.html
rule samtools_sort:
input:
"mapped/{sample}.bam",
output:
"mapped/{sample}.sorted.bam",
log:
"{sample}.log",
params:
extra="-m 4G",
threads: 8
wrapper:
"v2.10.0/bio/samtools/sort"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/samtools/
rule samtools_index:
input:
"mapped/{sample}.sorted.bam",
output:
"mapped/{sample}.sorted.bam.bai",
log:
"logs/samtools_index/{sample}.log",
params:
extra="", # optional params string
threads: 4 # This value - 1 will be sent to -@
wrapper:
"v2.10.0/bio/samtools/index"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/gatk/haplotypecaller.html
rule haplotype_caller_gvcf:
input:
# single or list of bam files
bam="mapped/{sample}.sorted.bam",
ref=config["reference"],
# known="dbsnp.vcf" # optional
output:
gvcf="calls/{sample}.g.vcf",
# bam="{sample}.assemb_haplo.bam",
log:
"logs/gatk/haplotypecaller/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
threads: 4
resources:
mem_mb=1024,
wrapper:
"v2.6.0/bio/gatk/haplotypecaller"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/gatk/genomicsdbimport.html
rule genomics_db_import:
input:
gvcfs=expand("calls/{sample}.g.vcf", sample=SAMPLES),
output:
db=directory("db"),
log:
"logs/gatk/genomicsdbimport.log",
params:
intervals=config["reference"],
db_action="create", # optional
extra="", # optional
java_opts="", # optional
threads: 2
resources:
mem_mb=lambda wildcards, input: max([input.size_mb * 1.6, 200]),
wrapper:
"v2.6.0/bio/gatk/genomicsdbimport"
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/gatk/genotypegvcfs.html
rule genotype_gvcfs:
input:
gvcf="db", # combined gvcf over multiple samples
# N.B. gvcf or genomicsdb must be specified
# in the latter case, this is a GenomicsDB data store
ref=config["reference"],
output:
vcf="calls/all.vcf",
log:
"logs/gatk/genotypegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024
wrapper:
"v2.6.0/bio/gatk/genotypegvcfs"
It seems that your first solution works well, and does not break the wrapper. I have added
idx="mapped/{sample}.sorted.bam.bai"to the inputs of the haplotype caller, and it works well. Thank you for the information!