Snakemake: associating two wildcards for a specific mapping
3.7 years ago

Hi there,

After reading the Snakemake documentation and running several tests, I still can't solve my problem. I'm working on influenza. I have an association table between samples and their matching subtypes, like this:

SAMPLE  SUBTYPE
TH3pos20191217_S96  H3N2_Perth16
S1967_S46   pH1N1_California07
S1946_S32   pH1N1_California07
D1914_S14   H3N2_Perth16
Tneg20191217_S95    UNMAPPED

I'm trying to build a Snakemake rule that maps each sample against its corresponding reference. I thought of using a dictionary to associate my two wildcards, but I can't get anything functional. I think the problem lies in how I define my wildcards. Do you have any suggestions?

Here is my current script:

import os
import pandas as pd

configfile:"config.yaml"

#Get information from config file
result_repository=config['Result_Repository']

#Select sample and assign subtype
sum_premapping = result_repository + "REPORT/SUBTYPING/Subtyping_result.csv"
table=pd.read_csv(sum_premapping,sep=";") 
table = table.loc[table['SUBTYPE'] != "UNMAPPED"]

sample_list=list(table['SAMPLE'])
subtype_list=list(table['SUBTYPE'])

list_samplesub={}
for i in range(0,len(sample_list)):
    list_samplesub[sample_list[i]] = subtype_list[i]

(SAMPLE)=sample_list

subtype=table.loc[table['SAMPLE'] == sample].value[0]

rule all:
    input:
        test= expand(result_repository + "MY_BAMs/{subtype}/{sample}.bam",subtype=list_samplesub[wildcards.sample],sample=SAMPLE)

rule test:
    input:
        viral_R1_gz = result_repository + 'DEHOSTING/{sample}_viral_R1.fastq.gz',
        viral_R2_gz = result_repository + 'DEHOSTING/{sample}_viral_R2.fastq.gz',
        FLU_subtype = "references/influenza/{subtype}.fasta"

    output:
        subtype_bam = result_repository + "MY_BAMs/{subtype}/{sample}.bam"
    shell:
        "minimap2 -ax  sr {input.FLU_subtype} {input.viral_R1_gz} {input.viral_R2_gz} | samtools view -bS > {output.subtype_bam}"

Thank you all and stay safe,

Hadrien

snakemake wildcards • 933 views
3.7 years ago

After a few days of trying and searching, I found a solution: build the list of expected BAM paths directly, one per sample/subtype pair.

# Output BAMs after subtype-specific reference mapping
subtype_bam=[]
for i in range(0,len(sample_list)):
    subtype_bam.append(result_repository + "FULL_SUBTYPED_BAM/"+subtype_list[i]+"/"+sample_list[i]+".bam")
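The same pairing can be written more compactly with `zip`. This is a minimal sketch in plain Python, not part of my original script: the function name `paired_bam_targets` and the `results/` prefix are made up for illustration, and the sample and subtype values are copied from the table in the question.

```python
# Hypothetical stand-ins for result_repository, sample_list and subtype_list.
result_repository = "results/"
sample_list = ["TH3pos20191217_S96", "S1967_S46", "S1946_S32", "D1914_S14"]
subtype_list = ["H3N2_Perth16", "pH1N1_California07", "pH1N1_California07", "H3N2_Perth16"]


def paired_bam_targets(prefix, samples, subtypes):
    """Return one BAM path per (sample, subtype) pair, never the cross product."""
    if len(samples) != len(subtypes):
        raise ValueError("samples and subtypes must have the same length")
    return [
        f"{prefix}FULL_SUBTYPED_BAM/{subtype}/{sample}.bam"
        for sample, subtype in zip(samples, subtypes)
    ]


subtype_bam = paired_bam_targets(result_repository, sample_list, subtype_list)
for path in subtype_bam:
    print(path)
```

Inside a Snakefile, `expand(result_repository + "FULL_SUBTYPED_BAM/{subtype}/{sample}.bam", zip, subtype=subtype_list, sample=sample_list)` should give the same pairing, because passing `zip` tells `expand` to combine the lists element by element instead of building every combination.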

Then request that list in rule all and let the mapping rule infer both wildcards from each path:

rule all:
    input:
        subtype_bam,

rule complete_subtype_mapping:
    input:
        viral_R1_gz = result_repository + 'DEHOSTING/{sample}_viral_R1.fastq.gz',
        viral_R2_gz = result_repository + 'DEHOSTING/{sample}_viral_R2.fastq.gz',
        FLU_subtype = "references/influenza/{subtype}.fasta"
    conda:
        "envs/minimap2.yaml"
    output:
        subtype_bam = result_repository + "FULL_SUBTYPED_BAM/{subtype}/{sample}.bam"
    shell:
        "minimap2 -ax  sr {input.FLU_subtype} {input.viral_R1_gz} {input.viral_R2_gz} | samtools sort  > {output.subtype_bam}"
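As far as I understand it, this works because Snakemake matches every path requested in rule all against the output pattern of the rule and reads both wildcard values from that single path, so no lookup between the two wildcards is needed. The sketch below imitates that matching in plain Python with a regular expression. It is a simplified illustration and not Snakemake's actual code: `match_output_pattern` is a hypothetical helper, the `results/` prefix is made up, and each wildcard is assumed to match `.+`.

```python
import re


def match_output_pattern(pattern, target):
    """Return the wildcard values that make `pattern` equal `target`, or None."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)        # literal path fragment
        else:
            regex += f"(?P<{part}>.+)"      # wildcard becomes a named group
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None


# Hypothetical output pattern and requested target, shaped like the rule above.
pattern = "results/FULL_SUBTYPED_BAM/{subtype}/{sample}.bam"
target = "results/FULL_SUBTYPED_BAM/H3N2_Perth16/D1914_S14.bam"

wildcards = match_output_pattern(pattern, target)
reference = f"references/influenza/{wildcards['subtype']}.fasta"
print(wildcards, reference)
```

Since only correctly paired paths are requested, each sample is mapped against its own subtype reference and no other combination is ever built.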

I can't find the Biostars link that guided me to this code; I'll edit later.

EDIT: it was on Stack Overflow: solution
