Deleted:Snakemake input/output wildcards from Pandas dataframe
3 months ago
Shred ▴ 620

Hi,

Sorry if this question has already been answered elsewhere, but I'm struggling to find a solution. I have a CSV file describing the experiment, laid out like this:

Sample,condition,patient
132,tumor,A
133,control,A
134,tumor,B
....


Each patient has two files, a tumor one and a control one: a fairly common scenario. In Snakemake I have a rule that needs both files ({sample}.bam) as input and needs to use the patient to define the output filename, like:

rule Mutation:
    input:
        tumor =
        control =
    output:
        vcf = "{patient}.vcf"


I've managed to import the CSV into Pandas so I can handle the experiment information as a dataframe. For the tumor and control inputs, I tried writing a Python function that returns a list in which each element is the concatenation of path + sample + extension, and I passed it via rule all. This doesn't solve the problem, because I also need the patient information to build the output filename, as in the example above. Another option might be to rename the files before passing them to the rule, using a scheme like {sample}_{condition}_{patient}.bam, so that the same wildcards can be used across input and output.
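
To make the lookup described above concrete, here is a minimal sketch of a function that maps a patient to its tumor and control files through the dataframe. It is not a confirmed solution. It assumes the column names from the CSV excerpt (Sample, condition, patient), that the BAM files sit in the working directory as {sample}.bam, and that each patient has exactly one row per condition. The names `bams_for_patient` and `SAMPLE_SHEET` are made up for illustration, and the `SimpleNamespace` object only stands in for the wildcards object that Snakemake passes to input functions.

```python
import io
from types import SimpleNamespace

import pandas as pd

# Inline copy of the first rows of the sample sheet shown in the post.
SAMPLE_SHEET = """Sample,condition,patient
132,tumor,A
133,control,A
134,tumor,B
"""

# dtype=str keeps sample IDs as strings so they can be joined into file names.
samples = pd.read_csv(io.StringIO(SAMPLE_SHEET), dtype=str)


def bams_for_patient(wildcards, table=samples, suffix=".bam"):
    """Return {"tumor": path, "control": path} for wildcards.patient.

    Hypothetical helper: assumes one tumor row and one control row per patient.
    """
    rows = table[table["patient"] == wildcards.patient]
    paths = {}
    for condition in ("tumor", "control"):
        match = rows.loc[rows["condition"] == condition, "Sample"]
        if len(match) != 1:
            raise ValueError(
                f"expected exactly one {condition} sample for patient "
                f"{wildcards.patient!r}, found {len(match)}"
            )
        paths[condition] = match.iloc[0] + suffix
    return paths


# Stand-in for the wildcards object Snakemake would supply for "A.vcf".
demo = bams_for_patient(SimpleNamespace(patient="A"))
print(demo)
```

In a Snakefile, a function like this could probably be used as `input: unpack(bams_for_patient)` while the output stays `vcf = "{patient}.vcf"`, so that only {patient} is a wildcard and the sample IDs are resolved from the dataframe. The targets in rule all would then be built from the unique patient values rather than from sample names.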

Is there a better solution?

Workflow Snakemake Python