Hello,
It's me again with a new Snakemake question! This time, I want to read an input table with an R script and generate a list of reads to keep. But this list may come from multiple sources (genotypes, in this case).
My R script generates an output file named {barcode}_list_{genotype}.txt. The barcode wildcard is already defined, and genotype can be any of A/B/C/...
I tried something like this:
rule R_HBV_analysis:
    input:
        R_data = datapath+"BLASTN/{barcode}_fmt.txt"
    output:
        Rresult = datapath+"R_RESULT/{barcode}_list_{genotype}.txt",
    wildcard_constraints:
        # a wildcard constraint must be a regular expression, not a list
        genotype = "GT[A-J]"
    params:
        path = datapath
    shell:
        """
        if [ ! -d {params.path}RDATA ];then
            mkdir {params.path}RDATA
        fi
        if [ ! -d {params.path}R_RESULT ];then
            mkdir {params.path}R_RESULT
        fi
        Rscript script/HBV_analysis.R {input} {params.path}
        """
with
GENOTYPE = {}
GENOTYPE['genotype'] = ("GTA","GTB","GTC","GTD","GTE","GTF","GTG","GTH","GTI","GTJ")
rule all:
    input:
        Rresults = expand(datapath+"R_RESULT/{barcode}_list_{genotype}.txt", barcode=BARCODE, genotype=GENOTYPE['genotype'])
But it returns an error, because Snakemake expects a file for every genotype, which is not the case.
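To make the mismatch concrete, here is a minimal sketch in plain Python of the targets such an expand() call requests. By default expand() combines its keyword lists as a full product, which itertools.product mimics here. The barcode names and the datapath value are placeholders I made up for illustration; only the genotype labels come from my Snakefile.

```python
from itertools import product

# Hypothetical placeholders: the real BARCODE list and datapath are
# defined elsewhere in the Snakefile.
BARCODE = ["bc01", "bc02"]
datapath = "data/"

# Genotype labels as listed in the Snakefile.
GENOTYPES = ["GTA", "GTB", "GTC", "GTD", "GTE",
             "GTF", "GTG", "GTH", "GTI", "GTJ"]

# Every barcode is paired with every genotype, so each pairing
# becomes a file that rule all requires to exist.
targets = [
    f"{datapath}R_RESULT/{barcode}_list_{genotype}.txt"
    for barcode, genotype in product(BARCODE, GENOTYPES)
]

print(len(targets))
for target in targets[:3]:
    print(target)
```

So ten files are requested per barcode, while the R script only ever writes one of them.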
I hope my question is clear. Thanks,
Hadrien
Sorry, how does your R-script know which of the genotypes are to be present in the output file?
In fact, I'm determining the most represented genotype in my dataset. So, depending on the dataset, it could be A, or B, etc.
Then why not have a single output file name used by each run of your Rscript, and encode the mode-genotype within the file, rather than within the filename?
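A rough sketch of what "encode it within the file" could look like, written in Python purely for illustration (your real logic lives in the R script). Everything here is an assumption on my side: the `#genotype=` header line, the (read_id, genotype) input pairs, and the helper names `write_result` and `read_result` are hypothetical, not something your pipeline already has.

```python
from collections import Counter
from pathlib import Path
import tempfile


def write_result(path, hits):
    """Write the reads of the most frequent genotype to a single file.

    hits is a list of (read_id, genotype) pairs (hypothetical structure).
    The first line records the chosen genotype, the rest are read IDs.
    """
    counts = Counter(genotype for _, genotype in hits)
    top_genotype, _ = counts.most_common(1)[0]
    kept = [read_id for read_id, genotype in hits if genotype == top_genotype]
    lines = [f"#genotype={top_genotype}"] + kept
    Path(path).write_text("\n".join(lines) + "\n")


def read_result(path):
    """Return (genotype, read_ids) from a file written by write_result."""
    lines = Path(path).read_text().splitlines()
    genotype = lines[0].split("=", 1)[1]
    return genotype, lines[1:]


# Made-up example data, not real results.
hits = [("read1", "GTA"), ("read2", "GTD"), ("read3", "GTA")]
out = Path(tempfile.mkdtemp()) / "bc01_list.txt"
write_result(out, hits)
genotype, reads = read_result(out)
print(genotype, reads)
```

With something along these lines, the rule's output would presumably become datapath+"R_RESULT/{barcode}_list.txt", and rule all would only need to expand over barcode, so Snakemake always knows the file name in advance.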
Well, you're totally right! I didn't think of that at all. It should be much easier to handle this way with Snakemake. Thanks for this advice and for all your tips in the second answer.