I have three Snakemake rules (other than rule all) and cannot figure this problem out, even after reading the available material on checkpoints.
Rule 1 takes the single file I start with and produces x outputs (the number varies depending on the input file). Each of those x outputs needs to be processed separately in rule 2, meaning that rule 2 will run x jobs. However, only some subset, y, of these jobs will produce outputs (the software only writes out files for inputs that pass a certain threshold). So, while I want each of those outputs to run as a separate job in rule 3, I don't know in advance how many files will come out of rule 2. Rule 3 should run y jobs, one for each successful output from rule 2.

I have two questions:

1. How do I write the input for rule 3, not knowing how many files will come out of rule 2?
2. How can I "tell" rule 2 it is done, when the number of output files does not match the number of input files?

If I add a fourth rule, I imagine it would try to re-run rule 2 on the jobs that didn't produce an output file, which would never make an output. Maybe I am missing something about how to set up the checkpoints?
#this rule has one job
rule a:
    input: file.vcf
    output: some unknown number of files
    shell:"""
    .... make unknown number of output files (x) x_1 , x_2, ..., x_n
    """

#run a separate job from each output of rule a
rule b:
    input: x_1 #not sure how many are going to be inputs here
    output: y_1 #not sure how many output files will be here
    shell:"""
    some of the x inputs will output their corresponding y, but others will have no output
    """

#run a separate job in rule c for each output of rule b
rule c:
    input: y_1 #not sure how many input files here
    output: z_1
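
To make the second question concrete, here is a minimal sketch of one possible approach, not a confirmed solution. It assumes every rule b job always writes a small marker file (for example via Snakemake's `touch()`), so the job counts as finished even when the tool writes no y file. A plain Python helper can then list only the ids whose y file actually exists. The directory layout, the `.done` and `.y` suffixes, and the function names `surviving_ids` and `rule_c_targets` are all hypothetical.

```python
from pathlib import Path


def surviving_ids(flag_dir, result_dir, flag_suffix=".done", result_suffix=".y"):
    """Return ids whose rule b job finished AND wrote a result file.

    Assumption: each rule b job writes <flag_dir>/<id><flag_suffix> when it
    ends, and writes <result_dir>/<id><result_suffix> only if the input
    passed the tool's threshold.
    """
    flag_dir, result_dir = Path(flag_dir), Path(result_dir)
    # Every finished job has a marker, whether or not it produced a result.
    finished = sorted(
        p.name[: -len(flag_suffix)] for p in flag_dir.glob(f"*{flag_suffix}")
    )
    # Keep only the ids that also have a real result file on disk.
    return [i for i in finished if (result_dir / f"{i}{result_suffix}").exists()]


def rule_c_targets(ids, pattern="z/{i}.z"):
    """Build the list of rule c outputs to request, one per surviving id."""
    return [pattern.format(i=i) for i in ids]
```

In a Snakefile, a helper like this would presumably be called from an input function of a final aggregating rule, and only after the relevant checkpoint has completed (that is, after `checkpoints.<name>.get(...)` returns), so that the files it looks for already exist when the DAG is re-evaluated.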