Greetings,
I'm trying to build a pipeline for NGS.
I made a small example pipeline for passing commands to the shell. It has two scripts, called from the shell, that just concatenate (sumtool.py) and multiply (multool.py) values in many dataframes (10 in this case). My wrapper (wrapper.py) handles the input and passes the commands that run the scripts in order. Here is the relevant part of the code from the wrapper:
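from functools import wraps
from subprocess import Popen

def run_cmd(orig_func):
    # Decorator: join the wrapped function's return value into one shell command and run it
    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        cmdls = orig_func(*args, **kwargs)
        cmdc = ' '.join(str(arg) for arg in cmdls)
        cmd = cmdc.replace(',', '')
        # wait() blocks until this one command has finished
        Popen(cmd, shell=True).wait()
    return wrapper

@run_cmd
def runsumtool(*args):
    return args

# getcsv() and dirlist are defined elsewhere in the wrapper
for file in getcsv():
    runsumtool('python3', 'sumtool.py', '--infile={}'.format(file), '--outfile={}'.format(dirlist[1]))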
This works, but I want to launch all the commands for the first script at once (one per dataframe), wait for all of them to finish, and then do the same for the second script. Because Popen().wait() blocks on each command in turn, the current approach takes much longer than it needs to.
I tried to use luigi for this, but I could not get it to run external programs or to handle multiple inputs and outputs. Any tips on that are appreciated.
Another solution I'm imagining is passing the samples individually, all at once, but I'm not sure how to write that in Python (or any other language, really). This would also solve the I/O problem with luigi.
thanks
Note 1: This is a small example pipeline I built. My main purpose is to call programs like bwa and picard in a pipeline, which I cannot import.
Note 2: I'm already using Popen from subprocess; see the run_cmd decorator in the code above.
This looks to me like an advanced Python programming question that would be better asked on SO. I'm also not sure I quite understand the question, but it looks like you could use a lightweight workflow management system like one of these.
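If a full workflow manager is more than you need, the "start everything, then wait" behaviour described in the question can probably be had from the standard library alone. The sketch below is only an illustration under assumptions: run_stage is a hypothetical helper name, the commands are harmless stand-ins rather than the real sumtool.py/multool.py calls, and nothing limits how many processes start at once.

```python
import subprocess
import sys


def run_stage(commands):
    """Start every command at once, then block until all have exited."""
    # Popen returns immediately, so the processes run side by side.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() is only called after everything has been launched.
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in commands; a real pipeline would build lists such as
    # ["python3", "sumtool.py", "--infile=a.csv", "--outfile=out"].
    stage_one = [[sys.executable, "-c", "import time; time.sleep(1)"] for _ in range(3)]
    stage_two = [[sys.executable, "-c", "print('stage two')"] for _ in range(3)]

    # The second stage starts only after every command of the first has exited.
    for stage in (stage_one, stage_two):
        exit_codes = run_stage(stage)
        if any(exit_codes):
            raise SystemExit("a command failed: {}".format(exit_codes))
```

Passing each command as a list avoids shell=True and the comma-stripping step. With many samples you would likely want to cap the number of simultaneous processes, for example by submitting the commands to a concurrent.futures.ThreadPoolExecutor with a fixed max_workers.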
I asked this on SO, but got no answers :/ Thanks for the link, it is helpful.
I run external programs in Python using subprocess, without Popen. If you're going to use bwa and picard, I imagine you have fastq files; if they are paired-end, you can use glob to collect them from a directory into tuples and then process them.
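Roughly, that could look like the sketch below. The helper names pair_fastqs and align_pair are made up for illustration, and the _R1/_R2 suffixes and .fastq extension are assumptions about how the files are named, so adjust them to your data. The call uses subprocess.run and the usual `bwa mem <reference> <read1> <read2>` form, with the alignment written to bwa's standard output.

```python
import glob
import os
import subprocess


def pair_fastqs(directory, r1_tag="_R1", r2_tag="_R2", ext=".fastq"):
    """Return sorted (read1, read2) tuples for files that have both mates."""
    pairs = []
    for r1 in sorted(glob.glob(os.path.join(directory, "*" + r1_tag + ext))):
        # Swap only the trailing tag so sample names containing "_R1" survive.
        r2 = r1[: -len(r1_tag + ext)] + r2_tag + ext
        if os.path.exists(r2):
            pairs.append((r1, r2))
    return pairs


def align_pair(reference, r1, r2, out_sam):
    """Run `bwa mem` on one pair and write its standard output to out_sam."""
    with open(out_sam, "w") as handle:
        # check=True raises CalledProcessError if bwa exits with an error.
        subprocess.run(["bwa", "mem", reference, r1, r2], stdout=handle, check=True)
```

You would then loop over pair_fastqs("reads") and call align_pair("ref.fa", r1, r2, some_output_path) for each tuple; the directory and reference names here are placeholders.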