Question: Variant calling pipeline to be run in parallel on multiple cores
Hi all,

I have a variant calling pipeline made up of multiple steps from different tools. I am currently working on a Cray system and have already run the commands individually on a single sample. Now I want to process multiple samples (24 at a time) by running the complete variant calling pipeline on the high-end server using the MPI module. My commands are in the python script and I want to modify it for mpi4py. Just an example:

When run individually:

import os
os.system("command 1")

But when running multiple commands all together on multiple cores:

from mpi4py import MPI
import os

Sample = ["1","2","3"]

for a in Sample:
    os.system("command1..input="+a", output="+a+"_1") 
    os.system("command2..input="+a+"_1, output="+a+"_2")
    os.system("command3..input="+a+"_2, output="+a+"_3")
    os.system("command4..input="+a+"_3, output="+a+"_4")
    os.system("command5..input="+a+"_4, output="+a+"_5")

rank = comm.Get_rank()

This script is not working at all.

Can anyone please help me? I just want to run my Python script (with import os) on multiple processors at the same time (20 samples on 20 cores), and I have to use only the MPI module.

Thank you

"My commands are in the python script"

Please don't. Use a workflow manager like Nextflow or Snakemake.

I agree with what Pierre Lindenbaum said.

Alternatively, you could use GNU parallel.

Pseudocode:

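import subprocess

your_sample_list = ["1", "2", "3"]  # placeholder sample names
cmd_file_name = "cmd_file.txt"
jobs = 20  # number of commands to run at the same time

# Write one command per line; GNU parallel runs each line as a separate job.
with open(cmd_file_name, "w") as cmd_file:
    for i in your_sample_list:
        cmd = " ".join(["your command", "-i", i])
        cmd_file.write(cmd + "\n")

# Feed the command file to GNU parallel on stdin.
parallel_cmd = " ".join(["parallel", "--eta", "-j", str(jobs), "<", cmd_file_name])
subprocess.run(parallel_cmd, shell=True, check=True)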
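
If MPI really is a hard requirement, as the question says, here is a minimal mpi4py sketch of how the samples could be split across ranks. It is only an illustration under some assumptions: the helper names samples_for_rank, pipeline_commands and main are made up for this example, and "command1" to "command5" with input=/output= arguments are placeholders standing in for the real tools, following the a+"_1" ... a+"_5" naming from the question. The key point is that comm has to be created from MPI.COMM_WORLD before Get_rank() is called, and each rank should only loop over its own share of the samples.

```python
import subprocess


def samples_for_rank(samples, rank, size):
    """Return the samples this MPI rank should process (round-robin split)."""
    return samples[rank::size]


def pipeline_commands(sample, n_steps=5):
    """Build the chained placeholder commands for one sample.

    Step k reads the output of step k-1. "commandN" is a placeholder,
    not a real tool; replace the string with the actual command line.
    """
    commands = []
    current_input = sample
    for step in range(1, n_steps + 1):
        output = f"{sample}_{step}"
        commands.append(f"command{step} input={current_input} output={output}")
        current_input = output
    return commands


def main(samples):
    # Imported here so the helpers above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()  # index of this process: 0 .. size-1
    size = comm.Get_size()  # total number of processes launched

    for sample in samples_for_rank(samples, rank, size):
        for cmd in pipeline_commands(sample):
            # check=True raises on a non-zero exit status, so this rank stops
            # instead of passing a failed step's output to the next step.
            subprocess.run(cmd, shell=True, check=True)
```

To use it, call main(["1", "2", "3"]) at the bottom of the script and start it through the MPI launcher, for example mpiexec -n 20 python pipeline_mpi.py (the file name is hypothetical; on a Cray the launcher may be aprun or srun depending on how the site is set up). With 20 samples and 20 ranks each rank gets exactly one sample. Note that the ranks never communicate here, which is why a workflow manager or GNU parallel is usually the simpler choice. Some MPI installations also restrict starting child processes from inside a rank, so it is worth checking the system documentation first.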
