Variant calling pipeline to be run in parallel on multiple cores
10 months ago
AR • 0

Hi all,

I have a variant calling pipeline containing multiple steps from different tools. Currently, I am working on a Cray system. I have already run the commands individually on a single sample. Now I want to process multiple samples (24 samples at a time). I want to run my complete variant calling pipeline on the high-end server using the MPI module. My commands are in a Python script, and I want to modify it for mpi4py. For example:

When run individually:

import os
os.system("command 1")


But when running all the commands together for multiple samples on multiple cores:

from mpi4py import MPI
import os

Sample = ["1", "2", "3"]

for a in Sample:
    os.system("command1..input=" + a + ", output=" + a + "_1")
    os.system("command2..input=" + a + "_1, output=" + a + "_2")
    os.system("command3..input=" + a + "_2, output=" + a + "_3")
    os.system("command4..input=" + a + "_3, output=" + a + "_4")
    os.system("command5..input=" + a + "_4, output=" + a + "_5")

comm = MPI.COMM_WORLD
rank = comm.Get_rank()


This script is not working at all.

Can anyone please help me? I just want to run my Python script (which uses `import os`) on multiple processors at once (20 samples on 20 cores), and I have to use only the MPI module.
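(For reference, the usual mpi4py pattern is to give each rank its own slice of the sample list, rather than looping over every sample on every rank. A minimal sketch, where the `commandN` strings are placeholders for the real pipeline tools:)

```python
# Sketch: each MPI rank takes its own slice of the sample list, so
# launching 20 ranks processes 20 samples concurrently.
# The "commandN" strings below are placeholders, not real tools.
import os  # needed once os.system() is re-enabled at the bottom

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:          # fallback so the sketch also runs without MPI
    rank, size = 0, 1

samples = [str(n) for n in range(1, 21)]   # 20 samples

# Round-robin split: rank r handles samples r, r+size, r+2*size, ...
my_samples = samples[rank::size]

commands = []
for a in my_samples:
    # five sequential steps per sample; parallelism comes from the ranks
    for step in range(1, 6):
        inp = a if step == 1 else f"{a}_{step - 1}"
        commands.append(f"command{step} input={inp} output={a}_{step}")

for cmd in commands:
    print(cmd)               # swap for os.system(cmd) in the real pipeline
```

Launched with `mpiexec -n 20 python pipeline.py` (or the Cray equivalent, e.g. `aprun`/`srun`), each rank runs only its own samples' commands.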

Thank you

sequencing

"My commands are in the python script"

Please don't. Use a workflow manager like Nextflow or Snakemake.
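As an illustration of that suggestion, the five chained steps map naturally onto workflow rules. A rough Snakefile sketch (rule names, file naming, and the `commandN` tools are placeholders), which `snakemake --cores 24` would then parallelize across samples automatically:

```
SAMPLES = [str(n) for n in range(1, 25)]

rule all:
    input:
        expand("{sample}_5", sample=SAMPLES)

rule step1:
    input:
        "{sample}"
    output:
        "{sample}_1"
    shell:
        "command1 input={input} output={output}"

rule step2:
    input:
        "{sample}_1"
    output:
        "{sample}_2"
    shell:
        "command2 input={input} output={output}"

# ...steps 3-5 follow the same pattern...
```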


I agree with what Pierre Lindenbaum said.


Pseudo code (fixed up so it runs; `your command` and `your_sample_list` are placeholders):

import subprocess

cmd_file_name = "cmd_file.txt"
jobs = 20

# write one pipeline command per sample, one per line
with open(cmd_file_name, "w") as cmd_file:
    for i in your_sample_list:
        cmd = " ".join(["your command", "-i", i])
        cmd_file.write(cmd + "\n")

# GNU parallel runs up to `jobs` commands concurrently
parallel_cmd = " ".join(["parallel", "--eta", "-j", str(jobs), "<", cmd_file_name])
subprocess.run(parallel_cmd, shell=True)