Question

How to run MAKER multi-threaded without MPI

0

7.7 years ago

Rox ★ 1.4k

Hi everyone !

I tried to use MAKER, but it ran on only 1 of my 32 CPUs. I want to run MAKER multi-threaded, but I don't see any --threads or -p option like the ones I am used to.

I did some research, and it seems that we should use MPI instead: https://wiki.hpcc.msu.edu/display/Bioinfo/MAKER+Tutorial

I thought MPI was only for remote clusters (that's not my case, I want to run everything on my own computer) and that it was also a nightmare to use for beginners.

What do you think? Do you have any advice that could be useful for me?

Thanks !

Roxane

software error genome annotation • 3.1k views

1

I don't know if there are any benefits to using MPI on a single multi-core machine. You have already referred to the headaches of having to install an MPI environment, compile MAKER to use MPI, etc. Ultimately all your cores are sharing the same memory (not sure how much you have).

You may be able to split your input and start multiple independent jobs as a brute force way of parallelizing things. This may require plenty of RAM.

ADD REPLY • link 7.7 years ago by GenoMax 141k

0

Hmm, I see. So how am I supposed to split my input? And above all, how should I merge all my result files together? Also, what does "plenty of RAM" mean for this job? I think my supercomputer can handle it, but I'm not sure.

So, apart from MPI, is there no faster way to run MAKER? The MAKER paper claims that it is multi-threaded, but it seems this can only be achieved with MPI, which is not satisfying for me. I think it would take ages to run MAKER without multithreading. And since I need to run it several times in order to train the ab initio tools, I would appreciate it if the MAKER step took less than 4-5 days (maybe more, because sadly the process was aborted due to memory space issues).

ADD REPLY • link 7.7 years ago by Rox ★ 1.4k

0

Disclaimer: I have not used MAKER, so take the following with a grain of salt. With that out of the way:

I assume your input for MAKER is a multi-FASTA file of sequences? You could split the file up into chunks and start multiple instances of MAKER. Based on the example you posted above, 5 GB per job seems to be required (unless this is a toy example and real jobs need more memory).

Using MAKER via the iPlant initiative may be another option. I don't know the size of your input dataset, but this page seems to indicate the following:
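As a rough illustration of the splitting step, here is a minimal Python sketch that divides a multi-FASTA file into a fixed number of chunks of similar total sequence length. It is not part of MAKER; the function names (`read_fasta`, `split_fasta`) and the output naming scheme (`chunk_000.fasta`, ...) are made up for this example, and it assumes the whole input fits in memory while splitting.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, [sequence lines]) for each record in a FASTA file."""
    header, lines = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line, []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, lines


def split_fasta(path, n_chunks, out_dir):
    """Write up to n_chunks FASTA files with roughly equal total length.

    Hypothetical helper: loads every record into memory, then assigns the
    longest sequences first, each to the currently smallest chunk.
    """
    records = [(h, seq, sum(len(s) for s in seq)) for h, seq in read_fasta(path)]
    records.sort(key=lambda rec: rec[2], reverse=True)

    chunks = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    for header, seq, length in records:
        i = sizes.index(min(sizes))  # index of the smallest chunk so far
        chunks[i].append((header, seq))
        sizes[i] += length

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chunk in enumerate(chunks):
        if not chunk:  # fewer sequences than chunks: skip empty files
            continue
        out_path = out_dir / f"chunk_{i:03d}.fasta"
        with open(out_path, "w") as handle:
            for header, seq in chunk:
                handle.write(header + "\n")
                for s in seq:
                    handle.write(s + "\n")
        written.append(out_path)
    return written
```

For example, `split_fasta("genome.fasta", 8, "chunks")` (with `genome.fasta` standing in for your own file) would write up to eight chunk files into `chunks/`. Balancing by total length rather than by number of sequences should keep one job from ending up with all the large scaffolds. With roughly 5 GB needed per job, the number of chunks you can run at once is limited by your RAM as well as by your core count.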

MAKER is installed and available for iPlant users on the lonestar cluster at the Texas Advanced Computing Center (TACC). Here you can see the entire maize v2 genome (~2 Gb) can be annotated in just over 2 hours using as few as 500 cpus. MAKER-P was also used to annotate the largest genome ever sequenced (loblolly pine, >20 Gb) in less than 15 hours runtime on 8,640 cpus (37 hours total when including queue wait time).

ADD REPLY • link 7.7 years ago by GenoMax 141k