OMA error when reading the all-against-all files
3.1 years ago
nc52

Hi there,

I am having a problem with OMA not being able to read files. The error message is:

****

Error, '


It's no surprise that it can't read it, because I checked the Cache and this part has not been created. I am not sure how to deal with this. I am running OMA standalone on a cluster using the Sun Grid Engine queue manager. The job has stalled at this point (the all-against-all is not complete). The job was submitted like so:

export NR_PROCESSES=100
qsub -t 1-$NR_PROCESSES OMA.sh


The error message is reported in the most recently modified output file. I have read in some other posts that similar errors can be fixed by deleting the part that OMA is having trouble with and relaunching the job, but since this part does not exist at all, I am unsure what to do. I would be grateful for any advice. Please let me know if you need more information to understand the problem.

Many thanks, Nicki
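To locate the most recently modified output file mentioned above, the array-task logs can be sorted by modification time. This is a minimal sketch that assumes SGE's default naming for array-job output, jobname.oJOBID.TASKID (so files like OMA.sh.o*), written to the working directory when the job uses -cwd and to your home directory otherwise; latest_output is a hypothetical helper name.

```shell
# latest_output DIR PATTERN
# Prints the path of the most recently modified entry in DIR whose name
# matches the glob PATTERN, or prints nothing if nothing matches.
# Assumption: SGE's default array-job log names, e.g. OMA.sh.o<jobid>.<taskid>.
latest_output() {
    # $2 is deliberately unquoted so the shell expands the glob.
    # ls -t sorts its arguments newest first (-d stops it from listing the
    # contents of a matching directory); head keeps only the newest one.
    # If nothing matches, ls fails quietly and the output is empty.
    ls -td "$1"/$2 2>/dev/null | head -n 1
}
```

For example, tail -n 50 "$(latest_output . 'OMA.sh.o*')" shows the end of the newest log in the current directory.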

3.1 years ago

Hi Nicki,

If that file has never existed, the process should not have reached this point. There are two possibilities to check. (1) The file is usually compressed, so please check whether the file Cache/AllAll/Dimm_proteins_A/Dimm_proteins_A/part_78-177.gz exists; this is the file the process actually tries to read. If it exists, it might be corrupted, and you can resolve this by removing it and restarting OMA.
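Check (1) can be scripted from the shell. This is a minimal sketch that assumes, as stated above, that a finished part is stored as the part path plus .gz; check_part is a hypothetical name, and gzip -t only confirms that the archive decompresses cleanly, not that its contents are what OMA expects.

```shell
# check_part STEM
# STEM is the part path without the .gz suffix. Prints one of:
#   ok           STEM.gz exists and passes gzip's integrity test
#   corrupt      STEM.gz exists but fails gzip's integrity test
#   uncompressed only STEM exists (assumed, not documented, to be an
#                unfinished part)
#   missing      neither STEM.gz nor STEM exists
check_part() {
    if [ -f "$1.gz" ]; then
        # gzip -t decompresses in memory and returns non-zero on damage.
        if gzip -t "$1.gz" 2>/dev/null; then
            echo ok
        else
            echo corrupt
        fi
    elif [ -f "$1" ]; then
        echo uncompressed
    else
        echo missing
    fi
}
```

For example, check_part Cache/AllAll/Dimm_proteins_A/Dimm_proteins_A/part_78-177 prints corrupt in the case where removing the file and restarting should help.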

(2) Alternatively, the cluster may automatically purge files older than a certain threshold. This often happens on scratch filesystems. In that case, you would need to touch the files before they get purged.
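For option (2), touching every file under the cache can be done with a single find call. A sketch, assuming the purge policy looks at file modification or access times (the actual rule depends on your cluster); refresh_cache is a hypothetical name.

```shell
# refresh_cache DIR
# Sets the access and modification times of every regular file under DIR
# to the current time, so that an age-based purge (assumed policy) does
# not consider them old. "-exec ... {} +" batches many files per touch call.
refresh_cache() {
    find "$1" -type f -exec touch {} +
}
```

To be useful it would have to run more often than the purge threshold, for example as refresh_cache Cache from a cron job.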

I hope this solves your problem.


Thanks for your quick reply. I have checked the cache and Cache/AllAll/Dimm_proteins_A/Dimm_proteins_A/part_78-177.gz does not exist.

The files that OMA is generating are stored in my own directory, so they should not be purged by any automated process. I am not sure whether anything happens to jobs on the cluster that take too long, but I don't think so.

Any other thoughts would be appreciated before I have to start from the beginning!

Many thanks, Nicki


Hi Nicki, that is indeed quite weird. Did you try starting OMA again? It should then generate the missing part files anyway, without redoing anything that has already been computed.

Could you also check whether there are any checkpoint files in the AllAll directory:

find Cache/AllAll -type f -name "*.ckpt"
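An empty result from this find means there are no .ckpt files, i.e. no pending checkpoint. The hypothetical wrapper below runs the same search but prints an explicit message in that case, which makes the result harder to misread.

```shell
# report_checkpoints DIR
# Runs the same search as the find command above (regular files named
# *.ckpt under DIR) and says so explicitly when none are found.
report_checkpoints() {
    found=$(find "$1" -type f -name '*.ckpt')
    if [ -n "$found" ]; then
        printf 'pending checkpoint file(s):\n%s\n' "$found"
    else
        echo "no checkpoint files found under $1"
    fi
}
```

Usage: report_checkpoints Cache/AllAll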


Sorry, I had to go to the lab.

I issued the find command exactly as above and nothing was displayed in the terminal. I don't know what that means.

Also, I relaunched OMA from the same directory without changing anything, and it doesn't seem to be working very well.

1) The first output file produced has the same error I originally mentioned above.

2) One of the last output files produced shows the following, so it looks as though OMA knows the file is not there:

* At least 1 process appears to be still computing the all-vs-all.
* The following file(s) is (are) not yet completed: Cache/AllAll/Dimm_proteins_A/Dimm_proteins_A/part_78-177

** If no other process is running, delete these files and restart.

3) I am also seeing this error in a lot of the output files:

4) A single process out of the 100 is still running and seems to be trying to compute some other part of the all vs all comparisons.

I am a bit stumped. Is there a way I can delete some of the comparisons in a sensible order and sort of strip it back to a certain point in the process?

Many thanks for your help, Nicki
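OMA's own message quoted in point 2 says what to do here: if no other process is running, delete the listed files and restart. The sketch below deletes exactly the path(s) named in that message and reports what it did. It assumes you have first confirmed, for example with SGE's qstat -u "$USER", that no task of the array job is still running; in the situation of point 4, where one task is still busy, it should not be run yet. remove_incomplete_parts is a hypothetical name.

```shell
# remove_incomplete_parts PATH...
# Deletes each path listed in OMA's "not yet completed" message.
# Only the exact paths given are touched; an existing .gz of the same
# name is left alone. Run only when no OMA task is still running.
remove_incomplete_parts() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            rm -f -- "$f"
            echo "removed $f"
        else
            echo "not present: $f"
        fi
    done
}
```

After that, OMA would be relaunched with the same qsub command as before.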


Hi Nicki,

1) It's good that the find command did not return any results: that means there was no pending checkpoint file.

2) Which version of OMA standalone are you using? Could it be the same issue as "OMA error - download from gene ontology not working 404"? That should be fixed in OMA standalone 2.3.1.

3) That is actually good. The single process is computing the missing part file. Once it is done, it will continue with the second stage of OMA standalone (which cannot be run in parallel).
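While the single remaining task computes the missing part, one way to follow its progress is to wait for the compressed part file to appear. This is a minimal sketch assuming, as noted in the first reply, that a finished part is stored as the part path plus .gz; the helper name and polling interval are hypothetical, and the task itself can also be checked with qstat.

```shell
# wait_for_part STEM TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Polls until STEM.gz exists or TIMEOUT_SECONDS have been spent sleeping.
# Returns 0 if the file is present, 1 on timeout. Default interval: 60 s.
wait_for_part() {
    waited=0
    interval=${3:-60}
    while [ ! -f "$1.gz" ]; do
        if [ "$waited" -ge "$2" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, wait_for_part Cache/AllAll/Dimm_proteins_A/Dimm_proteins_A/part_78-177 86400 waits up to a day. Once the file exists, the running task should move on to the second stage by itself, so nothing needs to be relaunched.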