Mapping script process sleeping on server
0
0
8.0 years ago
lvogel ▴ 30

Hi, sorry if this is too off-topic, but I need to take the chance just in case I can get some advice.

I wrote a script to map our RNA-Seq reads to a transcriptome and submitted it to the server. This was on Friday afternoon, and it is now Sunday. It hasn't finished. When I first logged on to the server to check on the job, it was running, but it has been sleeping since yesterday or so.

I was wondering if there's any way I can know if there is something wrong with my script, and if I should amend it and try again. I am an undergraduate student, and I have already experienced delays in this project. I don't want to disappoint my supervisor by causing any more setbacks if this doesn't map correctly.

alignment • 2.3k views
0

The aligner will generate a log file reporting its status. For example, for TopHat it's run.log in the logs folder. You can see the status there. Which aligner are you using?

0

What is this server? Is it a stand-alone machine or a cluster? And what program are you running? Does your script run multiple programs in succession? Look at the output files produced already, find which one was modified last and that will tell you where it is stuck.

Extract the part of your script up to and including the step after the one where it is stuck, and find out where it could be going wrong.
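As a rough sketch of the "find which one was modified last" advice: the helper below prints the most recently modified entry in a directory. The function name newest_file is made up for this illustration; it only wraps ls -t, which sorts by modification time, newest first.

```shell
#!/bin/sh
# newest_file DIR: print the name of the most recently modified entry in DIR.
# Hypothetical helper for illustration; it relies only on `ls -t`.
newest_file() {
    ls -t "$1" | head -n 1
}

# Example: show the newest entry in the current directory.
newest_file .
```

Running it twice a few minutes apart, and comparing the size of that file with ls -l, gives a quick hint about whether the job is still writing output.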

0

Give us some more information. Are you using a cluster for your job? SGE/PBS write job_name.e<jobid> and job_name.o<jobid> files. You can check inside those files to see if anything is wrong.

0

Thanks. I'm using NextGenMap. It puts out job_name.o and job_name.e as output and error files. That's what I should check, right?
Cluster . . . I think no. It's just one server that I connect to.

1

Just read through those files, perhaps starting towards the end. There may be a log message that gives a clue to the problem.

0

Are you sure it is not a cluster? "I think" is not a confident response. Also, the kind of STDOUT and STDERR template naming suggests you might be working with PBS/SGE/LSF cluster systems.

How do you run the command - do you submit a script with directives in it? How do you ensure the job doesn't get killed when you log out of the command line? Do you use screen or nohup <command> & or submit it using bsub or qsub?

0

Thank you to everyone for your input.

Ram, I am almost certain I am only sending jobs to a single server, and not the cluster, but I could be mistaken.

Yes, I submit a shell script with directives in it. I submitted it using qsub, so therefore I shouldn't have to worry about it getting killed when I log out--is this correct?

I checked the files, and the .e file is a 510 MB file, ending in 5 million lines like this:

[Progress] Mapped: 33922654, CMR/R: 189, CS: 11 (33), R/S: 523, Time: 1.59 32.15
[Progress] Mapped: 33922654, CMR/R: 189, CS: 11 (33), R/S: 523, Time: 1.59 32.15
[Progress] Mapped: 33922654, CMR/R: 189, CS: 11 (33), R/S: 523, Time: 1.59 32.15 0.01


I don't see anything in it that looks like an error. Do you think it might have just gone to sleep after a certain time?
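One way to read a log like this without scrolling through millions of lines is to collapse the repeated progress lines. The sketch below assumes the log keeps the "[Progress] Mapped: N" format quoted above; the function name mapped_counts and the default of 1000 lines are choices made for this example, not part of NextGenMap.

```shell
#!/bin/sh
# mapped_counts FILE [N]: look at the last N lines of FILE (default 1000),
# pull out each "Mapped: <number>" field, and collapse consecutive repeats,
# printing how many times each value appeared in a row.
mapped_counts() {
    tail -n "${2:-1000}" "$1" | grep -o 'Mapped: [0-9]*' | uniq -c
}
```

For example, mapped_counts job_name.e run twice a few minutes apart shows whether new values are being appended. A count that stays the same at the tail is not proof of a hang on its own, so it is worth checking the output files as well.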

0

Type qstat and paste the output here. Also, paste the directives that you used in your script file.

0

Output from qstat:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
34046 0.27219 mapping.sh   janev        r     03/05/2015 15:09:55 mainqueue@turtle-1.grid.bzm        1


The directives I used are as follows:

#!/bin/csh
./ngm -q NG-7384_J2_fs_2_lib42915_2542_3_1.fastq -r /home/l....tr -i 0.8 --kmer-skip 0 -s 0.0 -o J2.sam
./ngm -q NG-7384_J2_fs_3_lib42916_2577_7_1.fastq -r /home/l....tr -i 0.8 --kmer-skip 0 -s 0.0 -o J3.sam


and six more lines like that.

Thanks again!

2

Well, you haven't really used the power of the resource management system, as your script is missing a proper header. Nevertheless, your job is still running: the qstat output shows its state as "r", which means it is running. Can you see any output files? The SAM files will be produced in order. You should see some output for J2.sam, as it has been almost 2 days.

0

You are using qsub, so most probably a PBS/Torque system. From the server name, it looks like you're on a cluster. Like Ashutosh said, give us your exact qsub command (if you used the command line to pass directives) or include the directives from the PBS script.

0

Good idea, Ashutosh. :)

I am redoing a mapping job that was done incorrectly the first time, so the output files were already there. This time around I need to know if they're being overwritten with new ones.

So I ran ls -ltc *.sam, and sure enough, the newest file was last modified one minute ago.

I feel both glad and slightly foolish for even starting a thread about this. It was just that when I typed top and saw an S by my job, I thought it wasn't getting anywhere anymore, but I guess that doesn't necessarily mean that?

Next, to learn how to write a proper header.

(I meant this post to be a comment, not an answer.)
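On the S shown by top: it marks a process that is sleeping at that instant, typically waiting for I/O or another event, so a single S reading does not mean the job has stalled. A minimal, Linux-only sketch for sampling the state directly from /proc follows; the function name proc_state is made up for this example.

```shell
#!/bin/sh
# proc_state PID: print the kernel's current state for PID, e.g. "S (sleeping)".
# Linux-only: reads the "State:" line of /proc/PID/status.
proc_state() {
    awk -F'\t' '/^State:/ {print $2}' "/proc/$1/status"
}

# Example: state of the current shell.
proc_state $$
```

Sampling a few times, or watching whether the TIME+ column in top keeps growing, is usually more telling than one snapshot.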

0

I wrote a script to automatically generate the PBS scripts (tailored to my server environment). If you want, you can use it and change it accordingly. Hope this helps.

0

Wow, that's a long script! My approach to this was to find an optimal configuration that gets scheduled fast and performs well for most requirements, and to use variables specific to the scheduling system in naming files (such as the job ID and job name for log files). This way, I'd just have to edit the job name each time and it'd run great.

But that's because I'm lazy. Wow again, that script looks so complicated!

2

Most of that script consists of conditional checks. If you want, you can always use this header:

#!/bin/bash
#PBS -N <Name of job>
#PBS -l mem=<Memory>
#PBS -l walltime=<Wall time>
#PBS -q default
# Number of nodes and threads; my server didn't have the cluster function, so nodes is always 1
#PBS -l nodes=1:ppn=12
# Export the submitting shell's environment variables to the job
#PBS -V
cd $PBS_O_WORKDIR

0

That was pretty much my default template, with some minor changes to nodes, mem and walltime :)

0

Yes, that does look very complicated! Maybe I'll try it . . . right now I just have a relatively quick question. My exact qsub command is

qsub -cwd foo.sh

I noticed it didn't run without the -cwd, so I added it. Could this be slowing things down or anything?

1

According to the manual:

-cwd   Available for qsub, qsh, qrsh and qalter only. Execute the job from the current working directory. This switch will activate Sun Grid Engine's path aliasing facility, if the corresponding configuration files are present (see sge_aliases(5)). In the case of qalter, the previous definition of the current working directory will be overwritten if qalter is executed from a different directory than the preceding qsub or qalter.

So basically, you are running your job without specifying how many nodes or threads to use; you are only telling it to run in the current directory (kind of like the cd $PBS_O_WORKDIR in my header file). You might want to add, for example, -l nodes=1:ppn=12 to enable 12 threads.
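Since the manual text quoted here is from Sun Grid Engine, the equivalent header would probably use SGE's #$ directive prefix rather than #PBS. The following is only a sketch under that assumption: the job name "mapping" is a placeholder, and the parallel environment name "smp" is site-specific (qconf -spl lists the ones configured on a given system). NSLOTS and JOB_ID are set by SGE inside a running job; the fallbacks let the script also run outside the scheduler.

```shell
#!/bin/bash
# Job name (placeholder).
#$ -N mapping
# Run from the directory where qsub was invoked (same effect as `qsub -cwd`).
#$ -cwd
# Interpret the job script with bash.
#$ -S /bin/bash
# Request 12 slots in a parallel environment; "smp" is an assumed, site-specific name.
#$ -pe smp 12

# SGE exports NSLOTS (granted slots) and JOB_ID to the job; default to 1 / "unknown" elsewhere.
threads=${NSLOTS:-1}
echo "job ${JOB_ID:-unknown} using $threads thread(s) in $(pwd)"
```

With the directives in the script, a plain qsub foo.sh should pick them up without extra command-line options.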

1

Oh, and I looked at your script: you haven't enabled ngm's multi-threading parameter (-t). You might want to set it; it should help speed things up.
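To illustrate the -t suggestion, here is a small dry-run sketch that builds one ngm command per read file instead of repeating eight near-identical lines. The helper names sample_name and ngm_command are invented for this example, the reference path is passed in as an argument because it is truncated in the thread, and all options other than -t are copied from the script above.

```shell
#!/bin/sh
# sample_name FILE: derive the short sample label (e.g. J2) from a read file
# named like NG-7384_J2_fs_2_lib42915_2542_3_1.fastq (second "_"-separated field).
sample_name() {
    basename "$1" | cut -d_ -f2
}

# ngm_command REF FASTQ THREADS: print (not run) the ngm command for one read file.
ngm_command() {
    printf './ngm -q %s -r %s -i 0.8 --kmer-skip 0 -s 0.0 -t %s -o %s.sam\n' \
        "$2" "$1" "$3" "$(sample_name "$2")"
}
```

A loop such as for fq in NG-7384_*_1.fastq; do ngm_command "$REF" "$fq" "${NSLOTS:-1}"; done prints the commands for review, with REF set to the transcriptome file; piping that output to sh would execute them. The thread count should not exceed the slots requested from the scheduler.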