Question: How to set up multiple SPAdes runs for various genomes
queenofcold10 wrote:

I am in the process of cobbling together an NGS pipeline for trimming, assembling, and annotating genomes for our group, and I have very limited bioinformatics experience. I would like to run several SPAdes jobs at once. Thankfully, I have access to a cluster that uses SGE; however, I have not been able to set up an array job with SPAdes. The SPAdes command line interface keeps trying to read the index variable that SGE needs to run an array as part of the file name, which it then can't find. I tried concatenating my files into a single list with one entry per line, but that failed as well. Any suggestions for avoiding typing the same command over and over on the command line with only slight changes to the input file name?

Thanks!

Tags: spades • assembly • array
RamRS24k wrote:

This should not happen: the underlying program (SPAdes, in your case) should never see the array variable at all. Are you sure you're submitting the job as an array job and not as a single job? Batch processing systems often have separate options for array submission, and if you run an array script as a single job, the unexpanded array variable can end up being passed straight through to the underlying program.

For example, when I worked on PBS, the array variable $PBS_ARRAYID could only be used in conjunction with the -t option provided at job submission time.
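
As an illustration (a minimal sketch, not code from this thread; the script name, samples.txt list, and echo line are hypothetical), a PBS/Torque array script would read the variable like this:

#!/bin/bash
# my-array-job.sh -- hypothetical PBS/Torque array job
# Submit with:  qsub -t 1-10 my-array-job.sh
# $PBS_ARRAYID is only populated because of the -t range given at submission time.
INPUT_FILE=$(sed -n "${PBS_ARRAYID}p" samples.txt)   # take the Nth line of a sample list
echo "Task ${PBS_ARRAYID} will work on ${INPUT_FILE}"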


Actually, I don't know for sure that I am NOT doing that. I am following a template for array jobs suggested by our sysadmin. I have tried two different approaches:

1.

INPUT_FILE=[cat  /yaml_files | sed -n "${SGE_TASK_ID}p"]
spades.py -k 21,31,55,77 --careful --dataset $INPUT_FILE -o /SPAdes/"$INDEX"

2.

SAMPLE_DIR= /yaml_files
SAMPLE_LIST=($SAMPLE_DIR/*.yaml)
INDEX=$((SGE_TASK_ID-1))
INPUT_FILE=${SAMPLE_LIST[$INDEX]}

spades.py -k 21,31,55,77 --careful --dataset $INPUT_FILE -o /SPAdes/"$INDEX"

Shouldn't that be something like:

INPUT_FILE=`cat /yaml_files | sed -n "${SGE_TASK_ID}p"`

Also, that is just the code inside the script; I am more concerned with where ${SGE_TASK_ID} gets its value. What is the command you use to submit the job?

For example, your command should look something like:

qsub -t 1-15 my-array-job.sh

and then each task can read its index (1 through 15) from the array variable $SGE_TASK_ID.
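
Putting those pieces together, here is a minimal sketch of what the whole array script could look like (the directory names, k-mer values, and range of 15 tasks are carried over from the snippets above as assumptions, not a tested script):

#!/bin/bash
#$ -t 1-15        # array range; SGE exports $SGE_TASK_ID to every task
#$ -cwd

# Collect the YAML dataset files (directory taken from the snippets above).
SAMPLE_LIST=(/yaml_files/*.yaml)

# SGE task IDs start at 1, bash array indices at 0.
INDEX=$((SGE_TASK_ID - 1))
INPUT_FILE=${SAMPLE_LIST[$INDEX]}

# One SPAdes run per task, each writing to its own output directory.
spades.py -k 21,31,55,77 --careful --dataset "$INPUT_FILE" -o /SPAdes/"$SGE_TASK_ID"

With the range embedded via the #$ -t directive you can submit it simply as qsub my-array-job.sh; either way, each task sees only its own expanded $SGE_TASK_ID, so SPAdes never encounters the raw variable inside a file name.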
