Question

How to set up multiple SPAdes runs for various genomes


9.5 years ago

queenofcold ▴ 10

I am cobbling together an NGS pipeline for trimming, assembling, and annotating genomes for our group, and I have very limited bioinformatics experience. I would like to run several SPAdes jobs at once. Thankfully, I have access to a cluster that uses SGE. However, I have been unable to set up an array job with SPAdes. The SPAdes command-line interface keeps reading the index variable that SGE needs for an array job as part of the file name, which it then can't find. I also tried concatenating my files into a list with one entry per line, but that failed too. Any suggestions for avoiding typing the same command over and over with only slight changes to the input file name?

Thanks!

SPAdes Assembly Array • 3.4k views

Updated 2.2 years ago by Ram 43k • written 9.5 years ago by queenofcold ▴ 10

Ram · Answer 1 · 2014-11-14


9.5 years ago

Ram 43k

This should not happen - the underlying program (SPAdes in your case) should not see the array variable at all. Are you sure you're running the job as an array job and not as a single job? Batch processing systems sometimes have different options for array variables, and if you run an array script as a single job, they may end up passing the array variable through to the underlying program.

For example, when I worked on PBS, the array variable $PBS_ARRAYID could only be used in conjunction with the -t option provided at job submission time.
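One way to make this check explicit is to test the array variable at the top of the job script. The following is a minimal sketch, assuming bash and an SGE or PBS/Torque scheduler; the function name `array_task_id` is hypothetical. On SGE, `SGE_TASK_ID` is set to the literal string `undefined` for non-array jobs, which could explain an index turning up as part of a file name.

```bash
#!/bin/bash
# Print the array index of the current task, or fail if the job is not
# running as an array task. "array_task_id" is a hypothetical helper name.
array_task_id() {
    # Prefer SGE's variable; fall back to the PBS one when it is unset.
    local id="${SGE_TASK_ID:-${PBS_ARRAYID:-}}"
    case "$id" in
        # Empty, SGE's "undefined", or anything containing a non-digit.
        ''|undefined|*[!0-9]*)
            echo "Not an array task (index is '${id}')" >&2
            return 1
            ;;
    esac
    echo "$id"
}
```

In a job script this could be used as `TASK_ID=$(array_task_id) || exit 1`, so that a job submitted without an array range stops before SPAdes is ever called.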

2.2 years ago by Ram 43k


Actually, I don't know for sure that I am NOT doing that. I am following a template for arrays suggested by our sys admin. I have tried 2 different approaches.

1.

INPUT_FILE=[cat  /yaml_files | sed -n "${SGE_TASK_ID}p"]
spades.py -k 21,31,55,77 --careful --dataset $INPUT_FILE -o /SPAdes/"$INDEX"

2.

SAMPLE_DIR= /yaml_files
SAMPLE_LIST=($SAMPLE_DIR/*.yaml)
INDEX=$((SGE_TASK_ID-1))
INPUT_FILE=${SAMPLE_LIST[$INDEX]}

spades.py -k 21,31,55,77 --careful --dataset $INPUT_FILE -o /SPAdes/"$INDEX"

Updated 5.4 years ago by Ram 43k • written 9.4 years ago by queenofcold ▴ 10
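For reference, the second approach can be sketched as a small bash function. This is an illustration under assumptions, not a tested fix for this cluster: the function name `yaml_for_task` is hypothetical, and it assumes one dataset YAML file per sample in a single directory. Two details of the snippets above stand out: in bash, `VAR= value` (with a space after `=`) runs `value` as a command instead of assigning it, and in the first approach `$INDEX` is never defined, so every task would write to the same output directory.

```bash
#!/bin/bash
# Print the YAML file that belongs to a 1-based task ID.
# "yaml_for_task" is a hypothetical helper name.
yaml_for_task() {
    local sample_dir=$1 task_id=$2
    # Reject anything that is not a positive integer (e.g. "undefined").
    case "$task_id" in
        ''|*[!0-9]*|0) echo "Bad task ID '$task_id'" >&2; return 1 ;;
    esac
    # No space after "=" in assignments; the glob expands in sorted order.
    local sample_list=("$sample_dir"/*.yaml)
    # Bash arrays are zero-based, SGE task IDs start at 1.
    local index=$((task_id - 1))
    local input_file=${sample_list[$index]:-}
    if [ ! -f "$input_file" ]; then
        echo "No YAML file for task $task_id in $sample_dir" >&2
        return 1
    fi
    echo "$input_file"
}
```

The result could then be passed on as `spades.py -k 21,31,55,77 --careful --dataset "$INPUT_FILE" -o "SPAdes/$SGE_TASK_ID"`, with the variables quoted so that paths containing spaces survive.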


Shouldn't you need command substitution there (backticks or $(...)) rather than square brackets? For example:

INPUT_FILE=$(cat /yaml_files | sed -n "${SGE_TASK_ID}p")

Also, this is the code in the script. I am more concerned about where ${SGE_TASK_ID} gets its value from. What is the command you use to submit the job?

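For example, your command should look something like this (options must come before the script name; anything after it is passed to the script as an argument):

qsub -t 1-15 my-array-job.sh

You can then access the numbers 1 through 15 using the array variable $SGE_TASK_ID.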
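Putting the pieces together, a complete array-job script might look like the sketch below. It is a hedged example, not the template from the original poster's sys admin: the file names `yaml_list.txt` and `SPAdes_out`, the `SPADES_BIN` override, and the function `run_task` are all assumptions made for illustration. The `#$` lines are directives that SGE's `qsub` reads from the script, so the array range can live in the script instead of on the command line.

```bash
#!/bin/bash
# Use bash for the job, run from the submission directory, tasks 1 to 15.
#$ -S /bin/bash
#$ -cwd
#$ -t 1-15

# Hypothetical names; adjust to your own layout.
# LIST_FILE holds one dataset YAML path per line.
LIST_FILE="${LIST_FILE:-yaml_list.txt}"
OUT_ROOT="${OUT_ROOT:-SPAdes_out}"
SPADES_BIN="${SPADES_BIN:-spades.py}"

# Assemble the dataset listed on line <task_id> of LIST_FILE.
run_task() {
    local task_id=$1
    local input_file
    input_file=$(sed -n "${task_id}p" "$LIST_FILE")
    if [ -z "$input_file" ]; then
        echo "No entry for task $task_id in $LIST_FILE" >&2
        return 1
    fi
    # One output directory per task so that runs do not overwrite each other.
    "$SPADES_BIN" -k 21,31,55,77 --careful --dataset "$input_file" \
        -o "$OUT_ROOT/task_$task_id"
}

# Only start an assembly when the scheduler has provided a real index
# (SGE sets SGE_TASK_ID to "undefined" for non-array jobs).
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    run_task "$SGE_TASK_ID"
fi
```

With the range in the script, submission reduces to `qsub my-array-job.sh`. Setting `SPADES_BIN=echo` before calling `run_task` is a cheap way to print the command each task would run without starting an assembly.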

5.4 years ago by Ram 43k