Question

Submitting RNA-seq to Supercomputer

0

5.4 years ago

matthew.sinton • 0

Hi,

I'm totally new to running scripts, and I'm struggling to get going with submitting a job to our university supercomputer.

I want to index a FASTA reference genome with BWA before I perform my mapping.

The template script the university provides for submitting jobs to the supercomputer is not that clear to me, and I was hoping that someone might be able to help. The documentation gives the following, which I can follow:

#!/bin/sh
# Grid Engine options (lines prefixed with #$)
#$ -N hello              
#$ -cwd                  
#$ -l h_rt=00:05:00 
#$ -l h_vmem=1G
#  These options are:
#  job name: -N
#  use the current working directory: -cwd
#  runtime limit of 5 minutes: -l h_rt
#  memory limit of 1 Gbyte: -l h_vmem

However, I have no idea where to insert the following command:

bwa index -a bwtsw Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

This is probably really daft, but I also have no idea how to reference the location of the file so that the script knows where it is...

Sorry if this is all very basic stuff, and thanks for any help!

Matthew

RNA-Seq fasta next-gen Assembly • 1.3k views

ADD COMMENT • link updated 5.4 years ago by Biostar 20 • written 5.4 years ago by matthew.sinton • 0

1

You would add the bwa command line at the end of the script above (keeping in mind $PATH considerations: bwa must be findable on the compute node). The initial part, with the lines starting with #, sets up parameters for your job scheduling system. You will need to provide correct values for those parameters, since this is just a skeleton example.

I recommend that you follow a basic UNIX tutorial. Investing some time in it will be forever useful.
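The $PATH consideration mentioned above can be checked before submitting anything. This is a minimal sketch, not part of the university's template: `tool_status` is a hypothetical helper name, and how a missing tool gets onto $PATH (for example through a module system) is site-specific, so treat that part as an assumption to confirm in your local documentation.

```sh
#!/bin/sh
# Report whether a program can be found on $PATH.
# `tool_status` is a hypothetical helper name used only for this sketch.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# If this prints "missing", the job script will fail at the bwa line.
# Many clusters need a module to be loaded first; that is an assumption
# here, so check how software is provided on your system.
echo "bwa: $(tool_status bwa)"
```

Running the same check as the first command inside the job script shows what the compute node sees, which can differ from the login node.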

ADD REPLY • link 5.4 years ago by GenoMax 141k

0

Thanks for the quick reply! So when I add the bwa command and put in the file name, do I just need to include the full path so that it knows where the file is?

ADD REPLY • link 5.4 years ago by matthew.sinton • 0

0

You could include the full path for now, while you learn about the UNIX file system and the concept of relative paths.
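To make the difference between a full and a relative path concrete, here is a small sketch. `abs_path` is a hypothetical helper written for this example, not a standard command. A bare file name is resolved relative to the directory the command runs in, which for a job submitted with -cwd is the directory you submitted from.

```sh
#!/bin/sh
# Turn a file name (relative or absolute) into an absolute path.
# `abs_path` is a hypothetical helper name used only for this sketch.
abs_path() {
    # Resolve the directory part, then re-attach the file name.
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    echo "$dir/$(basename "$1")"
}

# A bare file name is relative: it is looked up in the current directory.
# This prints the full path you could paste into the job script instead.
abs_path "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz"
```

The full path works no matter which directory the job starts in, which is why it is the safer choice while you are still learning.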

ADD REPLY • link 5.4 years ago by GenoMax 141k

0

Thanks so much. You've been really helpful!

ADD REPLY • link 5.4 years ago by matthew.sinton • 0

1

A runtime limit of 5 minutes is very short.

ADD REPLY • link 5.4 years ago by Michael 54k

score 3 · Answer 1 · 2018-11-20

As @genomax said, this is just an example. The example script given by your university gives you access to 1 GB of RAM for a run time of 5 minutes, and neither is enough to index a human genome.

First you need to upload your data to the server, with something like this:

scp /home/matthew/Documents/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz matthew_login@University_address:/home/matthew_login/work/genome/

Then log in to the server and create a shell script (index_human.sh) containing something like this:

#!/bin/sh
# Grid Engine options (lines prefixed with #$)
#$ -N create_human_index_GRCh38     
#$ -cwd                  
#$ -l h_vmem=30G
#  These options are:
#  job name: -N
#  use the current working directory: -cwd
#  memory limit of 30 Gbyte: -l h_vmem

bwa index -a bwtsw /home/matthew_login/work/genome/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

Adapt the path to Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz so that it matches where you uploaded the file.

It seems your cluster is managed by SGE (Sun Grid Engine).

Submit your script:

qsub index_human.sh

Check the output of your script in your current directory to see how it goes.
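As a sketch of what to look for: by default Grid Engine writes a job's standard output and standard error to files named after the job name and job ID (`<job name>.o<job ID>` and `<job name>.e<job ID>`), and with -cwd they land in the directory you submitted from. `job_logs` below is a hypothetical helper, and your site may override the default names with the -o and -e options.

```sh
#!/bin/sh
# List the Grid Engine log files that exist for a given job name.
# `job_logs` is a hypothetical helper name used only for this sketch.
# While the job is still queued or running, `qstat` shows its state.
job_logs() {
    for f in "$1".o* "$1".e*; do
        # An unmatched glob stays literal, so only print real files.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}

# The job name is the one set with -N in index_human.sh.
job_logs create_human_index_GRCh38
```

The `.e` file is usually the first place to look if no index files appear, since that is where error messages such as a missing bwa command end up.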