Question: Aligning multiple files to the same genome using STAR
m.kamal wrote:

Hi all,

I am trying to run a SLURM script to align multiple single-end read files using STAR. I am using the script below; however, I get this error message:

"Jul 21 00:29:37 ..... Started STAR run

EXITING: fatal error from shmget() trying to allocate shared memory piece: error type Invalid argument
Possible cause 1: not enough RAM. Check if you have enough RAM of at least 31652814154 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31652814154"

The script I am using is:

#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=48GB
#SBATCH --time=10:30:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=redacted


STAR --genomeLoad LoadAndExit --genomeDir ~/proj/sources/star/genome/hg38_star

for k in *fq

do

filename=$(basename "$k" .fq)

STAR --runThreadN 16 --outFilterScoreMinOverLread 0 --genomeDir ~/proj/sources/star/genome/hg38_star --outFilterMatchNmin 30 --outFilterMismatchNmax 100 --outReadsUnmapped Fastx --outFilterMatchNminOverLread 0.6 --readFilesIn ~/proj/rarecyte/trimgalore/$k --outSAMtype BAM SortedByCoordinate

done

Your help is very much appreciated.

Thanks.

Mohamed

rna-seq

Fabio Marroni replied:

Did you try following the solution proposed in the error message? (ulimit -v 31652814154)
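One way to make sure the limit is actually in effect where STAR runs is to raise it inside the batch script itself, right before the STAR call. A minimal sketch, using the value from STAR's own message:

ulimit -v 31652814154   # raise the virtual-memory limit for this job's shell
ulimit -v               # print the effective limit to verify it was accepted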


m.kamal replied:

Yes, indeed I did; I also set the limit to unlimited and I am still getting the same message.


Fabio Marroni replied:

Ok, sorry, but that was all the help I could give! I have noticed that a lot of people have the same issue with STAR. Here is a link to a thread about the issue; maybe you will find a workaround there:

https://github.com/CGATOxford/CGATPipelines/issues/53
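For reference, the workaround that usually comes up for this shmget error is to avoid shared memory altogether: drop the LoadAndExit pre-load step and let every run read the genome index privately with --genomeLoad NoSharedMemory (STAR's default). Each job then needs the full ~30 GB of RAM itself, but no shared-memory segment is ever requested. A minimal sketch, with sample.fq standing in for one of the real files:

STAR --runThreadN 16 \
     --genomeLoad NoSharedMemory \
     --genomeDir ~/proj/sources/star/genome/hg38_star \
     --readFilesIn ~/proj/rarecyte/trimgalore/sample.fq \
     --outSAMtype BAM SortedByCoordinate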


genomax replied:

Why not use the loop to submit a separate SLURM job for each of your data files?

  1. You are asking for --ntasks=8, which probably gets you 8 cores (if one task is set to use one core), but in your loop you are using --runThreadN 16, which asks for 16 threads. You want those two numbers to match (see the header sketch after this list).
  2. It may be better to ask for #SBATCH --mem=48GB for the entire job rather than #SBATCH --mem-per-cpu=48GB. This would let all cores share the 48 GB of RAM, which should be enough for the human genome.
  3. Not sure how many data files you have, but #SBATCH --time=10:30:00 may not be enough for all files if your data files are large.
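Taken together, points 1 and 2 would make the header look something like this (a sketch; the time limit from point 3 depends on your file sizes):

#!/bin/bash
#SBATCH --ntasks=16        # matches --runThreadN 16 in the STAR call
#SBATCH --mem=48GB         # total RAM for the whole job, shared by all tasks
#SBATCH --time=10:30:00    # point 3: may need raising for large files
#SBATCH --mail-type=ALL
#SBATCH --mail-user=redacted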

m.kamal replied:

Thank you so much for the suggestions. I included your points 1 and 2 in the script, but it wouldn't allow me to use more than 10 hours. For instance, when I tried 15:30 h, it gave this message:

sbatch: error: No partition matches jobs resource requirements Requirements: (Account:? JobName:star_slurm_rarecyte.sh Partition:? Time_limit:1530 Max_nodes:4294967294 Num_Tasks:16 Features:?)

When I set it back to 10:00 h, it gave the other message again:

EXITING: fatal error from shmget() trying to allocate shared memory piece: error type Invalid argument Possible cause 1: not enough RAM. Check if you have enough RAM of at least 31652814154 bytes Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31652814154

Do you have a script that uses the loop to submit a separate SLURM job for each of the data files?

I have 24 files.

Thanks.


genomax replied:

I see, so you are limited to 16 cores and 10 h per job on your account. I am going to show you an example of how sbatch can be used on the command line to submit jobs. This method is hopefully enabled on your cluster; if not, you may be forced to submit individual jobs based on your own example above for the 24 files (I assume you have single-end data?).

for k in *fq;

do

filename=$(basename "$k" .fq);

echo sbatch -n 16 -N 1 --mem=48g --time=10:30:00 --mail-type=ALL --mail-user=redacted --wrap="STAR --genomeLoad LoadAndKeep --runThreadN 16 --outFilterScoreMinOverLread 0 --genomeDir ~/proj/sources/star/genome/hg38_star --outFilterMatchNmin 30 --outFilterMismatchNmax 100 --outReadsUnmapped Fastx --outFilterMatchNminOverLread 0.6 --readFilesIn ~/proj/rarecyte/trimgalore/${k} --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /use_dir_where_you_want_output/${k}_out";

done

This should just print all 24 command lines (one per file) to the screen. You can submit one of them to see if the job goes through. If it does, remove the word echo to submit the jobs for real. My assumption is that only one job will start running and the rest will pend (you appear to have pretty rigid limits on job resources). Once the first job finishes, the second should start. You get the idea.
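If your cluster also allows job arrays, the same idea can be written as a single array job with one task per file. A sketch, under the assumption that the 24 .fq files can be collected with a glob:

#!/bin/bash
#SBATCH --array=0-23            # one array task per input file
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --mem=48g
#SBATCH --time=10:30:00

files=(~/proj/rarecyte/trimgalore/*.fq)   # the 24 single-end files
k=${files[$SLURM_ARRAY_TASK_ID]}          # this task's input file
STAR --runThreadN 16 \
     --genomeDir ~/proj/sources/star/genome/hg38_star \
     --readFilesIn "$k" \
     --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix "$(basename "$k" .fq)_out"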
