Question: Aligning multiple files to the same genome using STAR
m.kamal wrote:

Hi all,

I am trying to run a SLURM script to align multiple single-end read files using STAR. I am using the script below; however, I get this error message:

"Jul 21 00:29:37 ..... Started STAR run

EXITING: fatal error from shmget() trying to allocate shared memory piece: error type Invalid argument
Possible cause 1: not enough RAM. Check if you have enough RAM of at least 31652814154 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31652814154"

The script I am using is:

#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=48GB
#SBATCH --time=10:30:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=redacted


STAR --genomeLoad LoadAndExit --genomeDir ~/proj/sources/star/genome/hg38_star

for k in *fq

do

filename=$(basename "$k" .fq)

STAR --runThreadN 16 --outFilterScoreMinOverLread 0 --genomeDir ~/proj/sources/star/genome/hg38_star --outFilterMatchNmin 30 --outFilterMismatchNmax 100 --outReadsUnmapped Fastx --outFilterMatchNminOverLread 0.6 --readFilesIn ~/proj/rarecyte/trimgalore/$k --outSAMtype BAM SortedByCoordinate

done

Your help is very much appreciated.

Thanks.

Mohamed

rna-seq

Fabio Marroni replied:

Did you try following the solution proposed in the error message? (ulimit -v 31652814154)
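One way to make sure the limit is actually in effect where STAR runs is to raise it inside the batch script itself, right before the STAR call. A minimal sketch, using the value from STAR's own message:

ulimit -v 31652814154   # raise the virtual-memory limit for this job's shell
ulimit -v               # print the effective limit to verify it was accepted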


m.kamal replied:

Yes, indeed I did; I also set the limit to unlimited and I am still getting the same message.


Fabio Marroni replied:

Ok, sorry, but that was all the help I could give! I have noticed that a lot of people have the same issue with STAR. Here is a link to a thread about the issue; maybe you will find a workaround there:

https://github.com/CGATOxford/CGATPipelines/issues/53
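For reference, the workaround that usually comes up for this shmget error is to avoid shared memory altogether: drop the LoadAndExit pre-load step and let every run read the genome index privately with --genomeLoad NoSharedMemory (STAR's default). Each job then needs the full ~30 GB of RAM itself, but no shared-memory segment is ever requested. A minimal sketch, with sample.fq standing in for one of the real files:

STAR --runThreadN 16 \
     --genomeLoad NoSharedMemory \
     --genomeDir ~/proj/sources/star/genome/hg38_star \
     --readFilesIn ~/proj/rarecyte/trimgalore/sample.fq \
     --outSAMtype BAM SortedByCoordinate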


genomax replied:

Why not use the loop to submit a separate SLURM job for each of your data files?

  1. You are asking for --ntasks=8, which probably gets you 8 cores (if one task is set to use one core), but in your loop you are using --runThreadN 16, which asks for 16 threads. You want those two numbers to match (see the header sketch after this list).
  2. It may be better to ask for #SBATCH --mem=48GB for the entire job rather than #SBATCH --mem-per-cpu=48GB. This would let all cores share the 48 GB of RAM, which should be enough for the human genome.
  3. Not sure how many data files you have, but #SBATCH --time=10:30:00 may not be enough for all files if your data files are large.
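Taken together, points 1 and 2 would make the header look something like this (a sketch; the time limit from point 3 depends on your file sizes):

#!/bin/bash
#SBATCH --ntasks=16        # matches --runThreadN 16 in the STAR call
#SBATCH --mem=48GB         # total RAM for the whole job, shared by all tasks
#SBATCH --time=10:30:00    # point 3: may need raising for large files
#SBATCH --mail-type=ALL
#SBATCH --mail-user=redacted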

m.kamal replied:

Thank you so much for the suggestions. I included your points 1 and 2 in the script, but it wouldn't allow me to use more than 10 hours. For instance, when I tried 15:30 h, it gave this message:

sbatch: error: No partition matches jobs resource requirements Requirements: (Account:? JobName:star_slurm_rarecyte.sh Partition:? Time_limit:1530 Max_nodes:4294967294 Num_Tasks:16 Features:?)

When I set it back to 10:00 h, it gave the other message again:

EXITING: fatal error from shmget() trying to allocate shared memory piece: error type Invalid argument Possible cause 1: not enough RAM. Check if you have enough RAM of at least 31652814154 bytes Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31652814154

Do you have a script that uses the loop to submit a separate SLURM job for each of the data files?

I have 24 files.

Thanks.


genomax replied:

I see, so you are limited to 16 cores and 10 h per job on your account. I am going to show you an example of how sbatch can be used on the command line to submit jobs. This method is hopefully enabled on your cluster; if not, you may be forced to submit individual jobs based on your own example above for the 24 files (I assume you have single-end data?).

for k in *fq;

do

filename=$(basename "$k" .fq);

echo sbatch -n 16 -N 1 --mem=48g --time=10:30:00 --mail-type=ALL --mail-user=redacted --wrap="STAR --genomeLoad LoadAndKeep --runThreadN 16 --outFilterScoreMinOverLread 0 --genomeDir ~/proj/sources/star/genome/hg38_star --outFilterMatchNmin 30 --outFilterMismatchNmax 100 --outReadsUnmapped Fastx --outFilterMatchNminOverLread 0.6 --readFilesIn ~/proj/rarecyte/trimgalore/${k} --outSAMtype BAM SortedByCoordinate --outFileNamePrefix /use_dir_where_you_want_output/${k}_out";

done

This should just print all 24 command lines (one per file) to the screen. You can submit one of them to see if the job goes through. If it does, remove the word echo to submit the jobs for real. My assumption is that only one job will start running and the rest will pend (you appear to have pretty rigid limits on job resources). Once the first job finishes, the second should start. You get the idea.
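If your cluster also allows job arrays, the same idea can be written as a single array job with one task per file. A sketch, under the assumption that the 24 .fq files can be collected with a glob:

#!/bin/bash
#SBATCH --array=0-23            # one array task per input file
#SBATCH -n 16
#SBATCH -N 1
#SBATCH --mem=48g
#SBATCH --time=10:30:00

files=(~/proj/rarecyte/trimgalore/*.fq)   # the 24 single-end files
k=${files[$SLURM_ARRAY_TASK_ID]}          # this task's input file
STAR --runThreadN 16 \
     --genomeDir ~/proj/sources/star/genome/hg38_star \
     --readFilesIn "$k" \
     --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix "$(basename "$k" .fq)_out"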
