Question: RNA-SeQC memory error
Asked 2.5 years ago by nicholas.owen1:

Dear All,

I have installed RNA-SeQC (version 1.1.8) on our cluster and have been trying to run it on 5-6 GB BAM files of human RNA-seq data.

Info about the BAM files: human paired-end reads, aligned to the hg38 genome build with STAR, read groups added, sorted and indexed.

The command file for RNA-SeQC is generally:

/share/apps/jdk1.7.0_71/bin/java -Xmx60g -jar /user/tools/RNA-SeQC/RNA-SeQC_v1.1.8.jar \
  -bwa /user/bwa-0.7.10/bwa \
  -BWArRNA /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf \
  -t /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-2015-08-14-08-18-15/Genes/genes.gtf \
  -r /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa \
  -s /user/bam/bam_samples.txt \
  -o /user/bam/RNA-SeQC

The bam_samples.txt currently lists only one BAM file. I was getting the same error with several files, so I am trying to get a single sample working first.

From the output, the job appears to run fine up to a certain stage and then fails with a memory error. I have tried giving the cluster job anywhere from 16 GB to 128 GB of memory, with no luck.
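
As a rough sanity check on the numbers above, the following sketch converts a `-Xmx` value to megabytes and compares it with the virtual memory limit the shell reports. The function names `xmx_to_mb` and `heap_fits_limit` are hypothetical helpers, not part of RNA-SeQC or Java, and the comparison is deliberately conservative: a JVM needs address space beyond its heap (thread stacks, code cache, and so on), so a heap that only just fits may still fail.

```shell
#!/bin/sh
# Hypothetical helper: convert a -Xmx value such as 60g or 61440M to megabytes.
xmx_to_mb() {
    value=$1
    number=${value%[gGmM]}          # strip the trailing unit letter
    case "$value" in
        *[gG]) echo $((number * 1024)) ;;
        *[mM]) echo "$number" ;;
        *) echo "unsupported unit: $value" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if a heap of $1 MB fits inside a limit of $2 kB
# (the unit used by `ulimit -v`), or if the limit is "unlimited".
heap_fits_limit() {
    heap_mb=$1
    limit_kb=$2
    if [ "$limit_kb" = "unlimited" ]; then
        return 0
    fi
    [ $((heap_mb * 1024)) -le "$limit_kb" ]
}

heap_mb=$(xmx_to_mb 60g)
echo "requested heap: ${heap_mb} MB"
if heap_fits_limit "$heap_mb" "$(ulimit -v)"; then
    echo "heap fits within the virtual memory limit of this shell"
else
    echo "heap exceeds the virtual memory limit of this shell"
fi
```

With `-Xmx60g` the conversion gives 61440 MB, which matches the `-Xmx61440M` shown in the aborted command at the end of the log. Limits can differ between the login node and the compute nodes, so the comparison is probably only meaningful when run inside a submitted job.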

The error I am getting is:

RNA-SeQC v1.1.8.1 07/11/14
Retriving contig names from reference
     contig names in reference: 195
Loading GTF for Read Counting
Converting to refGene
Transcript objects to RefGen format:    1 s
Running IntronicExpressionReadBlock Walker ....
Arguments: [-T, IntronicExpressionReadBlock, --outfile_metrics, /user/bam/RNA-SeQC/NC101/NC101.metrics.tmp.txt, -R, /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa, -I, /user/bam/NC101/NC101_unique.RG.bam, -refseq, /user/bam/RNA-SeQC/refGene.txt, -l, ERROR]
Finished writing /user/bam/RNA-SeQC/NC101/NC101.metrics.tmp.txt.intronReport.txt
Finished writing /user/bam/RNA-SeQC/NC101/NC101.metrics.tmp.txt.intronReport.txt_intronOnly.txt, now creating RPKM values for introns ..
GATK command result code: 0
     ... GATK CoutReadMetrics Analysis DONE
CountReadMetricsWalker Runtime: 12 min
Counting rRNA reads with BWA and /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf
Downsampling before aligning at rate: 0.009106594123052198
INFO    2016-09-15 21:23:02 DownsampleSam   Read 10000000 reads, kept 91285
INFO    2016-09-15 21:23:34 DownsampleSam   Read 20000000 reads, kept 182434
INFO    2016-09-15 21:23:56 DownsampleSam   Read 30000000 reads, kept 272798
INFO    2016-09-15 21:24:18 DownsampleSam   Read 40000000 reads, kept 364148
INFO    2016-09-15 21:24:39 DownsampleSam   Read 50000000 reads, kept 455860
INFO    2016-09-15 21:25:01 DownsampleSam   Read 60000000 reads, kept 547155
INFO    2016-09-15 21:25:24 DownsampleSam   Read 70000000 reads, kept 639031
INFO    2016-09-15 21:25:50 DownsampleSam   Read 80000000 reads, kept 730499
INFO    2016-09-15 21:26:19 DownsampleSam   Read 90000000 reads, kept 822225
INFO    2016-09-15 21:26:41 DownsampleSam   Read 100000000 reads, kept 912608
INFO    2016-09-15 21:27:10 DownsampleSam   Finished! Kept 1001492 out of 109810538 reads.
Downsampling exited with code: 0
BWA on end 1
Running BWA on /user/bam/RNA-SeQC/NC101/dSample.bam
Command: [/user/bwa-0.7.10/bwa, aln, /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf, -b1, /user/bam/RNA-SeQC/NC101/dSample.bam]
# There is insufficient memory for the Java Runtime Environment to continue.
# pthread_getattr_np
/opt/gridengine/default/spool/lum-7-13/job_scripts/246455: line 28: 57179 Aborted                 /share/apps/jdk1.7.0_71/bin/java -Xmx61440M -jar /user/tools/RNA-SeQC/RNA-SeQC_v1.1.8.jar -bwa /user/bwa-0.7.10/bwa -BWArRNA /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf -t /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-2015-08-14-08-18-15/Genes/genes.gtf -r /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa -s /user/bam/bam_samples.txt -o /user/bam/RNA-SeQC

Apologies for the length of the post but I wanted to get as much useful information in place.

If anyone has any suggestions or advice it would be greatly appreciated!

Thanks :)


We love a lot of useful information in the first post, so don't worry about it. Note that the failure occurs at the bwa step, so I'm not sure that giving the Java process a bigger heap will make a difference. Perhaps you could run bwa separately to isolate the problem?

Reply written 2.5 years ago by WouterDeCoster
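
One way to try this, sketched under assumptions: the bwa arguments and input paths below are copied from the `Command: [...]` line in the log, while the output file name `dSample_end1.sai` and the helper `run_and_report` are hypothetical. The helper simply runs a command and prints its exit status, so a failure outside Java is easy to spot.

```shell
#!/bin/sh
# Paths copied from the log in the question; adjust for your system.
BWA=/user/bwa-0.7.10/bwa
REF=/user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf
BAM=/user/bam/RNA-SeQC/NC101/dSample.bam

# Hypothetical helper: run a command, print its exit status to stderr,
# and return that status to the caller.
run_and_report() {
    status=0
    "$@" || status=$?
    echo "exit status: $status" >&2
    return "$status"
}

# Re-run the same bwa step RNA-SeQC attempted, outside of Java.
# bwa aln writes its alignment to stdout, so redirect it to a file
# (the file name here is an arbitrary choice).
if [ -x "$BWA" ] && [ -f "$BAM" ]; then
    run_and_report "$BWA" aln "$REF" -b1 "$BAM" > dSample_end1.sai
else
    echo "bwa or the downsampled BAM was not found; check the paths above" >&2
fi
```

If bwa finishes with exit status 0 here but still fails when launched by RNA-SeQC, that would point at the Java side or at the job's limits rather than at bwa itself.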

Thanks for the advice. I have tried running BWA on its own and everything is fine there, so I will look further into the memory allocated to Java. Thanks again.

Reply written 2.4 years ago by nicholas.owen1

This may be an obvious question, but is the Java you are using 64-bit? Can you check whether ulimit -a shows any limits on your account?

Reply written 2.5 years ago by genomax
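
A minimal sketch of both checks, assuming the Java path from the command in the question. The functions `check_java_bits` and `report_limits` are hypothetical helpers. The bitness check relies on 64-bit HotSpot JVMs printing "64-Bit" in their `java -version` banner; other JVMs may word this differently.

```shell
#!/bin/sh
# Java path taken from the question; override by setting JAVA_BIN.
JAVA_BIN="${JAVA_BIN:-/share/apps/jdk1.7.0_71/bin/java}"

# Hypothetical helper: report whether the JVM at $1 identifies itself as 64-bit.
check_java_bits() {
    if [ ! -x "$1" ]; then
        echo "java not found at $1"
        return 0
    fi
    # java -version prints its banner on stderr.
    if "$1" -version 2>&1 | grep -qi '64-bit'; then
        echo "64-bit JVM"
    else
        echo "no 64-bit marker in 'java -version' output"
    fi
}

# Hypothetical helper: print the two limits most relevant to a JVM start-up failure.
report_limits() {
    echo "stack size (kB): $(ulimit -s)"
    echo "virtual memory (kB): $(ulimit -v)"
}

check_java_bits "$JAVA_BIN"
report_limits
# Full list of limits for the current shell.
ulimit -a
```

The limits on a login node may not match those applied to a running job, so it is probably worth putting these lines at the top of the job script and reading them from the job's output.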

It looks like you're running this on a cluster. Can you send this to the cluster admin and ask whether there is an oddly small stack size limit on some or all of the nodes (you don't need to know what that means)? My guess from the error message is that an unusual limit of that kind is the cause.

Reply written 2.5 years ago by Devon Ryan

Thanks, I did, but I have no resolution from them yet. I will follow up next week and post an update :)

Reply written 2.4 years ago by nicholas.owen1