Dear All,
I have installed RNA-SeQC on our cluster and have been trying to run RNA-SeQC (version 1.1.8) jobs on 5-6 GB BAM files of human RNA-seq data.
Info about the BAM files: human paired-end reads, aligned to the hg38 genome build with STAR, with read groups added, then sorted and indexed.
The RNA-SeQC command I run is generally:
/share/apps/jdk1.7.0_71/bin/java -Xmx60g -jar /user/tools/RNA-SeQC/RNA-SeQC_v1.1.8.jar \
  -bwa /user/bwa-0.7.10/bwa \
  -BWArRNA /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf \
  -t /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-2015-08-14-08-18-15/Genes/genes.gtf \
  -r /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa \
  -s /user/bam/bam_samples.txt \
  -o /user/bam/RNA-SeQC
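Before submitting, it can help to confirm that every input the command names actually exists. The log further down shows the `-BWArRNA` value being passed straight to `bwa aln` as its index prefix, and `bwa index` writes five files (.amb, .ann, .bwt, .pac, .sa) next to that prefix. A minimal preflight sketch; `check_inputs` and `check_bwa_index` are hypothetical helper names, not RNA-SeQC features:

```shell
#!/usr/bin/env bash
# Preflight sketch: report missing inputs before submitting the RNA-SeQC job.
# check_inputs and check_bwa_index are hypothetical helpers.

# Print every path that does not exist; return 1 if any are missing.
check_inputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# bwa aln takes an index *prefix*; `bwa index` writes these five files
# beside it, so each <prefix>.<ext> must exist for the rRNA step.
check_bwa_index() {
  local prefix=$1
  check_inputs "$prefix.amb" "$prefix.ann" "$prefix.bwt" "$prefix.pac" "$prefix.sa"
}
```

For example, `check_bwa_index /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf` checks the prefix used for the rRNA step, and `check_inputs` can be given the `-t`, `-r` and `-s` paths.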
bam_samples.txt currently lists only one BAM: I was getting the same error with many samples, so I am trying to get a single sample working first.
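A single-sample file can be written from the shell as below. This is a sketch that assumes the tab-delimited layout with a "Sample ID / Bam File / Notes" header described in the RNA-SeQC `-s` usage text; check it against the help output of your version. `write_sample_file` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a one-sample RNA-SeQC sample file.
# Assumes a tab-delimited file with a "Sample ID / Bam File / Notes" header
# (verify against your RNA-SeQC version's -s description).
write_sample_file() {
  local out=$1 sample_id=$2 bam=$3 notes=${4:-none}
  printf 'Sample ID\tBam File\tNotes\n' > "$out"
  printf '%s\t%s\t%s\n' "$sample_id" "$bam" "$notes" >> "$out"
}
```

With the sample ID and BAM path that appear in the log: `write_sample_file /user/bam/bam_samples.txt NC101 /user/bam/NC101/NC101_unique.RG.bam "single-sample test"`.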
From the output, the job seems to run fine up to a certain stage and then fails with a memory error. I have requested between 16 GB and 128 GB of memory for the cluster job, with no luck at all.
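The command hard-codes `-Xmx60g` (the abort line in the log shows `-Xmx61440M`, the same 60 GB), so if that value was also used for the 16 GB runs, the Java heap limit was larger than the job's own allocation. A minimal sketch that scales the heap to the memory requested for the job instead, assuming that request is known in MB; the 75% share is an illustrative assumption meant to leave room for non-heap JVM memory and the bwa child process, not a tuned value, and `xmx_for_job` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: choose -Xmx so the Java heap fits inside the cluster job's memory,
# leaving headroom for non-heap JVM memory and the bwa child process.
# The 75% default is an assumption, not an RNA-SeQC recommendation.
xmx_for_job() {
  local job_mb=$1      # memory requested for the job, in MB
  local pct=${2:-75}   # share of that memory given to the Java heap
  printf '%s\n' "-Xmx$(( job_mb * pct / 100 ))m"
}
```

For a 16 GB job, `xmx_for_job 16384` yields `-Xmx12288m`, which would replace `-Xmx60g` in front of `-jar` and the rest of the RNA-SeQC arguments.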
The error I am getting is:
RNA-SeQC v1.1.8.1 07/11/14
Retriving contig names from reference
contig names in reference: 195
Loading GTF for Read Counting
Converting to refGene
Transcript objects to RefGen format: 1 s
Running IntronicExpressionReadBlock Walker ....
Arguments: [-T, IntronicExpressionReadBlock, --outfile_metrics, /user/bam/RNA-SeQC/NC101/NC101.metrics.tmp.txt, -R, /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa, -I, /user/bam/NC101/NC101_unique.RG.bam, -refseq, /user/bam/RNA-SeQC/refGene.txt, -l, ERROR]
Finished writing /user/bam/RNA-SeQC/NC101/NC101.metrics.tmp.txt.intronReport.txt
Finished writing /user/bam/RNA-SeQC/NC101/NC101.metrics.tmp.txt.intronReport.txt_intronOnly.txt, now creating RPKM values for introns ..
GATK command result code: 0
... GATK CoutReadMetrics Analysis DONE
CountReadMetricsWalker Runtime: 12 min
Counting rRNA reads with BWA and /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf
Downsampling before aligning at rate: 0.009106594123052198
INFO 2016-09-15 21:23:02 DownsampleSam Read 10000000 reads, kept 91285
INFO 2016-09-15 21:23:34 DownsampleSam Read 20000000 reads, kept 182434
INFO 2016-09-15 21:23:56 DownsampleSam Read 30000000 reads, kept 272798
INFO 2016-09-15 21:24:18 DownsampleSam Read 40000000 reads, kept 364148
INFO 2016-09-15 21:24:39 DownsampleSam Read 50000000 reads, kept 455860
INFO 2016-09-15 21:25:01 DownsampleSam Read 60000000 reads, kept 547155
INFO 2016-09-15 21:25:24 DownsampleSam Read 70000000 reads, kept 639031
INFO 2016-09-15 21:25:50 DownsampleSam Read 80000000 reads, kept 730499
INFO 2016-09-15 21:26:19 DownsampleSam Read 90000000 reads, kept 822225
INFO 2016-09-15 21:26:41 DownsampleSam Read 100000000 reads, kept 912608
INFO 2016-09-15 21:27:10 DownsampleSam Finished! Kept 1001492 out of 109810538 reads.
Downsampling exited with code: 0
BWA on end 1
Running BWA on /user/bam/RNA-SeQC/NC101/dSample.bam
Command: [/user/bwa-0.7.10/bwa, aln, /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf, -b1, /user/bam/RNA-SeQC/NC101/dSample.bam]
#
# There is insufficient memory for the Java Runtime Environment to continue.
# pthread_getattr_np
/opt/gridengine/default/spool/lum-7-13/job_scripts/246455: line 28: 57179 Aborted /share/apps/jdk1.7.0_71/bin/java -Xmx61440M -jar /user/tools/RNA-SeQC/RNA-SeQC_v1.1.8.jar -bwa /user/bwa-0.7.10/bwa -BWArRNA /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf -t /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-2015-08-14-08-18-15/Genes/genes.gtf -r /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa -s /user/bam/bam_samples.txt -o /user/bam/RNA-SeQC
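The downsampling figures in the log are consistent with RNA-SeQC aiming for roughly one million reads before the BWA rRNA step: the reported rate times the 109,810,538 input reads comes to 1,000,000, and 1,001,492 were actually kept. If that reading is right, bwa only sees about 1% of the data, which suggests the crash is not driven by input size. The arithmetic, with all values copied from the log:

```shell
#!/usr/bin/env bash
# Check the downsampling arithmetic reported in the log (values copied from it).
total_reads=109810538
kept_reads=1001492
reported_rate=0.009106594123052198

# Reads targeted by the downsampler: rate x total reads.
implied_target=$(awk -v r="$reported_rate" -v t="$total_reads" 'BEGIN { printf "%.0f", r * t }')
# Fraction of reads actually kept, for comparison with the reported rate.
observed_rate=$(awk -v k="$kept_reads" -v t="$total_reads" 'BEGIN { printf "%.6f", k / t }')

echo "implied target reads: $implied_target"   # 1000000
echo "observed keep rate:   $observed_rate"
```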
Apologies for the length of the post, but I wanted to include as much useful information as possible.
If anyone has any suggestions or advice it would be greatly appreciated!
Thanks :)
We love a lot of useful information in the first post, so don't worry about it. Notice that the problem occurs at the bwa step; I'm not sure giving the Java process a bigger heap will make a difference there. Perhaps you could try running bwa separately to isolate the problem?
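One way to do that is to re-run the exact alignment shown on the log's `Command:` line outside the JVM. In `bwa aln`, `-b` marks the input as BAM and `-1` selects the first read of each pair, matching the `-b1` in the log. A sketch; `run_rrna_bwa` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: run the rRNA alignment step by hand, with the same bwa arguments
# RNA-SeQC used according to the log. run_rrna_bwa is a hypothetical helper.
run_rrna_bwa() {
  local bwa=$1 index_prefix=$2 bam=$3 out_sai=$4
  local rc
  # -b: input is BAM; -1: use only the first read of each pair.
  "$bwa" aln -b1 "$index_prefix" "$bam" > "$out_sai"
  rc=$?
  echo "bwa aln exit code: $rc" >&2
  return "$rc"
}
```

For example: `run_rrna_bwa /user/bwa-0.7.10/bwa /user/ref_genome/hg38_ucsc/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/rRNA.gtf /user/bam/RNA-SeQC/NC101/dSample.bam /tmp/dSample.sai`, submitted as its own cluster job so it runs under the same limits.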
Thanks for the advice. I have tried running BWA on its own and it works fine, so I will look further into the memory allocated to Java. Thanks again!
This may be an obvious question, but is the Java you are using 64-bit? Can you check whether
ulimit -a
shows any limits on your account? It looks like you're running this on a cluster, so can you send this to the cluster admin and ask them whether there is an unusually small stack size limit on some or all of the nodes (you don't need to know what that means)? My guess, based on the error message, is that an odd limit like that is involved.
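Both questions can be answered from inside a cluster job, which matters because limits on the compute nodes may differ from the login node. A sketch that prints the stack size and virtual memory limits and checks the JVM banner; HotSpot builds such as jdk1.7.0_71 print "64-Bit" in their `-version` output on 64-bit JVMs. `report_limits` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: report shell limits and JVM bitness from inside a cluster job.
# report_limits is a hypothetical helper; pass the java binary to test.
report_limits() {
  local java=${1:-java}
  echo "stack size (ulimit -s):     $(ulimit -s)"
  echo "virtual memory (ulimit -v): $(ulimit -v)"
  # HotSpot prints its banner on stderr, e.g. "... 64-Bit Server VM ...".
  if "$java" -version 2>&1 | grep -q '64-Bit'; then
    echo "java: 64-bit"
  else
    echo "java: not reported as 64-bit"
  fi
}
```

Running `report_limits /share/apps/jdk1.7.0_71/bin/java` in a job submitted like the RNA-SeQC one shows the limits that job actually sees.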
Thanks, I did, but have had no resolution from them yet. I will follow up next week and post an update :)