Question

Error in running QoRTs after hisat2 alignment

6.6 years ago

bioinfo8 ▴ 230

Hi, I ran QoRTs on multiple BAM files; the log for one of them is below.

I tried -Xmx4G, -Xmx8G, and -Xmx16G, but I still get the same error. Please guide. Thanks.

1.sorted.bam
Starting QoRTs v1.2.42 (Compiled Fri Jun  2 12:23:55 EDT 2017)
Starting time: (Mon Oct 16 13:33:39 MEST 2017)
INPUT_COMMAND(QC)
  INPUT_ARG(infile)=1.sorted.bam
  INPUT_ARG(gtffile)=1.gtf  
  INPUT_ARG(outdir)=1/
  INPUT_ARG(generatePlots)=true
Creating Directory: 1/
Created Log File: 1/QC.eVW6yCQ9UUY7.log
Starting QC
[Time: 2017-10-16 13:33:39] [Mem usage: [85MB / 2058MB]] [Elapsed Time: 00:00:00.0000]
QoRTs is Running in paired-end mode.
QoRTs is Running in any-sorted mode.
NOTE: Function "overlapMatch" requires function "mismatchEngine". Adding "mismatchEngine" to the active function list...
Running functions: CigarOpDistribution, GCDistribution, GeneCalcs, InsertSize, 
        JunctionCalcs, NVC, QualityScoreDistribution, StrandCheck, 
        chromCounts, cigarLocusCounts, mismatchEngine, overlapMatch, 
        readLengthDistro, writeBiotypeCounts, writeClippedNVC, 
        writeDESeq, writeDEXSeq, writeGeneBody, writeGeneCounts, 
        writeGenewiseGeneBody, writeJunctionSeqCounts, 
        writeKnownSplices, writeNovelSplices, writeSpliceExon
Checking first 10000 reads. Checking SAM file for formatting errors...
NOTE: Read length is not consistent.
   In the first 10000 reads, read length varies from 43 to 140 (param maxReadLength=140)
Note that using data that is hard-clipped prior to alignment is NOT recommended, because this makes it difficult (or impossible) to determine the sequencer read-cycle of each nucleotide base. This may obfuscate cycle-specific artifacts, trends, or errors, the detection of which is one of the primary purposes of QoRTs!In addition, hard clipping (whether before or after alignment) removes quality score data, and thus quality score metrics may be misleadingly optimistic. A MUCH preferable method of removing undesired sequence is to replace such sequence with N's, which preserves the quality score and the sequencer cycle information.
   WARNING WARNING WARNING: Read length is not consistent, AND "--maxReadLength" option is not set!
      QoRTs has ATTEMPTED to determine the maximum read length (140).
      It is STRONGLY recommended that you use the --maxReadLength option 
      to set the maximum possible read length, or else errors may occur if/when 
      reads longer than 140 appear.
   Sorting Note: Reads are not sorted by name (This is OK).
   Sorting Note: Reads are sorted by position (This is OK).
Done checking first 10000 reads. WARNINGS FOUND!
SAMRecord Reader Generated. Read length: 140.
[Time: 2017-10-16 13:33:49] [Mem usage: [295MB / 2595MB]] [Elapsed Time: 00:00:10.0813]
Compiling flat feature annotation, internally in memory...
Internal flat feature annotation compiled!
QC Utilities Generated!
[Time: 2017-10-16 13:36:18] [Mem usage: [3221MB / 3894MB]] [Elapsed Time: 00:02:39.0679]
NOTE: Unmatched Read-PAIR-Buffer Size > 100000 [Mem usage:[1665MB / 4062MB]]
    (This is generally not a problem, but if this increases further then OutOfMemoryExceptions
    may occur.
    If memory errors do occur, either increase memory allocation or sort the bam-file by name
    and rerun with the '--nameSorted' option.
    This might also indicate that your dataset contains an unusually large number of
    chimeric read-pairs. Or it could occur simply due to the presence of genomic
    loci with extremly high coverage or complex splicing. It may also indicate a SAM/BAM file that 
    does not adhere to the standard SAM specification.)
NOTE: Unmatched Read-PAIR-Buffer Size > 200000 [Mem usage:[2101MB / 4062MB]]
NOTE: Unmatched Read-PAIR-Buffer Size > 400000 [Mem usage:[3017MB / 4062MB]]
NOTE: Unmatched Read-PAIR-Buffer Size > 800000 [Mem usage:[3604MB / 5GB]]
NOTE: Unmatched Read-PAIR-Buffer Size > 1600000 [Mem usage:[5GB / 7GB]]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at net.sf.samtools.BinaryTagCodec.readTags(BinaryTagCodec.java:282)
    at net.sf.samtools.BAMRecord.decodeAttributes(BAMRecord.java:308)
    at net.sf.samtools.BAMRecord.getAttribute(BAMRecord.java:288)
    at net.sf.samtools.SAMRecord.isValid(SAMRecord.java:1566)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:632)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:618)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:588)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:774)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:752)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:930)
    at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:945)
    at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:985)
    at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:988)
    at internalUtils.stdUtils$$anon$2.hasNext(stdUtils.scala:372)
    at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:193)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:406)
    at internalUtils.commonSeqUtils$$anon$5.addNextPairToBuffer(commonSeqUtils.scala:1038)
    at internalUtils.commonSeqUtils$$anon$5.next(commonSeqUtils.scala:1077)
    at internalUtils.commonSeqUtils$$anon$5.next(commonSeqUtils.scala:1026)
    at internalUtils.stdUtils$IteratorProgressReporter$$anon$5.next(stdUtils.scala:493)
    at scala.collection.Iterator$class.foreach(Iterator.scala:743)
    at internalUtils.stdUtils$IteratorProgressReporter$$anon$5.foreach(stdUtils.scala:487)
    at qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1285)
    at qcUtils.runAllQC$.run(runAllQC.scala:939)
    at qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:628)
    at runner.runner$.main(runner.scala:97)
    at runner.runner.main(runner.scala)
QoRTs done

RNA-Seq QoRTs paired-end samtools java • 2.5k views

updated 6.6 years ago by h.mon 35k • written 6.6 years ago by bioinfo8 ▴ 230

score 0 · Answer 1 · 2017-10-16

6.6 years ago

h.mon 35k

QoRTs is extremely memory-hungry with coordinate-sorted files, although its manual does not make this point clear. If you have lots of memory, you may try increasing the JVM heap with -Xmx, for example -Xmx64g or even higher. Alternatively, sort your BAM files by name (samtools sort -n -O bam -o file.sort.bam file.bam) and use the --nameSorted QoRTs parameter; for name-sorted BAM files, memory usage is low.
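For several files, the name-sorting route could be scripted roughly as below. This is only a sketch under assumptions that are not from this thread: the QoRTs.jar path, the annotation.gtf name, the thread count, the 2G per-thread sort memory, the -Xmx8g heap and the *.namesorted.bam naming are all placeholders, and --maxReadLength 140 simply reuses the maximum QoRTs detected in the log above. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: name-sort each coordinate-sorted BAM, then run QoRTs with --nameSorted.
# All paths and resource values below are assumptions; adjust them to your setup.

QORTS_JAR="${QORTS_JAR:-QoRTs.jar}"   # hypothetical path to the QoRTs jar
GTF="${GTF:-annotation.gtf}"          # hypothetical annotation file
THREADS="${THREADS:-4}"               # assumed number of sorting threads
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Process one BAM: name-sort it, then run QoRTs QC on the name-sorted copy.
qorts_one() {
  bam="$1"
  sample="${bam%.sorted.bam}"          # 1.sorted.bam -> 1
  nsorted="${sample}.namesorted.bam"
  # JVM options such as -Xmx go before -jar; anything placed after the jar
  # name is passed to the program instead of the JVM.
  run samtools sort -n -@ "$THREADS" -m 2G -O bam -o "$nsorted" "$bam" &&
    run java -Xmx8g -jar "$QORTS_JAR" QC --nameSorted --maxReadLength 140 \
      "$nsorted" "$GTF" "${sample}/"
}

# Loop over the BAM files given on the command line.
for bam in "$@"; do
  qorts_one "$bam"
done
```

Saved under a name of your choosing (say qorts_namesorted.sh, hypothetical) and called as sh qorts_namesorted.sh *.sorted.bam, it prints one samtools command and one QoRTs command per file; setting DRY_RUN=0 executes them instead.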

Incidentally, the QualiMap RNA-seq module has the same issue.


@h.mon Thanks, I will try it and let you know.

6.6 years ago by bioinfo8 ▴ 230

@h.mon I tried -Xmx64g, but I get the same error. I don't want to sort the BAM files by name, as it will take much longer and I have multiple files.