Hi,
Has anyone come across this error in the Picard SortSam command?
The command I ran:
java -Djava.io.tmpdir=$TMP_SPACE -jar $PICARD \
SortSam I=$IN_BAM O=$OUT_BAM SORT_ORDER=coordinate \
VALIDATION_STRINGENCY=LENIENT
The command tries to sort a bam file by coordinate. It runs for 2 hours and then it shows the error below:
[Thu Jan 07 00:14:51 UTC 2021] picard.sam.SortSam done. Elapsed time: 150.85 minutes.
Runtime.totalMemory()=24461180928
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Value (4135) to large to be written as ubyte.
    at htsjdk.samtools.util.BinaryCodec.writeUByte(BinaryCodec.java:331)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:155)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:40)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:254)
    at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:182)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:187)
    at picard.sam.SortSam.doWork(SortSam.java:161)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:304)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
I searched a bit, but I'm not a Java programmer. As far as I can tell, it does not seem to be a problem of disk space or RAM (I ran this command on a server with a lot of memory). The file in question is quite big (a few hundred GB).
The error seems related to a problem writing to disk, but I'm not familiar with what a ubyte is, nor with how to interpret and possibly solve the problem.
Thank you in advance for help or advice,
António
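On the ubyte question: a ubyte is an unsigned 8-bit integer, so it can only hold values from 0 to 255, and 4135 does not fit. A minimal Python sketch of that kind of range check follows. The function name `write_ubyte` is hypothetical and only mirrors the idea; it is not the actual htsjdk code.

```python
import struct


def write_ubyte(value: int) -> bytes:
    """Pack value as one unsigned byte, refusing anything outside 0..255."""
    if not 0 <= value <= 255:
        raise ValueError(f"Value ({value}) does not fit in a ubyte (0..255).")
    return struct.pack("<B", value)


print(write_ubyte(67))      # a small value fits in one byte
try:
    write_ubyte(4135)       # the value reported in the stack trace
except ValueError as err:
    print(err)
```

So the exception points at a value in one of the records being too big for its one-byte slot, rather than at a disk or memory problem.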
Thank you Pierre Lindenbaum!
Do you know whether I need to check the read names in bytes or in some other encoding?
I just checked the read-name lengths as strings and all seem to be <= 67.
Thanks!
Sorry to ask, but how did you check that?
Can you please try to narrow down your input BAM to find the line that triggers this error?
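One possible way to narrow it down: in the BAM format, the read-name length (including a trailing NUL byte) and the mapping quality are each stored in a single unsigned byte, so a record with either value above 255 cannot be encoded. The sketch below scans SAM text lines and reports such records. The function name `find_ubyte_overflows` and the demo records are made up for illustration, and it is only an assumption that one of these two fields is what triggers the exception in this case.

```python
from typing import Iterable, Iterator, Tuple

UBYTE_MAX = 255


def find_ubyte_overflows(
    lines: Iterable[str],
) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (line_no, qname, field, value) for SAM records whose
    read-name length (+1 for the NUL) or MAPQ exceeds 255."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("@"):          # SAM header line, not a record
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        qname, mapq = fields[0], fields[4]
        name_len = len(qname.encode("utf-8")) + 1   # length in bytes, plus NUL
        if name_len > UBYTE_MAX:
            yield line_no, qname, "l_read_name", name_len
        if mapq.isdigit() and int(mapq) > UBYTE_MAX:
            yield line_no, qname, "mapq", int(mapq)


# Made-up records: the second alignment has an impossible MAPQ.
demo = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "read2\t0\tchr1\t200\t4135\t4M\t*\t0\t0\tACGT\tFFFF",
]
for hit in find_ubyte_overflows(demo):
    print(*hit, sep="\t")
```

To use it on the real file, the same function could be given `sys.stdin` while the output of `samtools view` is piped into the script. Measuring the name in bytes rather than characters matches how the length is stored in BAM.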
First I retrieved the lengths of the query read names from the BAM with:
Then I sorted the output and printed the last lines to check for the longest read names:
Is there something wrong with these commands? Did I miss something?
Thanks for your help!
can you please try:
but it looks ok.
again, try to cut the bam/sam to narrow the error.
you can also try to validate the bam with picard ValidateSamFile
Thank you Pierre Lindenbaum!
I have started both. It is a big BAM file, so it will take a while.
Sorry for the late reply.
Just to tell you that both commands finished.
The awk command did not print anything to stdout. ValidateSamFile threw the following last messages:
It gives warnings and two errors, but I'm not sure whether this is related to the previous error.