picard sortsam error
3.3 years ago

Hi,

Has anyone come across this error in the Picard SortSam command?

The command run:

java -Djava.io.tmpdir=$TMP_SPACE -jar $PICARD \
   SortSam I=$IN_BAM O=$OUT_BAM SORT_ORDER=coordinate \
   VALIDATION_STRINGENCY=LENIENT

The command sorts a BAM file by coordinate. It runs for over 2 hours and then fails with the error below:

[Thu Jan 07 00:14:51 UTC 2021] picard.sam.SortSam done. Elapsed time: 150.85 minutes.
Runtime.totalMemory()=24461180928
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Value (4135) to large to be written as ubyte.
    at htsjdk.samtools.util.BinaryCodec.writeUByte(BinaryCodec.java:331)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:155)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:40)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:254)
    at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:182)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:187)
    at picard.sam.SortSam.doWork(SortSam.java:161)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:304)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I searched a bit, but I'm not a Java programmer. As far as I can tell, it does not seem to be a problem of disk space or RAM (I ran this command on a server with plenty of memory). The file in question is quite big (a few hundred GB).

The error seems to be related to a problem writing to disk, but I'm not familiar with what a ubyte is, nor with how to interpret and possibly solve the problem.

Thank you in advance for help or advice,

António

alignment Assembly next-gen software error • 1.8k views
3.3 years ago

htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:155) at

The error is here: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/BAMRecordCodec.java#L155

It's thrown when the read NAME is too long (the SAM spec says length(name) < 256, because the length is coded as a uint8_t).

So it's a problem with your BAM. Check the length of your read names.
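
For readers unfamiliar with the term: a ubyte is an unsigned 8-bit integer, so it can only hold values from 0 to 255, which is why a value of 4135 cannot be written. A minimal sketch of that range check in shell; the helper name `fits_ubyte` is made up for illustration and is not part of Picard or htsjdk:

```shell
# Hypothetical helper: succeed if the integer fits in an unsigned byte (0..255).
fits_ubyte() {
    [ "$1" -ge 0 ] && [ "$1" -le 255 ]
}

# Try a typical read-name length, the largest allowed value,
# and the value reported in the exception.
for v in 67 255 4135; do
    if fits_ubyte "$v"; then
        echo "$v fits in a ubyte"
    else
        echo "$v is too large for a ubyte"
    fi
done
```

Of the three values tried, only 4135 is reported as too large.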

Thank you Pierre Lindenbaum!

Do you know if I need to check the read names in bytes or in some other encoding?

I just checked the read-name lengths as strings and all seem to be <= 67.

Thanks!

I just checked the read-name lengths as strings and all seem to be <= 67.

Sorry to ask, but how did you check that?

Can you please try to narrow down your input BAM to find the record that triggers this error?

First I extracted the lengths of the read names from the BAM with:

samtools view <sample_name>.bam | cut -f 1 | awk '{ print length }' > read_name_length.txt

Then I sorted the output and printed the last lines to check for the longest read names:

cat read_name_length.txt | uniq | sort -h | tail

Is there something wrong with these commands? Did I miss something?

Thanks for your help!
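
The two commands above should give the right maximum; `uniq` before `sort` only collapses adjacent duplicates, so it costs nothing but does not deduplicate fully. As a sketch of an alternative, the maximum can be computed in a single pass without the intermediate file. The function name `max_qname_length` and the file name `input.bam` are placeholders, and the function assumes tab-separated SAM records on stdin:

```shell
# Hypothetical helper: print the maximum read-name (column 1) length
# found in tab-separated SAM records read from stdin.
max_qname_length() {
    awk -F '\t' 'length($1) > max { max = length($1) } END { print max + 0 }'
}

# Intended usage (requires samtools; input.bam is a placeholder):
# samtools view input.bam | max_qname_length
```

This avoids writing and sorting hundreds of millions of lines, which matters for a BAM of a few hundred GB.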

can you please try:

samtools view input.bam | awk '(length($1)>=250)' | head

But your check looks OK.

Again, try to cut the BAM/SAM into smaller pieces to narrow down the error.
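
One way to narrow the search, sketched under assumptions: the BAM specification stores both the read-name length (including a trailing NUL) and MAPQ as unsigned bytes, so a scan of the SAM text can report the first record where either value would not fit. Whether one of these two fields is actually the cause here is not confirmed in this thread, and a scan of a well-formed BAM may print nothing. The function name `first_ubyte_overflow` and `input.bam` are placeholders:

```shell
# Hypothetical helper: report the first SAM record (tab-separated, on stdin)
# whose read-name length or MAPQ would not fit in an unsigned byte.
first_ubyte_overflow() {
    awk -F '\t' '
        /^@/ { next }                  # skip SAM header lines, if present
        { n++ }                        # count alignment records
        length($1) + 1 > 255 {         # name length plus trailing NUL
            print "record " n ": QNAME length " length($1); exit
        }
        $5 + 0 > 255 {                 # MAPQ is column 5
            print "record " n ": MAPQ " $5; exit
        }
    '
}

# Intended usage (requires samtools; input.bam is a placeholder):
# samtools view input.bam | first_ubyte_overflow
```

The record number it prints can then be used to extract a small slice of the file and rerun SortSam on just that slice.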

You can also try to validate the BAM with Picard ValidateSamFile.

Thank you Pierre Lindenbaum!

I have started both. It is a big BAM file, so it will take a while.

Sorry for the late reply.

Just to let you know that both commands have finished.

The awk command printed nothing to stdout. ValidateSamFile ended with the following messages:

INFO    2021-01-09 01:28:58     SamFileValidator        Validated Read   640,000,000 records.  Elapsed time: 13:18:02s.  Time for last 10,000,000:  446s.  Last read position: 6774013:25,193.  Last read name: A00125:378:HLHCVDSXY:2:2637:12481:11584

## HISTOGRAM    java.lang.String

Error Type      Count
ERROR:MATE_NOT_FOUND    1
ERROR:MISSING_READ_GROUP        1
WARNING:RECORD_MISSING_READ_GROUP       649043177

[Sat Jan 09 01:37:29 UTC 2021] picard.sam.ValidateSamFile done. Elapsed time: 807.56 minutes.
Runtime.totalMemory()=10502537216

It reports warnings and two errors, but I'm not sure whether they are related to the previous error.
