Question: picard sortsam error
11 days ago, antonioggsousa wrote:

Hi,

Has anyone come across this error in the Picard SortSam command? (site)

The command run:

java -Djava.io.tmpdir=$TMP_SPACE -jar $PICARD \
   SortSam I=$IN_BAM O=$OUT_BAM SORT_ORDER=coordinate \
   VALIDATION_STRINGENCY=LENIENT

The command tries to sort a BAM file by coordinate. It runs for about two and a half hours and then fails with the error below:

[Thu Jan 07 00:14:51 UTC 2021] picard.sam.SortSam done. Elapsed time: 150.85 minutes.
Runtime.totalMemory()=24461180928
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Value (4135) to large to be written as ubyte.
    at htsjdk.samtools.util.BinaryCodec.writeUByte(BinaryCodec.java:331)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:155)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:40)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:254)
    at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:182)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:187)
    at picard.sam.SortSam.doWork(SortSam.java:161)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:304)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I searched a bit, but I'm not a Java programmer. As far as I can tell, it does not seem to be a problem of disk space or RAM (I ran this command on a server with a lot of memory). The file in question is quite big (a few hundred GB).

The error seems related to a problem writing to disk, but I'm not familiar with what a ubyte is, nor with how to interpret and possibly solve the problem (if that is possible).

Thank you in advance for help or advice,

António

11 days ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:155)

the error is here: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/BAMRecordCodec.java#L155

It's thrown when the read name is too long: the SAM spec says length(name) < 256, because the length is stored as a uint8_t.

So it's a problem with your BAM. Check the length of your read names.

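
For readers unfamiliar with the term: a ubyte (uint8_t) is a single unsigned byte, so it can only hold integers from 0 to 255, and 4135 is far outside that range. A minimal sketch of that limit as it applies to read names, assuming (per the SAM specification) that BAM stores the name length including one trailing NUL byte. The function name `qname_fits_bam` is made up for this illustration and is not part of samtools or Picard:

```shell
# Hypothetical helper, not part of samtools or Picard.
# Succeeds if a read name can be stored in a BAM record, where the name
# length plus one terminating NUL byte must fit in a ubyte (0..255).
qname_fits_bam() {
    # wc -c counts bytes, not characters; $(( )) strips any padding.
    n=$(( $(printf '%s' "$1" | wc -c) ))
    [ $(( n + 1 )) -le 255 ]
}

# Illumina-style read name of the kind reported later in this thread.
if qname_fits_bam 'A00125:378:HLHCVDSXY:2:2637:12481:11584'; then
    echo "name fits"
else
    echo "name too long for BAM"
fi
```

Under this assumption the longest name that fits is 254 bytes, which is why a threshold around 250 is a reasonable place to start looking.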

Thank you Pierre Lindenbaum!

written 11 days ago by antonioggsousa

Do you know whether I need to check the read name lengths in bytes or in some other encoding?

I just checked the read name lengths as strings, and all seem to be <= 67.

Thanks!

written 10 days ago by antonioggsousa

"I just checked the read name lengths as strings, and all seem to be <= 67."

Sorry to ask, but how did you check that?

Can you please try to narrow down your input BAM to find the record that triggers this error?

written 10 days ago by Pierre Lindenbaum

First, I retrieved the lengths of the query read names from the BAM with:

samtools view <sample_name>.bam | cut -f 1 | awk '{ print length }' > read_name_length.txt

Then I sorted the output and printed the last lines to check for the longest read names:

cat read_name_length.txt | uniq | sort -h | tail

Is there something wrong with these commands? Did I miss something?

Thanks for your help!

modified 10 days ago • written 10 days ago by antonioggsousa
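
The pipeline above gives the right answer, but `uniq` only collapses adjacent duplicates and `sort -h` is intended for human-readable sizes; a single pass that tracks the maximum avoids the intermediate file and the sort. On the bytes question: awk's `length` counts characters in the current locale, so the sketch below sets `LC_ALL=C` to count bytes, which is what the BAM limit is expressed in. The function name `max_qname_len` is hypothetical, and it expects SAM text on standard input:

```shell
# Hypothetical helper: reads SAM text on stdin, prints the longest QNAME
# length in bytes and one read name that has that length.
max_qname_len() {
    LC_ALL=C awk -F '\t' '
        /^@/ { next }                       # skip header lines if present
        length($1) > max { max = length($1); name = $1 }
        END { printf "%d\t%s\n", max, name }
    '
}

# Self-contained demo on two made-up records.
printf 'r1\t0\tchr1\t100\t60\n%s\t0\tchr1\t200\t60\n' 'longer_name' | max_qname_len
```

On a real file this would be fed with something like `samtools view input.bam | max_qname_len`.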

can you please try:

samtools view input.bam | awk '(length($1)>=250)' | head

written 10 days ago by Pierre Lindenbaum

But your commands look OK.

Again, try cutting the BAM/SAM into smaller pieces to narrow down the error.

written 10 days ago by Pierre Lindenbaum
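
One way to narrow things down without re-running the sort is to scan the records as text and print the position of the first one whose values cannot be stored in a single byte. This is only a sketch under an assumption that is not confirmed in this thread: that the failing value is either the read-name length or the mapping quality, which are the two fields the SAM specification stores as uint8_t in a BAM alignment record. The function name `first_ubyte_overflow` is hypothetical, and it expects SAM text on standard input:

```shell
# Hypothetical helper: reads SAM text on stdin and prints the record number,
# read name, name length and MAPQ of the first record whose name length
# (plus one NUL byte) or MAPQ exceeds 255. Prints nothing if all fit.
first_ubyte_overflow() {
    LC_ALL=C awk -F '\t' '
        /^@/ { next }                       # skip header lines
        { n++ }                             # count alignment records only
        length($1) + 1 > 255 || $5 + 0 > 255 {
            printf "%d\t%s\t%d\t%d\n", n, $1, length($1), $5
            exit
        }
    '
}

# Demo on made-up records: the second one has an out-of-range MAPQ.
printf 'r1\t0\tchr1\t100\t60\nr2\t0\tchr1\t200\t4135\n' | first_ubyte_overflow
```

The record number it prints can then be used to cut a small slice around the offending record. Values decoded from a well-formed BAM already fit in one byte, so a scan like this is mainly informative on SAM text input or on the data that was fed into the BAM conversion.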

You can also try to validate the BAM with Picard ValidateSamFile.

written 10 days ago by Pierre Lindenbaum

Thank you Pierre Lindenbaum!

I have started both. It is a big BAM file, so it will take a while.

written 10 days ago by antonioggsousa

Sorry for the late reply.

Just to tell you that both commands finished.

The awk command printed nothing to stdout. ValidateSamFile ended with the following messages:

INFO    2021-01-09 01:28:58     SamFileValidator        Validated Read   640,000,000 records.  Elapsed time: 13:18:02s.  Time for last 10,000,000:  446s.  Last read position: 6774013:25,193.  Last read name: A00125:378:HLHCVDSXY:2:2637:12481:11584

## HISTOGRAM    java.lang.String

Error Type      Count
ERROR:MATE_NOT_FOUND    1
ERROR:MISSING_READ_GROUP        1
WARNING:RECORD_MISSING_READ_GROUP       649043177

[Sat Jan 09 01:37:29 UTC 2021] picard.sam.ValidateSamFile done. Elapsed time: 807.56 minutes.
Runtime.totalMemory()=10502537216

It reports warnings and two errors, but I'm not sure whether they are related to the previous error.

modified 7 days ago • written 7 days ago by antonioggsousa