Question: Picard MarkDuplicates error after using bamaddrg: "bin field of BAM record does not equal value computed..."
1
gravatar for Mark Ebbert
4.6 years ago by
Mark Ebbert70
United States
Mark Ebbert70 wrote:

Hi,

We have a lot of bams from data that was sequenced and aligned before we received it. All of the bams are missing the read group (@RG) in the reads even though the read group is defined in the header. We ran `bamaddrg` (https://github.com/ekg/bamaddrg) to add the read groups to all reads and it appeared to run fine. The problem is that running Picard's MarkDuplicates now throws the following error on the bam with read groups added:

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1642900, Read name HS2000-1005_167:8:1103:3541:88508, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
        at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:643)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
        at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:413)
        at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
        at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)

We do not get the error on the original bam. Another post suggested the option `VALIDATION_STRINGENCY=LENIENT`, which allows MarkDuplicates to proceed, but there are a fair number of reads that get this error. I don't know how these are being treated, but for our study we cannot ignore these reads. Perhaps we could MarkDuplicates before adding read groups, but I'm concerned about unknown consequences. Reindexing does not solve the problem. I also posted an issue on the bamaddrg github page. Hopefully the author will be able to make a suggestion.

Questions:

  1. Does anyone know why we would get this error after adding read groups? There are no other differences in the reads as far as I can tell.
  2. The error states that the bin is calculated based on alignment start and end. These values did not change! So why would the calculated bin change?
  3. I came across one solution to convert the bam > sam > bam so the bins are recalculated, but we have 1000 whole genomes. That seems a bit ridiculous. Any other suggestions?

Thanks in advance! Let me know if I can clarify anything.

ADD COMMENTlink modified 4.6 years ago by lomereiter430 • written 4.6 years ago by Mark Ebbert70
4
gravatar for lomereiter
4.6 years ago by
lomereiter430
Russian Federation
lomereiter430 wrote:

This might be a bug in BamTools library, e.g. treating some edge case incorrectly.
For some reason, it always calculates bin number upon writing a record to the output file, and here is the place where it may change.

Now regarding your 3rd question. Picard source code contains a tool specifically intended for fixing bin numbers but it's not included in the distribution, you have to compile it yourself.

But I would suggest an alternative for this particular situation: modify Picard tools a little. MarkDuplicates doesn't use bin values at all, so wrong values won't affect the result. Fixing them requires adding a single line in htsjdk/src/java/htsjdk/samtools/BAMFileReader.java after the line '++this.samRecordIndex;':
mNextRecord.setIndexingBin(mNextRecord.computeIndexingBin());
That should do the trick, as it forces all tools to recompute bin number on the fly.
I've uploaded the jars built with this patch here.


 

ADD COMMENTlink written 4.6 years ago by lomereiter430

Super helpful, thanks lomereiter! We opted for the first option and it worked great.

ADD REPLYlink written 4.6 years ago by Mark Ebbert70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2210 users visited in the last hour