Problem With Picard Dedup In Illumina Alignment Pipeline
6
6
9.6 years ago
tommivat ▴ 250

I'm trying to complement my alignment pipeline with Picard dedup as recommended by Broad (see their best practices). I have paired-end Illumina reads and I use only reads from chr17 (mapped previously) as test data. After bwa, I run Picard MarkDuplicates and get the following error:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 630636, Read name HWI-H212:69:C0NR3ACXX:1:1212:10100:13262, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:397)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:161)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:145)

I have tried running FixMateInformation, as suggested in SeqAnswers but then the error is

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 630636, Read name HWI-H212:69:C0NR3ACXX:1:1212:10100:13262, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:148)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:76)


Finally, ValidateSamFile returns the following error for a number of reads:

ERROR: Record 43432, Read name HWI-H212:69:C0NR3ACXX:1:1310:18404:48164, Mate negative strand flag does not match read negative strand flag of mate


Any suggestions?

picard alignment duplicates • 8.9k views
7
9.6 years ago
Mitch Bekritsky ★ 1.3k

I ran into a bin field problem when I used Picard's MarkDuplicates before, and tried to find documentation for the error either in Picard's source code or online. Turns out there's not much on it, and I think it has to do with the reads not being in the correct bin in a BAM file for random access. Unfortunately, I couldn't find a way to fix it either.

My best solution was to set VALIDATION_STRINGENCY=LENIENT on all my Picard jobs, which, while it didn't eliminate my problem, did prevent Picard from dying on this particular error. Since it only affects one BAM record and I don't do much random access of BAM files anyway, I'm hoping that it won't affect my pipeline too much.
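
For anyone wanting to try the same workaround, here is a minimal sketch of what such a call might look like with the Picard 1.x per-tool jars. The helper name `markdup_lenient_cmd`, the `PICARD_DIR` variable, and the file names are placeholders I made up, not anything from my actual pipeline. The function only prints the command so it can be inspected before running.

```shell
# Print (do not run) a Picard 1.x MarkDuplicates command with lenient validation.
# PICARD_DIR and all file names are hypothetical placeholders.
# Args: input BAM, output BAM, metrics file.
markdup_lenient_cmd() {
    printf 'java -jar %s/MarkDuplicates.jar INPUT=%s OUTPUT=%s METRICS_FILE=%s VALIDATION_STRINGENCY=LENIENT\n' \
        "${PICARD_DIR:-/path/to/picard}" "$1" "$2" "$3"
}

# Show the command for a chr17 test file (names are illustrative only).
markdup_lenient_cmd chr17.sorted.bam chr17.dedup.bam chr17.dedup.metrics.txt
```

With LENIENT, Picard still reports validation problems as warnings but keeps going; SILENT would suppress the messages as well.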

If anyone has a better solution, there's at least two of us who would love to hear it!

2

I just noticed that the SAM format specification v1.4-r985 describes two pieces of code that calculate a read's bin index number based on its position in the alignment. If you were really interested in fixing the bug instead of using VALIDATION_STRINGENCY to suppress it, maybe you could compare the bin field in your record to what it should be according to the SAM specification?
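
As a rough illustration, the first of those routines (called `reg2bin` in the specification) can be transcribed into plain shell arithmetic. This is my own transcription from memory of the spec's C code, so please check it against the specification before relying on it. It takes a 0-based start and a 0-based exclusive end on the reference; the example coordinates are hypothetical.

```shell
# Shell transcription of the SAM specification's reg2bin routine (a sketch,
# verify against the spec). Computes the bin for an alignment covering
# [beg, end): 0-based start, 0-based exclusive end.
reg2bin() {
    _beg=$1
    _end=$(( $2 - 1 ))   # last covered base, 0-based
    # Try the smallest bins (16 kb) first, then progressively larger ones.
    if [ $(( _beg >> 14 )) -eq $(( _end >> 14 )) ]; then
        echo $(( ((1 << 15) - 1) / 7 + (_beg >> 14) )); return
    fi
    if [ $(( _beg >> 17 )) -eq $(( _end >> 17 )) ]; then
        echo $(( ((1 << 12) - 1) / 7 + (_beg >> 17) )); return
    fi
    if [ $(( _beg >> 20 )) -eq $(( _end >> 20 )) ]; then
        echo $(( ((1 << 9) - 1) / 7 + (_beg >> 20) )); return
    fi
    if [ $(( _beg >> 23 )) -eq $(( _end >> 23 )) ]; then
        echo $(( ((1 << 6) - 1) / 7 + (_beg >> 23) )); return
    fi
    if [ $(( _beg >> 26 )) -eq $(( _end >> 26 )) ]; then
        echo $(( ((1 << 3) - 1) / 7 + (_beg >> 26) )); return
    fi
    echo 0   # alignment spans the top-level bin
}

# Hypothetical record: POS=1000 (1-based) covering 100 reference bases.
reg2bin 999 1099   # prints 4681
```

The bin is stored only in the binary BAM record, not in SAM text, so the stored value has to be read from the BAM itself before it can be compared with the computed one.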

3
9.5 years ago
henryvuong ▴ 810

Hi, I encountered the same error with Picard tools version 1.96 while running CollectTargetedPcrMetrics. I tried the same command line with Picard version 1.79 and it worked fine.

0

Same problem here with ValidateSamFile in 1.96. Picard 1.87 works, but 1.90 fails with a NullPointerException.

0

Same problem here with ReorderSam in 1.97. Picard 1.88 seems to work fine. Strange.

2
8.4 years ago

The bottom line seems to be: ignoring it is OK; otherwise, you could re-create the input BAM file and/or the .bai index, which should fix the issue.

1
8.7 years ago
xrao ▴ 30

Hello, thank you for your posts. I encountered the same issue and avoided the errors by using Picard 1.88 as suggested. But then GATK picks up the errors again. I still have to use --validation_stringency=LENIENT to get GATK to run normally. Do you have any updates on solving this problem?

BTW, I am thinking of using the option IGNORE=INVALID_INDEXING_BIN for the Picard ValidateSamFile step, but I am not sure whether that is the right way.

Thank you in advance for any suggestions!

0
8.4 years ago

Did you try to align with bowtie2? When using Picard tools (some of them, especially the metrics tools), I sometimes encounter problems with bwa alignments, where I have to tell Picard to ignore the warnings, but when I use bowtie2 I have no problems. Maybe it is worth trying.

0
7.3 years ago

I encountered the same errors. I suspected that the problem arose when combining the forward and reverse SAM files (generated by bwa aln). So I used

bwa mem ref.fa my_F.fq my_R.fq > my.sam


to get a complete SAM file, and then

samtools view -bS my.sam -o my.bam


to convert the SAM file to BAM format. Picard worked successfully with the my.bam file.