Problem With Picard Dedup In Illumina Alignment Pipeline
8.3 years ago
tommivat ▴ 250

I'm trying to extend my alignment pipeline with Picard deduplication, as recommended by the Broad Institute (see their best practices). I have paired-end Illumina reads and, as test data, I use only the reads from chr17 (mapped previously). After bwa, I run Picard MarkDuplicates and get the following error:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 630636, Read name HWI-H212:69:C0NR3ACXX:1:1212:10100:13262, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
    at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:397)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:161)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:145)

I have tried running FixMateInformation, as suggested on SeqAnswers, but it fails with the same validation error:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 630636, Read name HWI-H212:69:C0NR3ACXX:1:1212:10100:13262, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
    at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
    at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:148)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:76)

Finally, ValidateSamFile returns the following error for a number of reads:

ERROR: Record 43432, Read name HWI-H212:69:C0NR3ACXX:1:1310:18404:48164, Mate negative strand flag does not match read negative strand flag of mate

Any suggestions?

picard alignment duplicates • 8.3k views
8.3 years ago
Mitch Bekritsky ★ 1.3k

I ran into a bin field problem when I used Picard's MarkDuplicates before, and tried to find documentation for the error either in Picard's source code or online. Turns out there's not much on it, and I think it has to do with the reads not being in the correct bin in a BAM file for random access. Unfortunately, I couldn't find a way to fix it either.

My best solution was to set VALIDATION_STRINGENCY=LENIENT on all my Picard jobs, which, while it didn't eliminate my problem, did prevent Picard from dying on this particular error. Since it only affects one BAM record and I don't do much random access of BAM files anyway, I'm hoping that it won't affect my pipeline too much.

If anyone has a better solution, there's at least two of us who would love to hear it!
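
For reference, the LENIENT workaround described above might look like the sketch below. It assumes a Picard 1.x install with one jar per tool. `PICARD_DIR`, the `markdup_lenient` helper, and the file names are placeholders, not taken from the original post. Newer Picard releases bundle all tools into a single picard.jar, so the invocation would differ there.

```shell
# Assumed install location; adjust to wherever the Picard jars live.
PICARD_DIR="${PICARD_DIR:-/path/to/picard}"

# Hypothetical helper: run MarkDuplicates with lenient validation.
# $1 = input BAM, $2 = output BAM, $3 = duplication metrics file
markdup_lenient() {
    java -jar "$PICARD_DIR/MarkDuplicates.jar" \
        INPUT="$1" OUTPUT="$2" METRICS_FILE="$3" \
        VALIDATION_STRINGENCY=LENIENT
}

# Example call (uncomment once the paths are right):
# markdup_lenient aln.sorted.bam aln.dedup.bam aln.dedup.metrics.txt
```

With LENIENT, Picard still reports validation problems as warnings but keeps going instead of throwing an exception. The offending record itself is not repaired.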


I just noticed that the SAM format specification v1.4-r985 describes two pieces of code that calculate a read's bin index number based on its position in the alignment. If you were really interested in fixing the bug instead of using VALIDATION_STRINGENCY to suppress it, maybe you could compare the bin field in your record to what it should be according to the SAM specification?
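
For anyone who wants to try that comparison, here is a rough shell port of the bin calculation from the SAM specification. It is only a sketch: the function takes a 0-based, half-open interval, so `beg` is POS minus 1 and `end` is `beg` plus the number of reference bases the alignment covers. Reading the stored bin out of the BAM record is not shown here.

```shell
# Bin number for the 0-based, half-open interval [beg, end),
# following the binning scheme in the SAM specification.
# The body runs in a subshell so beg/end do not leak into the caller.
reg2bin() (
    beg=$1
    end=$(( $2 - 1 ))
    if   [ $(( beg >> 14 )) -eq $(( end >> 14 )) ]; then echo $(( 4681 + (beg >> 14) ))
    elif [ $(( beg >> 17 )) -eq $(( end >> 17 )) ]; then echo $(( 585 + (beg >> 17) ))
    elif [ $(( beg >> 20 )) -eq $(( end >> 20 )) ]; then echo $(( 73 + (beg >> 20) ))
    elif [ $(( beg >> 23 )) -eq $(( end >> 23 )) ]; then echo $(( 9 + (beg >> 23) ))
    elif [ $(( beg >> 26 )) -eq $(( end >> 26 )) ]; then echo $(( 1 + (beg >> 26) ))
    else echo 0
    fi
)

# A 100 bp alignment at POS 1000 (CIGAR 100M) sits inside one 16 kb window:
reg2bin 999 1099      # prints 4681
# An alignment crossing the 16384 boundary falls into the next level up:
reg2bin 16350 16450   # prints 585
```

If the value computed this way differs from the bin stored in the record, that is the mismatch Picard is complaining about.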

8.2 years ago
henryvuong ▴ 810

Hi, I encountered the same error with Picard tools version 1.96 while running CollectTargetedPcrMetrics. When I ran the same command line with Picard version 1.79, it worked fine.


Same problem here with ValidateSamFile in 1.96. Picard 1.87 works, but 1.90 fails with a NullPointerException.


Same problem here with ReorderSam in 1.97. Picard 1.88 seems to work fine. Strange.

7.1 years ago

There is now a post on the GATK user forum about this issue: http://gatkforums.broadinstitute.org/discussion/4290/sam-bin-field-error-for-the-gatk-run

The bottom line seems to be: ignoring it is ok, otherwise you could re-create the input bam file and/or the .bai index which should fix the issue.
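
One way to re-create the BAM and its index is a round trip through SAM text with samtools, sketched below. This assumes the input is already coordinate-sorted (needed for indexing) and that the bin value is recomputed when the SAM text is parsed back into BAM. I have not verified that this clears the error in every case. `rebuild_bam` and the file names are made up for illustration.

```shell
# Hypothetical helper: rewrite a BAM via SAM text, then re-index it.
# $1 = problematic (coordinate-sorted) BAM, $2 = rewritten BAM
rebuild_bam() {
    # -h keeps the header; -S is required by old samtools for SAM input
    # and is accepted but ignored by newer versions.
    samtools view -h "$1" | samtools view -bS -o "$2" - &&
    samtools index "$2"
}

# Example call (uncomment once the paths are right):
# rebuild_bam aln.sorted.bam aln.rebuilt.bam
```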

7.4 years ago
xrao ▴ 30

Hello, thank you for your posts. I encountered the same issue and avoided the errors by using Picard 1.88 as suggested. But then GATK picks up the errors again, and I still have to use --validation_stringency=LENIENT to get GATK to run normally. Do you have any updates on solving this problem?

BTW, I am thinking of using the option IGNORE=INVALID_INDEXING_BIN for the Picard ValidateSamFile step, but I'm not sure if it is the right way.

Thank you in advance for any suggestions!
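
For what it's worth, the IGNORE option mentioned above could be passed as in the sketch below. This assumes a Picard 1.x layout with one jar per tool. `PICARD_DIR`, the `validate_ignoring_bin` helper, and the file name are placeholders. It only keeps ValidateSamFile from reporting that error type and does not change the BAM file, so whether it is "the right way" depends on what you need from the validation step.

```shell
# Assumed install location; adjust to wherever the Picard jars live.
PICARD_DIR="${PICARD_DIR:-/path/to/picard}"

# Hypothetical helper: validate a BAM but skip the bin-field check.
# $1 = BAM file to validate
validate_ignoring_bin() {
    java -jar "$PICARD_DIR/ValidateSamFile.jar" \
        INPUT="$1" MODE=SUMMARY \
        IGNORE=INVALID_INDEXING_BIN
}

# Example call (uncomment once the paths are right):
# validate_ignoring_bin aln.dedup.bam
```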

7.1 years ago
Rad ▴ 800

Did you try aligning with bowtie2? When using Picard tools (some of them, especially the metrics tools), I sometimes encounter problems with bwa alignments, where I have to ignore the warnings, but when I use bowtie2 I have no problems. Maybe it is worth trying.

6.0 years ago

I encountered the same errors. I suspected that the problem came from combining the forward and reverse SAM files (generated by bwa aln). So I used

bwa mem ref.fa my_F.fq my_R.fq > my.sam 

to get a complete sam file.

samtools view -bS my.sam -o my.bam

to convert sam file to bam format. Picard worked successfully with my.bam file.
