Question: MarkDuplicates Error
8.8 years ago by
Leandro Batista100 wrote:


I'm processing whole-genome BAM files. Since I'm specifically interested in chromosome 11, I have split my files and am working only on this chromosome. However, when I tried to run MarkDuplicates on the chr11 BAM files, it gave the following error:

SAM validation error: WARNING: Record 23, Read name IL21_1665:3:25:467:1485, Paired read should be marked as first of pair or second of pair.

Running ValidateSamFile on the same file produced hundreds of warnings with the same message. The error also occurred with other Picard tools, such as FixMateInformation. At first, I thought the problem was related to inter-chromosomal pairs, where the information for one of the reads is not present in my BAM file. Then I saw this answered on Picard's FAQ page:

"If your reads have been divided into separate BAMs by chromosome, inter-chromosomal pairs will not be identified, but MarkDuplicates will not fail due to inability to find the mate pair for a read."

Right now, I'm confused and I don't know how to solve this. Should I run MarkDuplicates on the whole-genome file?


modified 7.6 years ago by Biostar ♦♦ 20 • written 8.8 years ago by Leandro Batista100
8.8 years ago by
Sean Davis26k
National Institutes of Health, Bethesda, MD
Sean Davis26k wrote:

Your best bet is to set VALIDATION_STRINGENCY to either SILENT or LENIENT. This is unlikely to affect the correctness of the results.
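As a rough sketch of what that looks like in practice, the Python snippet below only assembles a MarkDuplicates command line with the stringency relaxed. It assumes a `picard` wrapper script is on your PATH (with a plain jar you would start the list with `java -jar` and the jar path instead), and the file names are placeholders, not anything from the question.

```python
def mark_duplicates_command(in_bam, out_bam, metrics, stringency="LENIENT"):
    """Build an argument list for Picard MarkDuplicates (KEY=VALUE syntax)."""
    # Picard accepts STRICT (the default), LENIENT or SILENT.
    if stringency not in {"STRICT", "LENIENT", "SILENT"}:
        raise ValueError(f"unknown validation stringency: {stringency}")
    return [
        "picard",  # assumption: a `picard` wrapper script is on PATH
        "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics}",
        f"VALIDATION_STRINGENCY={stringency}",
    ]


# Placeholder file names, for illustration only.
cmd = mark_duplicates_command("chr11.bam", "chr11.dedup.bam", "chr11.metrics.txt")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files. With LENIENT, Picard still prints the validation warnings but carries on; SILENT suppresses them.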

8.8 years ago by
Swbarnes21.5k wrote:

It looks like the software that produced your .bam set the flags incorrectly. Picard is complaining that your .bam records have flag bit 1 (read is paired) set, but neither 64 (first of pair) nor 128 (second of pair).

But as long as the read names are identical between the two reads, Picard might still be able to figure out that they are paired, and will still know which pairs have the same coordinates.
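To make the flag arithmetic concrete, here is a minimal Python sketch of the condition the warning describes. The bit values come from the SAM format; the function name is made up for this example, and this is not how Picard itself implements the check.

```python
PAIRED = 0x1           # 1: read is part of a pair
FIRST_IN_PAIR = 0x40   # 64: first read of the pair
SECOND_IN_PAIR = 0x80  # 128: second read of the pair


def missing_pair_order(flag):
    """True when a read is flagged as paired but as neither first nor second of pair."""
    is_paired = bool(flag & PAIRED)
    has_order = bool(flag & (FIRST_IN_PAIR | SECOND_IN_PAIR))
    return is_paired and not has_order


# 99 and 147 are common flags for a properly paired first and second read;
# 1 and 3 are paired flags with no first/second bit, which triggers the warning.
for flag in (99, 147, 1, 3):
    print(flag, missing_pair_order(flag))
```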

And yes, you can run Picard with VALIDATION_STRINGENCY set to LENIENT, and it will likely do its thing despite that problem.


Thanks for your answers. I think it worked when setting VALIDATION_STRINGENCY to LENIENT. Still, I don't understand why I'm having this problem. I'm using BAM files aligned and produced by the Sanger Institute, and I'm processing them in my own pipeline to call SNPs. The steps leading to BAM file production should be OK, right? But these BAMs don't even pass ValidateSamFile!

written 8.8 years ago by Leandro Batista100

Look at a little of the original .sam file with your eyeballs. What flags do you see? Is your pipeline possibly changing those flags?
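If eyeballing gets tedious, a small tally of the FLAG column (the second tab-separated field of each SAM record) does the same job. The Python sketch below is one way to do it; the two sample records are invented purely for illustration, and in real use the lines would come from something like `samtools view` output.

```python
from collections import Counter


def flag_counts(sam_lines):
    """Count how often each FLAG value occurs in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[1])] += 1  # FLAG is the second column
    return counts


# Invented records, not taken from the files discussed in this thread.
sample = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t1\t11\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "read2\t99\t11\t150\t60\t4M\t=\t250\t104\tACGT\tIIII",
    "read3\t1\t11\t180\t60\t4M\t=\t280\t104\tACGT\tIIII",
]
counts = flag_counts(sample)
for flag, n in counts.most_common():
    print(flag, n)
```

Running the same tally on the original file and on the chr11 file would show whether the flags were already like this before your pipeline touched them.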

written 8.8 years ago by Swbarnes21.5k