I'm processing whole-genome BAM files. Since I'm specifically interested in chromosome 11, I have split my files and I'm working only on this chromosome. However, when I tried to run MarkDuplicates on the chr11 BAM files, it gave the following error:
SAM validation error: WARNING: Record 23, Read name IL21_1665:3:25:467:1485, Paired read should be marked as first of pair or second of pair.
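As the warning text says, the problem is in the record's FLAG field. A read with the "paired" bit (0x1) set is expected to also carry either the "first of pair" (0x40) or the "second of pair" (0x80) bit. The sketch below decodes a FLAG value into named bits and tests that condition. The bit values come from the SAM format specification. The helper names `decode_flag` and `missing_pair_order` are hypothetical and exist only for illustration.

```python
# FLAG bits as defined in the SAM format specification
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first of pair",
    0x80: "second of pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def missing_pair_order(flag: int) -> bool:
    """True if the read is marked paired (0x1) but has neither 0x40 nor 0x80 set,
    i.e. the situation described by the validation warning above."""
    return bool(flag & 0x1) and not (flag & (0x40 | 0x80))


print(decode_flag(99))         # a typical first-of-pair read
print(missing_pair_order(99))  # has 0x40, so it is fine
print(missing_pair_order(3))   # paired + proper pair, but no first/second bit
```

Running it prints `['paired', 'proper pair', 'mate reverse strand', 'first of pair']`, then `False`, then `True`.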
Running ValidateSamFile on it produced hundreds of warnings with the same message. The same thing happened with other Picard tools, such as FixMateInformation. At first, I thought the problem was related to inter-chromosomal pairs, where the information for one of the reads is not present in my BAM file. Then I saw this answer on Picard's FAQ page:
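Inter-chromosomal pairs can be recognized in the record itself, independently of the FLAG bits the warning is about. In a SAM alignment line, column 3 is RNAME (this read's reference) and column 7 is RNEXT (the mate's reference). RNEXT is `=` when the mate is on the same reference and `*` when it is unavailable. The sketch below checks that field. The function name `mate_on_other_chromosome` is hypothetical, and the example records are made up for illustration.

```python
def mate_on_other_chromosome(sam_line: str) -> bool:
    """True if a SAM alignment line reports its mate on a different reference.

    Column 3 (index 2) is RNAME and column 7 (index 6) is RNEXT.
    RNEXT "=" means same reference; "*" means no mate information.
    """
    fields = sam_line.rstrip("\n").split("\t")
    rname, rnext = fields[2], fields[6]
    return rnext not in ("=", "*", rname)


# Made-up example records (not real data)
same_chrom = "r1\t99\tchr11\t100\t60\t4M\t=\t300\t204\tACGT\tIIII"
other_chrom = "r2\t65\tchr11\t100\t60\t4M\tchr5\t900\t0\tACGT\tIIII"
print(mate_on_other_chromosome(same_chrom))
print(mate_on_other_chromosome(other_chrom))
```

This prints `False` and then `True`. A read whose mate sits on another chromosome can still have a correct 0x40 or 0x80 bit, so inter-chromosomal pairs alone would not explain the "first of pair or second of pair" warning.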
"If your reads have been divided into separate BAMs by chromosome, inter-chromosomal pairs will not be identified, but MarkDuplicates will not fail due to inability to find the mate pair for a read."
Right now, I'm confused and I don't know how to solve this. Should I run MarkDuplicates on the whole-genome file?
Thanks for your answers. I think it worked when I set VALIDATION_STRINGENCY to LENIENT. Still, I don't understand why I'm having this problem. I'm using BAM files aligned and produced by the Sanger Institute, and I'm processing them with my own pipeline to call SNPs. The steps leading to the BAM files should be OK, right? But these BAMs don't even pass ValidateSamFile!
Look at a bit of the original .sam file with your eyeballs. What flags do you see? Is your pipeline possibly changing those flags?
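Besides eyeballing, you could tally which FLAG values trigger the warning, both in the original file and after each step of your pipeline. If the counts appear only after a particular step, that step is the likely culprit. The sketch below assumes plain SAM text, for example the output of `samtools view file.bam`, fed in line by line. It skips `@` header lines and counts records that have 0x1 set but neither 0x40 nor 0x80. The function name `flags_missing_pair_order` and the sample records are hypothetical.

```python
from collections import Counter


def flags_missing_pair_order(sam_lines):
    """Count FLAG values of records marked paired (0x1) but neither
    first (0x40) nor second (0x80) of pair. Header lines are skipped."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        flag = int(line.split("\t")[1])  # FLAG is the 2nd column
        if (flag & 0x1) and not (flag & (0x40 | 0x80)):
            counts[flag] += 1
    return counts


# Made-up records for illustration only
sample = [
    "@HD\tVN:1.6",
    "a\t1\tchr11\t100\t60\t4M\t=\t300\t0\tACGT\tIIII",
    "b\t65\tchr11\t100\t60\t4M\t=\t300\t0\tACGT\tIIII",
    "c\t1\tchr11\t150\t60\t4M\t=\t350\t0\tACGT\tIIII",
    "d\t3\tchr11\t200\t60\t4M\t=\t400\t0\tACGT\tIIII",
]
print(flags_missing_pair_order(sample).most_common())
```

On the sample this prints `[(1, 2), (3, 1)]`. To use it on real data, you could pass `sys.stdin` as the argument and pipe `samtools view` output into the script.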