Do read names need to be unique over multiple readgroups in a BAM?
After merging multiple BAM files that contain paired reads, Picard ValidateSamFile complains about all kinds of read-pair errors.
These errors seem to be caused by Picard matching mates on the read name alone, without taking the read group into account, so it finds pairs that are not really pairs (the forward and reverse reads come from different read groups).
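To check whether this is really what is happening, here is a rough sketch that lists read names occurring in more than one read group. It works on plain SAM text (for example the output of `samtools view -h`), and the function names and the toy records are made up for illustration; they are not part of Picard or any library.

```python
from collections import defaultdict

def read_groups_per_name(sam_lines):
    """Map each read name (QNAME) to the set of read-group IDs it appears with."""
    groups = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment record
            continue
        fields = line.rstrip("\n").split("\t")
        qname = fields[0]
        # Optional tags start at column 12; RG:Z:<id> carries the read group.
        rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        groups[qname].add(rg)
    return groups

def names_spanning_groups(sam_lines):
    """Return the read names that are seen with more than one read group."""
    per_name = read_groups_per_name(sam_lines)
    return sorted(q for q, rgs in per_name.items() if len(rgs) > 1)

# Toy records (hypothetical): the two mates of read1 sit in different read groups.
example = [
    "@RG\tID:lane1",
    "@RG\tID:lane2",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tRG:Z:lane1",
    "read1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\tRG:Z:lane2",
    "read2\t99\tchr1\t300\t60\t4M\t=\t400\t104\tACGT\tIIII\tRG:Z:lane1",
    "read2\t147\tchr1\t400\t60\t4M\t=\t300\t-104\tACGT\tIIII\tRG:Z:lane1",
]
print(names_spanning_groups(example))
```

Any name this reports is either a mate pair split across read groups or a genuine name collision between groups; which of the two it is has to be checked against how the files were produced.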
Is it a bug in Picard that it doesn't use the read group information when validating pairs, or is it part of the BAM specification that read names should be unique across read groups?
"RGID plus read name and it will always be unique for every read": even if the aligner produces more than one hit per pair (SAM flag 256)?
I meant it will always be unique at the read level, not at the alignment level. If a read has more than one alignment in the BAM file, then all the rows for that read will have the same RGID + read name. I may be wrong too. I think you would be a better person to answer this.
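A small sketch of what "unique at the read level, not the alignment level" means in practice: counting SAM records per (RGID, read name) key, with secondary alignments (flag 256) counted separately. The records and the helper name `rg_and_name` are invented for the example, and both mates of a pair share the same key here.

```python
from collections import Counter

SECONDARY = 0x100  # SAM flag 256: secondary alignment

def rg_and_name(line):
    """Return (flag, (read group ID, read name)) for one SAM alignment record."""
    fields = line.rstrip("\n").split("\t")
    rg = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
    return int(fields[1]), (rg, fields[0])

# Toy records (hypothetical): one pair, plus a secondary hit for the first mate.
records = [
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tRG:Z:lane1",
    "read1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\tRG:Z:lane1",
    "read1\t355\tchr2\t500\t0\t4M\t=\t200\t0\tACGT\tIIII\tRG:Z:lane1",
]

total = Counter()
secondary = Counter()
for rec in records:
    flag, key = rg_and_name(rec)
    total[key] += 1
    if flag & SECONDARY:
        secondary[key] += 1

for key in total:
    print(key, "rows:", total[key], "secondary:", secondary[key])
```

So the key identifies the read pair, and several rows can carry it; a check for duplicate keys would have to ignore the secondary rows.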
My guess is also that Picard is right. The problem is probably that our forward and reverse reads are not in the same read group. We are fixing that at the moment, and I'm guessing Picard will then validate the BAM.