Should Read Names Be Unique Over Multiple Readgroups In A Bam?
11.0 years ago
William ★ 5.3k

Do read names need to be unique over multiple readgroups in a BAM?

After merging multiple BAM files that contain paired reads, Picard ValidateSamFile reports all kinds of read-pair errors.

These errors seem to be caused by Picard matching mates on the read name alone, without taking the read group into account. It therefore finds pairs that are invalid because they are not really pairs (the forward and reverse reads come from different read groups).

Is it a bug in Picard that it does not use the read group information when validating pairs, or does the BAM specification require read names to be unique across read groups?
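One way to check whether a merged file is affected is to list the read names that occur in more than one read group. The following is only a minimal sketch that works on SAM text lines (for example, the output of `samtools view -h` on the merged file); the function names and the example records are made up for illustration and are not part of Picard or samtools.

```python
from collections import defaultdict

def read_group_of(fields):
    """Return the RG:Z: tag value from split SAM fields, or None if absent."""
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("RG:Z:"):
            return tag[5:]
    return None

def names_spanning_read_groups(sam_lines):
    """Map each QNAME seen in more than one read group to those groups."""
    groups_by_name = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        groups_by_name[fields[0]].add(read_group_of(fields))
    return {name: groups for name, groups in groups_by_name.items()
            if len(groups) > 1}

# Hypothetical records: "read1" occurs in both lane1 and lane2.
example = [
    "@RG\tID:lane1\tSM:s1",
    "@RG\tID:lane2\tSM:s1",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tRG:Z:lane1",
    "read1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tFFFF\tRG:Z:lane2",
    "read2\t99\tchr1\t300\t60\t4M\t=\t400\t104\tACGT\tFFFF\tRG:Z:lane1",
    "read2\t147\tchr1\t400\t60\t4M\t=\t300\t-104\tACGT\tFFFF\tRG:Z:lane1",
]
collisions = names_spanning_read_groups(example)
```

Any name reported here is a candidate for the mate-pair errors described above, since a tool that pairs on read name alone would treat those records as one template.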

bam picard • 2.7k views
11.0 years ago

I am not sure about your problem with Picard, but as far as I know read names do not have to be unique in a merged BAM file as long as the reads carry a unique read group ID (RGID tag). In other words, two reads with the same name coming from different runs can be merged into one file if they have different RGIDs. Most software checks uniqueness on RGID plus read name, and that combination should be unique for every read. In the past, people concatenated the RGID and the read name to create unique read names in a BAM file. Picard is one of the smartest NGS tools we have, so I would expect it to differentiate reads with the same name but different RGIDs.

PS: By RGID, I mean the RG: tag in the BAM file.
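The older workaround mentioned in this answer, folding the read group ID into the read name, could look roughly like the sketch below. It operates on a single SAM text line; the function name and the `:` separator are assumptions made for illustration, not a convention from the SAM specification or from Picard.

```python
def prefix_qname_with_read_group(sam_line, sep=":"):
    """Return the SAM record with its RG ID prepended to QNAME.

    Header lines and records without an RG:Z: tag are returned unchanged.
    For rewritten records the trailing newline is stripped.
    """
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("RG:Z:"):
            fields[0] = tag[5:] + sep + fields[0]
            break
    return "\t".join(fields)

# Hypothetical record from read group "lane1".
record = "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tRG:Z:lane1"
renamed = prefix_qname_with_read_group(record)
```

Both mates of a pair get the same prefix only if they carry the same RG tag, so this renaming keeps true pairs together while separating identically named reads from different read groups.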


"RGID plus read name and it will always be unique for every read": even if the aligner produces more than one hit per pair (SAM flag 256)?


I meant that it will always be unique at the read level, not at the alignment level. If a read has more than one alignment in the BAM file, then all the rows for that read will have the same RGID + read name. I may be wrong too. I think you would be a better person to answer this.
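The difference between read-level and alignment-level uniqueness can be made concrete by counting records per (RGID, read name) key and separating out secondary alignments (flag bit 0x100, i.e. 256). This is a rough sketch on SAM text lines with hypothetical example records; supplementary alignments (0x800) are not treated separately here.

```python
from collections import Counter

SECONDARY = 0x100  # SAM flag bit for a secondary alignment (decimal 256)

def count_records_per_read(sam_lines):
    """Count primary and secondary records per (read group, read name) key."""
    primary, secondary = Counter(), Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), None)
        key = (rg, fields[0])
        if int(fields[1]) & SECONDARY:
            secondary[key] += 1
        else:
            primary[key] += 1
    return primary, secondary

# Hypothetical pair with one extra secondary hit for the first mate (355 = 99 + 256).
records = [
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tRG:Z:lane1",
    "read1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tFFFF\tRG:Z:lane1",
    "read1\t355\tchr2\t500\t0\t4M\t=\t200\t0\tACGT\tFFFF\tRG:Z:lane1",
]
primary, secondary = count_records_per_read(records)
```

In this example the key `("lane1", "read1")` has two primary records (the two mates) plus one secondary record, so the key identifies the read pair but not an individual row in the file.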


My guess is also that Picard is right. The problem is probably that our forward and reverse reads are not in the same read group. We are fixing that at the moment, and I expect Picard will then validate the BAM.
