Question: Might @PG headers affect downstream analyses, and how can I safely remove them from a BAM file?
Asked 20 months ago by olikidrod:

I've been mapping reads with bwa to produce BAM files, and had bwa add read groups during mapping. As a consequence, the exact command I used to run bwa, including the read-group string I specified, is recorded in the @PG header lines of all the BAM files.

Since then, I've used Picard to replace all of the read groups with new values. The read-group information recorded in the @PG headers is therefore out of date and could mislead other researchers if I publish the BAM files.
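To make the mismatch concrete, here is a minimal sketch using only the Python standard library. It takes SAM header text (for example, the output of `samtools view -H`), collects the current `@RG` IDs, and extracts any read-group IDs passed via `-R` in each `@PG` line's `CL` field. It assumes that bwa recorded the `-R` value with literal `\t` separators, exactly as typed on the shell, and that the value contains no spaces. The function names and the sample header are hypothetical.

```python
def header_fields(line):
    """Split one SAM header line into its record type and a TAG -> value dict."""
    record_type, *fields = line.rstrip("\n").split("\t")
    # Split on the first colon only: values such as CL may contain further colons.
    return record_type, dict(field.split(":", 1) for field in fields if ":" in field)


def rg_ids_in_command(command_line):
    """Return read-group IDs passed via -R in a recorded command line.

    Assumption: the -R value was recorded with literal backslash-t separators
    (as typed on the shell) and contains no spaces.
    """
    args = command_line.split(" ")
    ids = []
    for flag, value in zip(args, args[1:]):
        if flag == "-R":
            for part in value.split("\\t"):
                if part.startswith("ID:"):
                    ids.append(part[3:])
    return ids


def stale_read_groups(header_text):
    """Return RG IDs named in @PG command lines that no longer exist as @RG lines."""
    current, recorded = set(), set()
    for line in header_text.splitlines():
        # @CO lines hold free text rather than TAG:VALUE fields, so skip them.
        if not line.startswith("@") or line.startswith("@CO"):
            continue
        record_type, tags = header_fields(line)
        if record_type == "@RG" and "ID" in tags:
            current.add(tags["ID"])
        elif record_type == "@PG" and "CL" in tags:
            recorded.update(rg_ids_in_command(tags["CL"]))
    return sorted(recorded - current)


# Hypothetical header: the read group was renamed from lane1 to newlane1 after mapping.
header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@RG\tID:newlane1\tSM:sample1\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.17\tCL:bwa mem -R @RG\\tID:lane1\\tSM:sample1 ref.fa r1.fq r2.fq\n"
)
print(stale_read_groups(header))  # ['lane1']
```

A non-empty result means the `@PG` command line still names a read group that the file no longer contains.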

1) How can I safely remove these @PG headers from the BAM files? Since they contain incorrect data, I figure I might as well strip them all out.

2) Is this necessary if I don't publish the BAM files and I'm the only one who could be confused? Could @PG headers affect downstream analyses such as variant calling? I don't think GATK uses them at all, but I don't know whether other pipelines or programs might rely on that data.
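For question 1, one common approach is to dump the header as text with `samtools view -H in.bam`, delete the `@PG` lines, and write the edited header back with `samtools reheader header.sam in.bam > out.bam`. `reheader` replaces only the header and leaves the alignment records unchanged. The filtering step is sketched below in plain Python. The function name and sample header are hypothetical, and the sketch assumes the header fits comfortably in memory, which is normal for SAM headers.

```python
def strip_pg_lines(header_text):
    """Return SAM header text with every @PG line removed, keeping all other lines in order."""
    return "".join(
        line
        for line in header_text.splitlines(keepends=True)
        if not line.startswith("@PG\t")
    )


# Hypothetical header, as it might be printed by `samtools view -H in.bam`.
header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@RG\tID:newlane1\tSM:sample1\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.17\tCL:bwa mem ref.fa r1.fq r2.fq\n"
)
cleaned = strip_pg_lines(header)
print(cleaned, end="")
```

Two caveats apply. Recent samtools versions add their own `@PG` line to the header they output. `samtools reheader` accepts `-P` (`--no-PG`) to suppress this. In addition, if any tool wrote `PG:Z` tags on individual reads, those tags will point at `@PG` IDs that no longer exist after stripping.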

Thank you!


I don't think @PG lines are used by any downstream program; they are metadata that keep track of how the file was created and modified. Multiple @PG lines are allowed, and Picard may have added one for the operation you performed. Did you check?

(reply by h.mon, 20 months ago)
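To check whether a tool added its own entry, list the `@PG` lines in the header. The SAM specification allows several `@PG` lines and lets each one name its predecessor's ID in an optional `PP` tag. Running `samtools view -H in.bam | grep '^@PG'` is usually enough, but the sketch below also shows the `PP` chain explicitly. The function name and the two-entry sample header are hypothetical.

```python
def list_programs(header_text):
    """Return (ID, PN, PP) for each @PG line in header order; missing tags are None."""
    programs = []
    for line in header_text.splitlines():
        if line.startswith("@PG\t"):
            # Tags are TAG:VALUE; split on the first colon only.
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            programs.append((tags.get("ID"), tags.get("PN"), tags.get("PP")))
    return programs


# Hypothetical header with two chained @PG lines (PP names the previous program's ID).
header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.17\tCL:bwa mem ref.fa r1.fq r2.fq\n"
    "@PG\tID:samtools\tPN:samtools\tPP:bwa\tVN:1.10\tCL:samtools view -b -o out.bam in.sam\n"
)
for pg_id, name, previous in list_programs(header):
    print(pg_id, name, previous)
```

If the only entry is bwa's, no later tool recorded itself in the header.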

Many thanks for your quick reply. Yes, I checked: Picard did not add one. I'll strip out the @PG lines to avoid confusing anyone later.

(reply by olikidrod, 20 months ago)