Bam Header Edit
9.5 years ago

Hi,

I have around 20 BAM files, and each of them has a proper BAM header including RG ID, Sample, Library, Platform unit, Description and Platform. I was reading the GATK manual and just realized that the only accepted names for platforms are 454, LS454, Illumina, Solid, ABI_Solid, and CG. But I wanted to be quite specific when I was converting SAM to BAM, so I used "Solid5500XL" as the platform name.

I know there are AddOrReplaceReadGroups and ReplaceSamHeader commands in Picard which I can use to edit the BAM header. But that will take some time, as I have many large files. Is there an easy way to do this without rewriting the BAM files?

Thanks in advance.

bam • 48k views

Thanks, all. Somebody on another forum told me that there is no way to quickly edit minor header information. But samtools reheader applies the change to the BAM file very quickly: it takes about as long as copying the BAM file from one place to another.


Yes, this is JC's answer.

9.5 years ago
JC 12k

I think you can do this with samtools: first extract the header into a text file, then correct the value, and then replace the BAM header. Something like this:

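for BAM in *.bam
do
     # extract the header and correct the platform name
     samtools view -H "$BAM" > header.sam
     sed "s/Solid5500XL/Solid/" header.sam > header_corrected.sam
     # reheader writes the new BAM to stdout, so redirect it to a new file
     samtools reheader header_corrected.sam "$BAM" > "${BAM%.bam}.reheadered.bam"
done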

I haven't tested it, but I think it should work for you. Please let me know if it does (who knows, maybe some day I'll need it).
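
A plain `s/Solid5500XL/Solid/` rewrites the string wherever it occurs in the header, for example inside an @PG command line. A minimal sketch that limits the change to the PL field of @RG lines is shown here. The function name `fix_platform` is made up for illustration, and the old and new values are simply the ones used in this thread.

```shell
# Hypothetical helper: read a SAM header on stdin and rewrite only the
# PL field of @RG lines; every other line passes through unchanged.
fix_platform() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@RG/ {
             # compare whole tab-separated fields so partial matches are ignored
             for (i = 2; i <= NF; i++)
                 if ($i == "PL:Solid5500XL") $i = "PL:Solid"
         }
         { print }'
}
```

It could take the place of the sed step, e.g. `samtools view -H "$BAM" | fix_platform > header_corrected.sam`.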


Yes, I agree this should work. And since samtools allows piping, I would rewrite the 3 lines inside the do loop as a single pipeline, which gets rid of the intermediate header.sam files:

samtools view -H "$BAM" | sed "s/Solid5500XL/Solid/" | samtools reheader - "$BAM" > "${BAM%.bam}.reheadered.bam"

Really clever, avoiding the temporary files. I like it.


I just wanted to add that I was getting a segmentation fault (crash) using samtools reheader in a similar way, but it seems to have been fixed when I upgraded samtools to version 0.1.19.


I got this directly from an answer by Jeremy Leipzig quite a long time ago, which really opened my mind when I was starting to work extensively with BAM files: A: How Can I Edit Some Rows In .Bam Header File?


What if I want to edit the @RG header line of my sample in BAM files that I merged? Will it work safely? I have run recalibration twice with the wrong header, so I can't present multiple samples in a VCF file.
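
Since samtools reheader only replaces the header, the read-group tags on the reads themselves are left as they are. Changing a field such as SM should therefore be safe as long as the @RG ID values stay the same. Here is a sketch under that assumption; the helper name `set_rg_sample` and the IDs in the usage line are hypothetical.

```shell
# Hypothetical helper: set_rg_sample RG_ID NEW_SAMPLE
# Reads a SAM header on stdin and replaces the SM field of the @RG line
# whose ID matches RG_ID. The ID itself is left alone, because the reads
# refer to their read group by ID. @RG lines without an SM field are
# passed through unchanged.
set_rg_sample() {
    awk -v id="$1" -v sm="$2" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ {
            hit = 0
            for (i = 2; i <= NF; i++) if ($i == "ID:" id) hit = 1
            if (hit) for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) $i = "SM:" sm
        }
        { print }'
}
```

For example, `samtools view -H merged.bam | set_rg_sample rg2 sampleB > header_corrected.sam` would produce a header file that can then be passed to samtools reheader.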


I agree, the method worked fine for me, but I had to redirect the output of the samtools reheader command to a newly created BAM file; otherwise the file with the corrected header is printed to the screen, which crashed my remote terminal:

samtools reheader header_corrected.sam header.bad.bam > header.ok.bam

Yes... With my 100 GB BAM file I have to duplicate the whole file, and that takes a long time! And I only have to change one single bit...


Doesn't samtools reheader output the corrected bam to STDOUT, so you'll need to redirect the output to a file? i.e. the last line there should be something like:

samtools reheader header.sam $BAM > ${BAM%.bam}.reheadered.bam
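
Because the result arrives on stdout, redirecting it straight onto the input file would truncate the BAM before samtools has read it. A cautious sketch writes to a temporary name and renames it only if samtools exits successfully. The name `reheader_copy` and the `.tmp` suffix are illustrative choices, not part of samtools.

```shell
# Hypothetical helper: reheader_copy HEADER.sam IN.bam OUT.bam
# Writes the reheadered BAM to OUT.bam.tmp first and renames it to OUT.bam
# only when samtools succeeds, so a failed run never leaves a partial OUT.bam.
reheader_copy() {
    header=$1 in_bam=$2 out_bam=$3
    if samtools reheader "$header" "$in_bam" > "$out_bam.tmp"; then
        mv "$out_bam.tmp" "$out_bam"
    else
        rm -f "$out_bam.tmp"
        return 1
    fi
}
```

Inside the loop this would be called as `reheader_copy header_corrected.sam "$BAM" "${BAM%.bam}.reheadered.bam"`.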

Please forgive me if I am wrong, but I don't think this answers the original question. samtools reheader simply writes the new header and then rewrites the rest of the BAM file after it. This means copying each BAM file all the way through. For large BAM files (it is not uncommon for me to work with 200 GB files) this is infeasible.


It is simply impossible to edit only the BAM header in place. Reheader is the best shot. It is far faster than the other alternatives.


Thanks for this. Do you know if the original .bai will work on the new BAM? (Specifically, I'm changing the sequence names, so I guess not...)
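
Whether the old index survives a reheader is not settled in this thread, so the cautious route is to rebuild it, which is cheap compared with debugging a stale index. This is a sketch, assuming the BAM is coordinate-sorted and that `samtools index` writes its default `<file>.bam.bai` next to the BAM. `reindex_bam` is a made-up name.

```shell
# Hypothetical helper: reindex_bam FILE.bam
# Removes any stale index next to the BAM (both FILE.bam.bai and FILE.bai
# naming styles) and builds a fresh one with samtools index.
reindex_bam() {
    bam=$1
    rm -f "$bam.bai" "${bam%.bam}.bai"
    samtools index "$bam"
}
```

After reheadering, `reindex_bam sample.reheadered.bam` would leave a fresh `sample.reheadered.bam.bai` beside the file.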

9.5 years ago

It shouldn't take too long to modify all your headers in the SAM version and then convert them to BAM:

for i in directory/*.sam ; do
     cp "$i" "$i.backup" # to be on the safe side
     sed -i '/^@RG/ s/Solid5500XL/Solid/' "$i" # only touch @RG header lines
done

After this, GATK should work like a charm without complaints.


BAM is a compressed binary format, so I don't think you can change the header with a regex.


I was thinking of SAM files. I still think the best solution here is to modify the SAM files and redo the BAM conversion, because the allowed platform (PL) values seem to be hard-coded in GATK.


Yes, that's a valid option, but decoding/encoding the BAM files can take a looooong time.


You could just edit your answer accordingly then. Of course it was obvious that you were thinking of a SAM file.


Done, thanks :-)
