2
9
10.8 years ago

Hi,

I have around 20 BAM files, and each of them has a proper BAM header including RG ID, Sample, Library, Platform unit, Description and Platform. I was reading the GATK manual and just realized that the only accepted names for platforms are 454, LS454, Illumina, Solid, ABI_Solid, and CG. But I wanted to be quite specific when I was converting SAM to BAM and used "Solid5500XL" as the platform name.

I know there are the AddOrReplaceReadGroups and ReplaceSamHeader commands in Picard, which I can use to edit the BAM header. But that will take some time, as I have many large files. Is there an easy way to do this without rewriting the BAM files?
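
To see which files actually carry the unsupported value before changing anything, the PL field of each read group can be listed per file. This is only a minimal sketch: `list_platforms` is a hypothetical helper name, and it assumes tab-separated header text on stdin, such as the output of `samtools view -H`.

```shell
# Hypothetical helper: print the PL (platform) value of every @RG line
# found in SAM header text read from stdin.
list_platforms() {
    awk 'BEGIN { FS = "\t" }
    /^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^PL:/) print substr($i, 4)   # drop the "PL:" prefix
    }'
}

# Usage (not run here, needs samtools and BAM files in the current directory):
#   for BAM in *.bam; do
#       printf '%s\t' "$BAM"; samtools view -H "$BAM" | list_platforms
#   done
```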

bam • 56k views
0

Thanks, all. Somebody on the other forum told me that there is no way to quickly edit minor header info.

But the samtools reheader option applies the change to the BAM file very quickly. It takes about the same time as copying the BAM file from one place to another.


26
10.8 years ago
JC 13k

I think you can do this with samtools: first extract the header into a text file, then correct the value, and then replace the BAM header. Something like this:

for BAM in *.bam
do
    samtools view -H $BAM > header.sam
    sed "s/Solid5500XL/Solid/" header.sam > header_corrected.sam
    samtools reheader header_corrected.sam $BAM
done


I haven't tested it, but I think it should work for you. Please let me know if it works (who knows, maybe some day I'll need it).

9

Yes, I agree this should work. And considering that samtools allows piping, I would rewrite the 3 lines inside the do loop to get rid of the intermediate header.sam files, simply as follows:

samtools view -H $BAM | sed "s/Solid5500XL/Solid/" | samtools reheader - $BAM
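
One caveat with a bare sed substitution is that it rewrites the string wherever it occurs in the header, including description (DS) fields or @PG command lines. Below is a sketch that only touches the PL field of @RG lines, assuming tab-separated header fields. `fix_platform` is a hypothetical helper name, not part of samtools.

```shell
# Hypothetical helper: rewrite PL:<old> to PL:<new> on @RG lines only.
# Reads SAM header text on stdin and writes the corrected header to stdout.
fix_platform() {
    awk -v old="$1" -v new="$2" 'BEGIN { FS = OFS = "\t" }
    /^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i == "PL:" old) $i = "PL:" new   # exact field match only
    }
    { print }'
}

# Usage (not run here, needs samtools):
#   samtools view -H "$BAM" | fix_platform Solid5500XL Solid \
#       | samtools reheader - "$BAM" > "${BAM%.bam}.reheadered.bam"
```

Matching the whole field rather than a substring also avoids accidentally changing a read group whose platform merely contains the old value.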

0

Really clever, avoiding the temporary files. I like it.

1

I just wanted to add that I was getting a segmentation fault (crash) using samtools reheader in a similar way, but it seems to have been fixed when I upgraded samtools to version 0.1.19.

0

I got this knowledge directly from an answer by Jeremy Leipzig quite a long time ago, which really opened my mind back when I was starting to work extensively with BAM files: A: How Can I Edit Some Rows In .Bam Header File?

0

What if I want to edit the @RG header of my sample in the BAM files I merged? Will it work safely? I have run recalibration twice, but with the wrong header, so I can't present multiple samples in a VCF file.

5

I agree, the method worked fine for me, but I had to redirect the output of the samtools reheader command to a newly created BAM file. Otherwise the file with the corrected header is printed to the screen, which crashed my remote terminal:

samtools reheader header_corrected.sam header.bad.bam > header.ok.bam

1

Yes... With my 100 GB BAM file I have to duplicate the file, and that takes a long time! And I only have to change one single bit...

1

Doesn't samtools reheader output the corrected bam to STDOUT, so you'll need to redirect the output to a file?

i.e. the last line there should be something like:

samtools reheader header.sam $BAM >${BAM%.bam}.reheadered.bam
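
Putting the piping and the redirect together, the whole loop could look roughly like the sketch below. `reheadered_name` and `reheader_bam` are hypothetical helper names. The output goes to a new file next to the original, so nothing is overwritten.

```shell
# Hypothetical helper: derive the output name for a reheadered copy,
# e.g. sample.bam -> sample.reheadered.bam
reheadered_name() {
    printf '%s\n' "${1%.bam}.reheadered.bam"
}

# Hypothetical helper: write a copy of one BAM file with the corrected header.
# The body runs in a subshell so its variables do not leak into the caller.
reheader_bam() (
    bam=$1
    out=$(reheadered_name "$bam")
    samtools view -H "$bam" \
        | sed 's/Solid5500XL/Solid/' \
        | samtools reheader - "$bam" > "$out"
)

# Usage (not run here, needs samtools):
#   for BAM in *.bam; do
#       reheader_bam "$BAM" || echo "reheader failed: $BAM" >&2
#   done
```

It is probably worth checking the new header with `samtools view -H` on one of the outputs before deleting any original.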

0

Please forgive me if I am wrong, but I don't think this answers the original question. samtools reheader will simply rewrite the BAM file after first writing the header. This means copying each BAM file all the way through. For large BAM files (it is not uncommon for me to work with 200GB files) this is unfeasible.

0

It is simply impossible to edit the BAM header only. Reheader is the best shot. It is far faster than the alternatives.

0

Thanks for this. Do you know if the original .bai will work on the new BAM (specifically I'm changing the sequence names, so I guess not...).
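
On the index question: as far as I understand, a .bai stores offsets into the compressed file, so once the header changes size those offsets may no longer line up, and regenerating the index with samtools index is the safe choice. Here is a minimal sketch. `index_is_stale` and `reindex_if_stale` are hypothetical helper names, the index is assumed to sit next to the BAM as `file.bam.bai`, and the modification-time comparison is only a heuristic.

```shell
# Hypothetical helper: succeed if the BAM has no .bai next to it,
# or if the BAM is newer than its .bai.
index_is_stale() {
    [ ! -e "$1.bai" ] || [ "$1" -nt "$1.bai" ]
}

# Hypothetical helper: rebuild the index only when it looks stale.
reindex_if_stale() {
    if index_is_stale "$1"; then
        samtools index "$1"
    fi
}

# Usage (not run here, needs samtools):
#   for BAM in *.reheadered.bam; do reindex_if_stale "$BAM"; done
```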

1
10.8 years ago

It shouldn't take too long to modify all your headers in the SAM version and then convert them to BAM:

for i in directory/*.sam; do
    cp "$i" "$i.backup" # to be on the safe side
    sed -i 's/Solid5500XL/Solid/' "$i"
done


After this, GATK should work like a charm without complaints.

0

BAM is a compressed binary format, so I don't think you can change the header with a regex.

0

I was thinking of SAM files. I still think the best solution here is to modify the SAM files and redo the BAM conversion, because the allowed platform (PL) values seem to be hard-coded in GATK.

0

yes, that's a valid option, but decoding/encoding the BAM files can take a looooong time.

0

You could just edit your answer accordingly then. Of course it was obvious that you were thinking of a SAM file.

0

Done, thanks :-)