Question: Bam Header Edit
6.4 years ago, Ashutosh Pandey (Philadelphia) wrote:

Hi,

I have around 20 BAM files, and each of them has a proper BAM header including RG ID, sample, library, platform unit, description and platform. I was reading the GATK manual and just realized that the only accepted platform names are 454, LS454, Illumina, Solid, ABI_Solid, and CG. But I wanted to be quite specific when I was converting SAM to BAM, so I used "Solid5500XL" as the platform name.

I know there are AddOrReplaceReadGroups and ReplaceSamHeader commands in Picard that I can use to edit the BAM header, but that will take some time as I have many large files. Is there an easy way to do this without rewriting the BAM files?

Thanks in advance.

6.4 years ago, JC (Mexico) wrote:

I think you can do this with samtools: first extract the header into a text file, then correct the value, and finally write a new BAM with the corrected header. Something like this:

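for BAM in *.bam
do
     # extract the current header as text
     samtools view -H "$BAM" > header.sam
     # correct the platform name
     sed "s/Solid5500XL/Solid/" header.sam > header_corrected.sam
     # reheader writes the new BAM to stdout, so redirect it to a new file
     samtools reheader header_corrected.sam "$BAM" > "${BAM%.bam}.reheadered.bam"
done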

I haven't tested it but I think it must work for you. Please let me know if that works (who knows, maybe some day I'll need it).
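
One caveat with a plain substitution is that it replaces the string anywhere in the header, including inside an RG ID that happens to contain it, and the reads refer to their read group by that ID. A minimal sketch that only touches an exact PL field on @RG lines could look like the following. The function names fix_platform and reheader_all and the .reheadered.bam suffix are my own choices for illustration, and I have not run this on real BAM files:

```shell
# Rewrite the PL field of @RG header lines only; every other line passes through.
# Reads a SAM header on stdin and writes the corrected header to stdout.
fix_platform() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@RG/ { for (i = 2; i <= NF; i++) if ($i == "PL:Solid5500XL") $i = "PL:Solid" }
         { print }'
}

# Hypothetical driver: writes NAME.reheadered.bam next to each NAME.bam.
reheader_all() {
    for bam in *.bam; do
        [ -e "$bam" ] || continue                        # glob matched nothing
        case $bam in *.reheadered.bam) continue ;; esac  # skip our own output
        samtools view -H "$bam" | fix_platform |
            samtools reheader - "$bam" > "${bam%.bam}.reheadered.bam"
    done
}
```

Calling reheader_all in the directory with the BAM files would then leave the originals untouched until the new files have been checked.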


Yes, I agree this should work. And since samtools allows piping, I would rewrite the three lines inside the loop as a single pipeline, which gets rid of the intermediate header.sam files:

samtools view -H "$BAM" | sed "s/Solid5500XL/Solid/" | samtools reheader - "$BAM" > "${BAM%.bam}.reheadered.bam"
modified 6.4 years ago by Istvan Albert • written 6.4 years ago by Jorge Amigo

Really clever, avoiding the temporary files. I like it.

written 6.4 years ago by JC

I just wanted to add that I was getting a segmentation fault (crash) using samtools reheader in a similar way, but it seems to have been fixed when I upgraded samtools to version 0.1.19.

modified 5.0 years ago • written 5.0 years ago by cmdcolin

I got this directly from an answer by Jeremy Leipzig quite a long time ago, which really opened my mind when I was starting to work extensively with BAM files: http://www.biostars.org/post/show/8988/how-can-i-edit-some-rows-in-bam-header-file/#8990

written 6.4 years ago by Jorge Amigo

What if I want to edit the @RG header line of my sample in BAM files that I have merged? Will it work safely? I have run recalibration twice, but with the wrong header, so I can't represent multiple samples in a VCF file.

written 3 months ago by mike229lin

I agree, the method worked fine for me, but I had to redirect the output of the samtools reheader command to a newly created BAM file; otherwise the file with the corrected header is printed to the screen, which crashed my remote terminal:

samtools reheader header_corrected.sam header.bad.bam > header.ok.bam
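
If the goal is to end up with the corrected file under the original name, one option is to redirect into a temporary file and rename it only when samtools exits successfully. A rough sketch, assuming there is enough free disk space for a second copy of the BAM; reheader_safely and the .tmp suffix are made-up names for illustration:

```shell
# Usage: reheader_safely corrected_header.sam input.bam
# Writes the reheadered BAM to a temporary file first, then replaces the
# original only if samtools reported success.
reheader_safely() {
    header=$1
    bam=$2
    tmp="${bam}.tmp.$$"
    if samtools reheader "$header" "$bam" > "$tmp"; then
        mv "$tmp" "$bam"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

On failure the original BAM is left as it was and the partial temporary file is deleted.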

written 4.8 years ago by ff.cc.cc

Yes... With my 100 GB BAM file I have to duplicate the whole file, and this takes a long time! And all I need to change is a single bit...

written 4.6 years ago by raphael.poujol

Doesn't samtools reheader output the corrected bam to STDOUT, so you'll need to redirect the output to a file? i.e. the last line there should be something like:

samtools reheader header.sam $BAM > ${BAM%.bam}.reheadered.bam
written 16 months ago by alanh

Please forgive me if I am wrong, but I don't think this answers the original question. samtools reheader will simply rewrite the BAM file after first writing the header. This means copying each BAM file all the way through. For large BAM files (it is not uncommon for me to work with 200GB files) this is unfeasible.  

written 4.0 years ago by jhrf

It is simply impossible to edit only the BAM header in place. Reheader is the best shot: it is far faster than the alternatives.

written 4.0 years ago by lh3

Thanks for this. Do you know if the original .bai will work on the new BAM? Specifically, I'm changing the sequence names, so I guess not...

 

written 2.8 years ago by Dan
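
On the .bai question: I would not assume the old index is still valid. As far as I understand, a .bai stores offsets into the compressed file, and those can shift when the header changes size, so rebuilding the index with samtools index seems the safe choice. A small sketch that rebuilds the index only when it is missing or older than the BAM; needs_reindex and reindex_if_stale are hypothetical helper names:

```shell
# Succeeds (exit status 0) if FILE.bam has no FILE.bam.bai next to it,
# or if the BAM was modified more recently than its index.
needs_reindex() {
    bam=$1
    [ ! -e "${bam}.bai" ] || [ "$bam" -nt "${bam}.bai" ]
}

# Rebuild the index with samtools only when needs_reindex says so.
reindex_if_stale() {
    if needs_reindex "$1"; then
        samtools index "$1"
    fi
}
```

A freshly reheadered BAM has a newer modification time than an index built earlier, so reindex_if_stale would rebuild it.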
6.4 years ago, Leonor Palmeira (Liège, Belgium) wrote:

It shouldn't take too long to modify all your headers in the SAM version and then convert them to BAM:

for i in directory/*.sam ; do
     cp "$i" "$i.backup" # to be on the safe side
     sed -i '/^@RG/ s/Solid5500XL/Solid/' "$i" # edit only the @RG header lines
done

After this, GATK should work like a charm without complaints.
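
To check the result before handing the files to GATK, one could list the PL values that actually appear in the header. A small sketch, with list_platforms being a hypothetical helper name; it reads SAM text on stdin and stops at the first non-header line:

```shell
# Print the distinct PL (platform) values found on @RG header lines.
list_platforms() {
    awk -F '\t' '
        !/^@/  { exit }      # past the header, nothing more to look at
        /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^PL:/) print substr($i, 4) }
    ' | sort -u
}
```

For a SAM file this would be run as list_platforms < directory/sample.sam (the file name is only an example); for a BAM, the header can be piped in from samtools view -H.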

modified 6.4 years ago • written 6.4 years ago by Leonor Palmeira

BAM is a compressed binary format; I don't think you can change the header with a regex.

written 6.4 years ago by JC

I was thinking of SAM files. I still think the best solution here is to modify the SAM files and redo the BAM conversion, because the allowed platform (PL) values seem to be hard-coded in GATK.

modified 6.4 years ago • written 6.4 years ago by Leonor Palmeira

Yes, that's a valid option, but decoding/encoding the BAM files can take a looooong time.

written 6.4 years ago by JC

You could just edit your answer accordingly then. Of course it was obvious that you were thinking of a SAM file.

written 6.4 years ago by Arun

Done, thanks :-)

written 6.4 years ago by Leonor Palmeira
6.4 years ago, Ashutosh Pandey (Philadelphia) wrote:

Thanks, all. Somebody on another forum told me that there is no way to quickly edit minor information in a BAM file. But samtools reheader applies the change very quickly: it takes about as long as copying the BAM file from one place to another.

modified 6.4 years ago • written 6.4 years ago by Ashutosh Pandey

Yes, this is JC's answer.

written 6.4 years ago by lh3