Question: Bam header editing
9 days ago, Huynh Nguyen wrote:

Dear all,

I have 100 .bam files and would like to replace the header of each one with a new header. The new header is the same for all files except for the @RG line, where ID and SM should both be set to the name of the BAM file. How can I do this for all files at once instead of reheadering them one by one?

Thank you all for any help.



Is there already read group information in the header (samtools view -H input.bam | grep "@RG")? If so, should it be replaced?

Why do you want the filename as the ID and SampleName?

fin swimmer

written 8 days ago by finswimmer

You could do something along these lines:

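# Dump the header, prefix every @SQ sequence name with "chr", and write a reheadered copy.
# Printing the whole line keeps any extra @SQ fields (e.g. M5, UR) intact.
samtools view -H in.bam \
  | awk 'BEGIN { FS = OFS = "\t" } $1 == "@SQ" { sub(/^SN:/, "SN:chr", $2) } { print }' \
  | samtools reheader - in.bam > out.bam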

where in this case awk is used to replace chromosome names from 1, 2, ... notation to chr1, chr2, ... notation.
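
Adapted to the original question, the same pattern could be looped over all files. This is only a sketch under a few assumptions: samtools is on the PATH, the BAM files sit in the current directory, each header already contains an @RG line, and the helper name set_rg_sample and the output directory reheadered are made up for illustration. Note that reheader only touches the header, so reads that carry an RG tag keep their old value and would no longer match the new ID.

```shell
# Rewrite the ID and SM fields of every @RG header line to the given name.
# Reads SAM header text on stdin and writes the edited header to stdout.
set_rg_sample() {
    awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
    $1 == "@RG" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^ID:/) $i = "ID:" name
            else if ($i ~ /^SM:/) $i = "SM:" name
        }
    }
    { print }'
}

outdir=reheadered   # hypothetical output directory

for bam in *.bam; do
    [ -e "$bam" ] || continue          # skip if the glob matched nothing
    mkdir -p "$outdir"
    name=$(basename "$bam" .bam)       # file name without the .bam suffix
    samtools view -H "$bam" \
      | set_rg_sample "$name" \
      | samtools reheader - "$bam" > "$outdir/$name.bam"
done
```

Writing into a separate directory keeps the originals untouched and avoids the loop picking up its own output on a second run.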

Cheers, Thomas

written 8 days ago by t.kuilman

You can use Picard's ReplaceSamHeader tool.

written 7 days ago by cpad0112
7 days ago, Istvan Albert (University Park, USA) wrote:

It sounds like what you need is to replace the read group, overwriting the read group of all alignments with a new one:

samtools addreplacerg


Usage: samtools addreplacerg [options] [-r <@RG line> | -R <existing id>] [-o <output.bam>] <input.bam>

  -m MODE   Set the mode of operation from one of overwrite_all, orphan_only [overwrite_all]
  -o FILE   Where to write output to [stdout]
  -r STRING @RG line text
  -R STRING ID of @RG line in existing header to use
      --input-fmt FORMAT[,OPT[=VAL]]...
               Specify input format (SAM, BAM, CRAM)
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
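
For the 100 files in the question, a loop along these lines might work. It is a sketch, not a tested recipe: it assumes samtools is on the PATH, that the BAM files are in the current directory, and that -r accepts a literal "\t" as the field separator (as far as I know it does, but check your version). The helper rg_line and the output directory with_rg are names invented here.

```shell
# Print an @RG line (with literal "\t" separators) whose ID and SM are both $1.
rg_line() {
    printf '@RG\\tID:%s\\tSM:%s\n' "$1" "$1"
}

outdir=with_rg   # hypothetical output directory

for bam in *.bam; do
    [ -e "$bam" ] || continue          # skip if the glob matched nothing
    mkdir -p "$outdir"
    name=$(basename "$bam" .bam)       # file name without the .bam suffix
    # overwrite_all retags every alignment with the new read group
    samtools addreplacerg -m overwrite_all -r "$(rg_line "$name")" \
        -O bam -o "$outdir/$name.bam" "$bam"
done
```

Unlike a plain reheader, this updates both the @RG header line and the RG tag on the reads, so the two stay consistent.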

Now, if you really don't want to process the entire BAM file, you could edit the BAM file directly, though you could easily corrupt the files if this is done incorrectly. (Note that any text-based editing means converting to SAM and then back to BAM, which would probably be slower than addreplacerg.) Here is how the BAM format starts:

Field    Description                                               Type
magic    BAM magic string                                          char[4]
l_text   Length of the header text, including any NUL padding      int32
text     Plain header text in SAM; not necessarily NUL-terminated  char[l_text]

Now edit l_text and text, shifting the rest of the file as needed.
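
To look at these fields in a real file before attempting anything like that, a small read-only sketch may help. It relies on two things: the layout above describes the decompressed stream, and BAM's BGZF compression is gzip-compatible, so gzip -dc can decompress it. The function name bam_header_length is made up, and in.bam is just a placeholder file name.

```shell
# Read a decompressed BAM stream on stdin, check the magic bytes,
# and print l_text (the header text length in bytes).
bam_header_length() {
    # First 8 bytes as unsigned decimals: 4 magic bytes, then l_text (little-endian int32).
    set -- $(od -An -tu1 -N8)
    # "B" "A" "M" \1 is 66 65 77 1 in decimal.
    [ "$#" -eq 8 ] && [ "$1 $2 $3 $4" = "66 65 77 1" ] || {
        echo "not a BAM stream" >&2
        return 1
    }
    # Assemble the little-endian integer byte by byte, independent of host byte order.
    echo $(( $5 + $6 * 256 + $7 * 65536 + $8 * 16777216 ))
}

# Usage on a placeholder file; only runs if it exists.
if [ -e in.bam ]; then
    gzip -dc in.bam | bam_header_length
fi
```

This only inspects the header length; actually rewriting l_text and text would also require recompressing the affected BGZF blocks, which is where things get easy to break.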

You are probably better off with addreplacerg.
