2.7 years ago
Huynh Nguyen

Dear all,

I have 100 .bam files and I would like to replace the header of each one with a new header. The new header is the same for all files; the only difference is in the @RG line, where ID and SM should both be set to the BAM file name. How can I do this in one step instead of reheadering the files one by one?

Thank you all for any help.


Hello,

Is there already read group information in the header (samtools view -H input.bam | grep "@RG")? If so, should it be replaced?

Why do you want the filename as the ID and SampleName?

fin swimmer
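
A rough sketch of that check run across a whole directory rather than one file at a time. It assumes samtools is on the PATH, and the function names `count_rg_lines` and `report_rg` are made up for illustration:

```shell
# Count @RG lines in SAM header text read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_rg_lines() {
    grep -c '^@RG' || true
}

# Print "<file><TAB><number of @RG lines>" for every BAM in a directory.
# Assumes samtools is installed; nothing runs until the function is called.
report_rg() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue   # skip the unexpanded glob if no BAMs exist
        printf '%s\t%s\n' "$bam" "$(samtools view -H "$bam" | count_rg_lines)"
    done
}
```

Calling `report_rg PATH/TO/INPUTS` would then list each file with its @RG count; a count of 0 means the header has no read group yet.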


Can you do something along these lines?

    samtools view -H in.bam | awk 'BEGIN { FS = OFS = "\t"; } { if ($1 == "@SQ") { gsub("SN:", "SN:chr", $2); print $1, $2, $3; } else print; }' | samtools reheader - in.bam > out.bam

Here awk is used to rename the chromosomes from 1, 2, ... notation to chr1, chr2, ... notation.

Cheers,
Thomas

2.7 years ago

Sounds like what you need is to replace the read group, overwriting the read group on all alignments with the new one. Running

    samtools addreplacerg

prints:

    Usage: samtools addreplacerg [options] [-r <@RG line> | -R <existing id>] [-o <output.bam>] <input.bam>

    Options:
      -m MODE     Set the mode of operation from one of overwrite_all, orphan_only [overwrite_all]
      -o FILE     Where to write output to [stdout]
      -r STRING   @RG line text
      -R STRING   ID of @RG line in existing header to use
          --input-fmt FORMAT[,OPT[=VAL]]...
                  Specify input format (SAM, BAM, CRAM)
          --input-fmt-option OPT[=VAL]
                  Specify a single input file format option in the form of OPTION or OPTION=VALUE
      -O, --output-fmt FORMAT[,OPT[=VAL]]...
                  Specify output format (SAM, BAM, CRAM)
          --output-fmt-option OPT[=VAL]
                  Specify a single output file format option in the form of OPTION or OPTION=VALUE
          --reference FILE
                  Reference sequence FASTA FILE [null]
      -@, --threads INT
                  Number of additional threads to use [0]

Now, if you really don't want to process the entire BAM file (and note that any text-based editing means converting to SAM and then back to BAM, which would probably be slower than addreplacerg), you could edit the BAM file directly, though you could easily corrupt the files if this is done incorrectly. Here is how a BAM file starts:

    Field    Description                                                 Type
    magic    BAM magic string                                            char[4]
    l_text   Length of the header text, including any NUL padding        int32
    text     Plain header text in SAM; not necessarily NUL-terminated    char[l_text]

You would then edit l_text and text, shifting the rest of the file as needed. You are probably better off with addreplacerg.

I ended up using addreplacerg, and wrapped it in a fun loop:

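    for file in PATH/TO/INPUTS/*.bam; do
        # Strip the directory and the .bam extension to get the sample name
        base_name=$(basename "$file" .bam)
        # Write a copy whose @RG line uses the file name as both ID and SM
        samtools addreplacerg -r "ID:${base_name}\tSM:${base_name}" -o "PATH/TO/OUTPUT/${base_name}.bam" "$file"
    done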
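
Before letting a loop like this loose on 100 files, it can help to print the commands first and look them over. A minimal dry-run sketch that mirrors the `-r` string used in the loop above; `rg_line` and `print_addreplacerg_cmds` are hypothetical helper names, and nothing here actually calls samtools:

```shell
# Build the -r argument from a BAM path: the file name without .bam is used
# as both ID and SM. The backslash-t is kept literal, as in the loop above.
rg_line() {
    base=$(basename "$1" .bam)
    printf 'ID:%s\\tSM:%s' "$base" "$base"
}

# Print (do not run) one addreplacerg command per BAM file in input dir $1,
# with outputs going to dir $2.
print_addreplacerg_cmds() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue   # skip the unexpanded glob if no BAMs exist
        out="$2/$(basename "$bam")"
        printf 'samtools addreplacerg -r "%s" -o "%s" "%s"\n' \
            "$(rg_line "$bam")" "$out" "$bam"
    done
}
```

Running `print_addreplacerg_cmds PATH/TO/INPUTS PATH/TO/OUTPUT` shows one command per file. Keeping the output directory different from the input directory avoids writing a BAM over the file it is being read from.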