BAM File, Change chromosome notation
1
0
Entering edit mode
16 months ago
nrk_02 • 0

Hello! :)

I have a BAM file that I need to modify.

1) convert chr1 -> 1 , chr2 -> 2 .....

2) delete the non-standard chromosomes (chrUn_gl..., chr_1_.._random etc.)

I am not sure how should I approach it and in which order, when I apply this:

samtools view test.bam | sed '/chrM/d;/random/d;/chrUn/d;/chr[0-9]*_/d'


It seems like I lose the header information.

So, should I first edit the header like:

samtools view -H test.bam | sed '/chrM/d;/random/d;/chrUn/d;/chr[0-9]*_/d'


and then remove the reads with non-standard chromosomes?

samtools chromosome header BAM notation • 1.2k views
1
Entering edit mode

java -jar dist/bamrenamechr.jar -f mapping.tsv in.bam

0
Entering edit mode

Thank you very much for your reply! However, I still do not understand the logic of the whole thing. I learned that I can subset the bam file by using a bed file, in order to get rid of the non-standard chromosome, but then when I want to update the header, I both delete non-standard chr and convert chr1->1

samtools view -H test.bam | sed 's/chr//' | sed '/M/d;/random/d;/Un/d;/[0-9]*_/d' ,

but It breaks the content of the BAM file when I use rehader: .

samtools view -H test.bam | sed 's/chr//' | sed '/M/d;/random/d;/Un/d;/[0-9]*_/d' | samtools reheader - test.bam > test_reheader.bam


I do not really get how the order of editing should be.

0
Entering edit mode

I strongly recommend trying it on a tiny BAM file (e.g. with 2 reads).

Be aware that deleting every header with the letter M or any underscore (since [0-9]* happily matches nothing) is likely to delete more headers than you intended to.

Note that since BAM (not SAM) uses reference contig index, not reference contig name, samtools reheader gives incorrect results if you drop any @SQ headers that aren't at the end of the @SQ block. For example:

@SQ chr1
@SQ chr2
@SQ chrUn_decoy


to

@SQ 1
@SQ 2


is ok.

Both

@SQ chr1
@SQ chrUn_decoy
@SQ chr2


to

@SQ 1
@SQ 2


doesn't work at all with reheader because it's equivalent to dropping chr2 and renamed chrUn_decoy to chr2. It does not rename chr2 to 2.

Edit: when I tried your command on one of my bams I got:

[W::sam_hdr_link_pg] PG line with PN:samtools has a PP link to missing program 'bwa'


This was a result of your sed deleting one of the @PG headers.

0
Entering edit mode
16 months ago
d-cameron ★ 2.8k

1) Renaming contigs

If the order isn't changing you can use samtools reheader (http://www.htslib.org/doc/samtools-reheader.html). This works because BAM files don't actually store the contig names, but the contig indices.

Picard ReplaceSamHeader should do a similar thing.

Caveat: I'm not aware of any tool that correctly updates SAM tags such as the SA when renaming contigs. Your split reads will be broken.

WARNING: samtools reheader only works if the contigs you are removing are the final contigs. Internally, BAM uses the index (not the name) of the contig each read aligns to. Changing [chr1, chr1_Un,chr2] to [1,2] deletes "chr2" and remaps all "chr1_Un" reads to "2". It does not rename "chr2" aligned reads to "2".

2) Delete additional contigs

It's unclear what you want to do with reads that are mapped to contigs to delete. Do you want to:

• Remove them from the file:

samtools view -L to_retain.bed should do the trick.

Note that this will also filter out unmapped reads. If you want to keep unmapped reads you need to flip it and use -U: samtools view -L to_remove.bed -U output.bam > /dev/null

• Treat them as unmapped

You'll need to set the unmapped SAM flag and remove the contig alignment.

Same caveat: I'm not aware of any tool that correctly updates SAM tags such as the SA when renaming contigs. Your split reads will be broken.

TLDR: You're probably better off realigning to the new reference.