How to change BAM file chromosome names using sed or awk?
3.7 years ago
qianxu0517 • 0

I have several BAM files with illegal chromosome names, e.g. ENSMUST00000193812.1_4933401J01Rik

I want to visualize these BAM files in IGV, and those illegal names are causing problems.

I know that I can edit the headers of the reference .fasta file and redo the mapping, but I would really like to know how to edit the BAM file directly with an awk or sed command. I saw a similar post here: Bam File: Change Chromosome Notation. However, I find it quite difficult to understand and to apply to my own situation.

All I want to do is replace every dot in the BAM file's chromosome names with an underscore.

Thanks in advance for any help!

bam awk • 1.5k views

ENSMUST00000193812.1_4933401J01Rik

How do you know it's 'illegal'? Why is it causing a problem?


IGV cannot read these BAM files; please see the issue I posted: https://github.com/igvteam/igv/issues/832#issuecomment-665877615

3.7 years ago
Ram 43k

You should use samtools reheader. Do not do this with sed or awk - you could compromise the file.
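A minimal sketch of the reheader approach, assuming a hypothetical input file in.bam and assuming that only the SN: field of the @SQ header lines needs to change. BAM records refer to reference sequences by index rather than by name, so rewriting the header should be enough to rename them. The function names fix_sq_names and reheader_bam are illustrative and not part of samtools.

```shell
#!/bin/sh
# fix_sq_names: read a SAM header on stdin and replace every dot with an
# underscore, but only inside the SN: field of @SQ lines. Other header
# lines (for example @HD VN:1.6 or @PG version strings) pass through untouched.
fix_sq_names() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@SQ/ {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/) gsub(/\./, "_", $i)
         }
         { print }'
}

# reheader_bam IN.bam OUT.bam (hypothetical helper): dump the header,
# fix the names, and let samtools write a new BAM with the edited header.
reheader_bam() {
    in_bam=$1
    out_bam=$2
    hdr=$(mktemp) || return 1
    samtools view -H "$in_bam" | fix_sq_names > "$hdr" &&
        samtools reheader "$hdr" "$in_bam" > "$out_bam"
    status=$?
    rm -f "$hdr"
    return $status
}
```

Calling reheader_bam in.bam out.bam would then produce the renamed file. An existing index will not match the new file, so for a coordinate-sorted BAM it is worth running samtools index out.bam afterwards.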


Thanks! I think I need to rename the sequences in the FASTA file and then redo the mapping.

3.7 years ago

BAM files are compressed. You'd have to feed the uncompressed version into sed or awk. The easiest way to do that is with samtools, in which case you might as well use samtools to do the whole job.
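For illustration only, this is roughly what the decompress, edit, recompress route could look like. It is a sketch, assuming hypothetical files in.bam and out.bam, and it only touches the @SQ SN: header field plus the RNAME (column 3) and RNEXT (column 7) fields of each record. Optional tags that embed reference names, such as SA:Z, are not handled here, which is one reason editing only the header with samtools reheader is usually safer. The function names are made up for this example.

```shell
#!/bin/sh
# sam_fix_names: read SAM text on stdin and replace dots with underscores
# in reference names only: SN: on @SQ lines, and columns 3 (RNAME) and
# 7 (RNEXT) on alignment lines. Read names and other fields are left alone.
sam_fix_names() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@SQ/ {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/) gsub(/\./, "_", $i)
             print; next
         }
         /^@/ { print; next }   # other header lines pass through unchanged
         {
             gsub(/\./, "_", $3)
             gsub(/\./, "_", $7)
             print
         }'
}

# rewrite_bam IN.bam OUT.bam (hypothetical helper): stream the BAM as SAM
# text through the filter and compress it back to BAM.
rewrite_bam() {
    samtools view -h "$1" | sam_fix_names | samtools view -b -o "$2" -
}
```

This rewrites every record, so it is slower than changing the header alone, and the output needs a fresh index.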
