Replacing BAM reference
4.2 years ago
vctrm67 ▴ 50

I know this has been posted about before, but I plan on using samtools reheader to replace the chromosome notation in my bam file (from b37 to hg19). However, between b37 and hg19 headers the number of chromosome sequences is different and they are in different orders. Can I:

  1. Delete all the sequences in the b37 BAM file header
  2. Add all the sequences from the hg19 BAM file into the edited b37 BAM file

Will this work? Or does the order of the chromosomes need to be preserved between the two headers?

bam • 1.0k views

> the number of chromosome sequences is different.

If there is any difference in the sequence content (ignoring headers) between the reference sequences, there is zero guarantee that reads would align exactly the same way (at the same positions with the same scores). There is a good chance that the new BAM file you create will not be valid.

It is better to extract the reads from the BAM and realign them to the new reference.
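A rough sketch of what "extract and realign" could look like, written as a small Python helper that only builds the shell commands. Everything specific here is an assumption for illustration: paired-end reads, `bwa mem` as the aligner, an already indexed hg19 FASTA, and the intermediate file names `r1.fq` and `r2.fq`. Adapt it to whatever aligner and options produced the original BAM.

```python
import shlex
from typing import List


def realign_commands(bam: str, ref_fasta: str, out_bam: str) -> List[str]:
    """Build (but do not run) shell commands to realign a BAM to a new reference.

    Assumes paired-end data and a bwa-indexed reference; r1.fq and r2.fq are
    hypothetical intermediate file names.
    """
    bam_q, ref_q, out_q = shlex.quote(bam), shlex.quote(ref_fasta), shlex.quote(out_bam)
    # Group mates together, then write them back out as FASTQ.
    to_fastq = (
        f"samtools collate -u -O {bam_q} | "
        "samtools fastq -1 r1.fq -2 r2.fq -0 /dev/null -s /dev/null -n"
    )
    # Align to the new reference and coordinate-sort the result.
    align = f"bwa mem {ref_q} r1.fq r2.fq | samtools sort -o {out_q} -"
    return [to_fastq, align]


if __name__ == "__main__":
    for cmd in realign_commands("sample.b37.bam", "hg19.fa", "sample.hg19.bam"):
        print(cmd)
```

Printing the commands first makes it easy to check them before running anything. Note that read group information from the original BAM is not carried over by this sketch.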

4.2 years ago
colin.kern ★ 1.1k

No, that won't work. It's not enough to just reheader, because each alignment in the BAM file refers to a chromosome name defined in the headers, so if you're going to change the names in the headers, you need to go through each alignment and change the chromosome name there too. You'll need to know which chromosome name in hg19 matches the one in b37 that refers to the same chromosome. If there's a different number of chromosomes, that means you're missing some (probably unplaced or unlocalized scaffolds). You'll need to drop any alignments that are on a chromosome you don't have a name for in hg19. And this is all assuming that the chromosome sequence is exactly the same between the two references. This page indicates that they're not the same, because b37 masks certain low-confidence nucleotides. If this doesn't change the size of each chromosome sequence, then your alignment positions will still be valid.
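To make the name matching and the size check concrete, here is a minimal sketch in plain Python. The function names are made up for illustration, and the contig lengths in the usage example are toy values, not real ones. It assumes the only renaming needed is adding a `chr` prefix to the autosomes, X and Y. MT is deliberately left unmapped, since as far as I know b37 MT and UCSC hg19 chrM are different mitochondrial sequences.

```python
from typing import Dict, Optional

# Contigs assumed to correspond one-to-one between b37 and hg19.
_PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def b37_to_hg19_name(name: str) -> Optional[str]:
    """Return the hg19-style name for a b37 contig, or None if there is no match."""
    return "chr" + name if name in _PRIMARY else None


def comparable_contigs(b37_lengths: Dict[str, int],
                       hg19_lengths: Dict[str, int]) -> Dict[str, str]:
    """Map b37 names to hg19 names, keeping only contigs whose lengths agree.

    Alignments on any contig missing from the result would have to be dropped.
    Equal length is necessary but does not prove the sequences are identical.
    """
    safe = {}
    for name, length in b37_lengths.items():
        new_name = b37_to_hg19_name(name)
        if new_name is not None and hg19_lengths.get(new_name) == length:
            safe[name] = new_name
    return safe


if __name__ == "__main__":
    # Toy lengths, for illustration only.
    b37 = {"1": 1000, "X": 800, "GL000207.1": 50}
    hg19 = {"chr1": 1000, "chrX": 799}
    print(comparable_contigs(b37, hg19))
```

In the toy example only contig `1` survives: `X` fails the length check and the scaffold has no hg19 name in this mapping.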

As @RamRS said, it's safer, and probably easier depending on your skills in writing custom scripts, to just realign to hg19.


When you eyeball a bam by converting it to sam, the chromosome name is printed out, but I don't think this is the case in bam format; I think each line in the bam points up to a specific line of the header for the chromosome name, so changing the headers alone would do the trick...but it still sounds like a bad idea in this case.
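A toy model of that, in plain Python rather than a real BAM parser. In the SAM/BAM specification each BAM record stores a numeric `refID`, a 0-based index into the list of reference sequences in the header, and the name is looked up from there. The record tuples and the `resolve` helper are made up for illustration. The sketch also shows why the order of sequences in the new header matters.

```python
from typing import List, Tuple

# (read name, refID, 1-based position); refID indexes into the header list.
Record = Tuple[str, int, int]


def resolve(header: List[str], records: List[Record]) -> List[Tuple[str, str, int]]:
    """Turn refIDs into chromosome names, as a SAM text view would show them."""
    return [(name, header[ref_id], pos) for name, ref_id, pos in records]


records = [("readA", 0, 100), ("readB", 2, 5)]

b37_header = ["1", "2", "MT"]
same_order = ["chr1", "chr2", "chrM"]    # renamed, same order
other_order = ["chrM", "chr1", "chr2"]   # same names, different order

if __name__ == "__main__":
    print(resolve(b37_header, records))
    print(resolve(same_order, records))
    print(resolve(other_order, records))
```

With `same_order` the reads simply pick up the new names. With `other_order` the untouched records now point at the wrong chromosomes (readA lands on `chrM`), which is what swapping in a header with a different order or number of sequences would do.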

4.2 years ago

Don't do this, if only so that you keep an accurate record of what you actually did. You aligned to one version; leave it at that. Or, if you want to be aligned to another version, align to that version. Don't play games with editing the files.
