Question: Bam And Indexed Bam Files
Sahel wrote, 5.2 years ago:

Hi There,

I recently started grad school and have no background working with sequencing data :( As a first task, my supervisor asked me to look at two files, "XXX.bam" and "XXX.bam.bai", and figure out whether they are the same file (just one of them indexed) or different. Since the files have exactly the same name, except that one has an additional ".bai" at the end, it looks like XXX.bam.bai is the indexed form of XXX.bam, but I am not completely sure. Can someone please give me a hint on how to check whether the files are the same or not? What program can I use to generate an indexed bam? (SAM?! I have only heard about it and have never had a chance to work with it.) And what program can I use to visualize them?

Thank you so much....

Sahel

Chris Miller (Washington University in St. Louis, MO) wrote, 5.2 years ago:

A bai file isn't an indexed form of a bam - it's a companion to your bam that contains the index.

A bam file is a binary blob that stores all of your aligned sequence data. You can view what's in the bam file using "samtools view bamfile.bam | less".

Bam files can also have a companion file, called an index file. This file has the same name, suffixed with .bai. This file acts like an external table of contents, and allows programs to jump directly to specific parts of the bam file without reading through all of the sequences. Without the corresponding bam file, your bai file is useless, since it doesn't actually contain any sequence data.
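
One quick way to convince yourself that the two files are different kinds of file is to look at their first few bytes. The following is only a rough sketch: file_kind is a made-up helper name (not a samtools command), and it relies on the magic strings from the SAM/BAM format specification ("BAM\1" at the start of the decompressed bam, "BAI\1" at the start of the index). It only identifies the file type; it does not prove that a given index was built from a given bam.

```shell
#!/usr/bin/env bash
# Sketch: guess whether a file is a bam or a bai from its magic bytes.
# file_kind is a hypothetical helper name, not part of samtools.
file_kind() {
    local f="$1"
    # A .bai index is stored uncompressed and begins with the bytes "BAI\1".
    if [ "$(head -c 3 "$f" | tr -d '\0')" = "BAI" ]; then
        echo "bai-index"
    # A .bam is BGZF (gzip-compatible); once decompressed it begins with "BAM\1".
    elif [ "$(gzip -dc "$f" 2>/dev/null | head -c 3 | tr -d '\0')" = "BAM" ]; then
        echo "bam"
    else
        echo "unknown"
    fi
}

# Example usage (file names taken from the question):
#   file_kind XXX.bam
#   file_kind XXX.bam.bai
```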

If you have a bam file without a corresponding index, you can generate one using "samtools index bamfile.bam". Note that the bam has to be sorted by coordinate for indexing to work.

If your index file is named identically, with just the additional ".bai" suffix, you can be reasonably sure that it was generated from the same file. If you have any doubt, though, it's easy enough to delete your bai file, then generate a new index using the previous command. Keep in mind that this may take a half hour or more depending on the size of your bam and the speed of your computer.
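
If you want a cheap first check before rebuilding anything, you can test whether the index exists and is at least as new as the bam. This is just a sketch under assumptions: index_status is a made-up helper name, the index is assumed to sit next to the bam with the extra ".bai" suffix, and modification times are only a heuristic (copying files around can reset them). When in doubt, regenerating the index is still the safe option.

```shell
#!/usr/bin/env bash
# Sketch: report whether bamfile.bam has an index next to it that looks current.
# index_status is a hypothetical helper name, not part of samtools.
index_status() {
    local bam="$1"
    local bai="${bam}.bai"
    if [ ! -e "$bai" ]; then
        echo "missing"          # no index at all: run samtools index
    elif [ "$bai" -ot "$bam" ]; then
        echo "stale"            # index is older than the bam: rebuild it
    else
        echo "ok"               # index is at least as new as the bam
    fi
}

# Example usage:
#   index_status bamfile.bam
#   # if the answer is "missing" or "stale":
#   #   samtools index bamfile.bam
```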

Oligo wrote, 5.2 years ago:

A file with the .bai suffix is indeed the index file of the bam file. One way to create an index for a bam file is with the Samtools index command.


Moreover, there's no need to compare anything. If you aren't sure whether a bai file corresponds to a particular bam file, just delete it and generate a new one as suggested.

(reply by Jorge Amigo, 5.2 years ago)

Hi Oligo,

Thanks for the quick reply. So you think that if I generate a new indexed bam and compare it to the original one, I can figure out whether they were both made from the same .bam file? May I ask what software you would suggest for comparing two indexed bam files?

Thanks again, really appreciate your help.

(reply by Sahel, 5.2 years ago)

The answer is yes. The simplest way is to move the existing index file into another directory and create a new index (with: samtools index BAM_FILE_NAME). Then you can compare the md5 checksums of the two index files, using "md5sum ORIGINAL_INDEX_FILE" and "md5sum NEW_INDEX_FILE". If both commands print the same checksum, the two index files are identical.
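
The comparison itself could be wrapped up roughly like this. It is only a sketch: same_index is a made-up helper name, and md5sum is assumed to be available (it is part of GNU coreutils; other systems may ship a differently named tool). Identical checksums mean the two index files are byte-for-byte the same. A mismatch is less conclusive, since it could in principle also come from the old index having been written by a different samtools version.

```shell
#!/usr/bin/env bash
# Sketch: compare two index files by their md5 checksums.
# same_index is a hypothetical helper name, not part of samtools.
same_index() {
    local a b
    # Read from stdin so the file name does not end up in the md5sum output.
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    if [ "$a" = "$b" ]; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example workflow (old_index is an illustrative directory name):
#   mkdir old_index
#   mv XXX.bam.bai old_index/
#   samtools index XXX.bam
#   same_index old_index/XXX.bam.bai XXX.bam.bai
```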

(reply by Oligo, 5.2 years ago)

Jorge Amigo (Santiago de Compostela, Spain) wrote, 5.2 years ago:

I like UCSC's succinct description of a BAM file: a compact and index-able representation of nucleotide sequence alignments. Although a standalone BAM file can be useful, a particular advantage of this format is that the data is binary, compressed and easily indexed, so programs can navigate through it without loading the whole file into memory. A BAM file is just the binary version of a SAM file, which is human readable, so apart from their nature (binary or text) the two files are equivalent.

You will obviously find the most appropriate reading on SAM at the SAMtools webpage, although I would also recommend looking at UCSC's BAM format webpage for a nice description of both formats and their relationship.

Sahel wrote, 5.2 years ago:

Thanks all. Your answers helped me a lot... I made a new index and everything looks fine :) I really appreciate all your help... :)
