Need Help In Understanding A Bam File
12.3 years ago
User 9906 ▴ 50

Hi,

After reading http://samtools.sourceforge.net/SAM1.pdf, I have learned that BAI files are indexes to BAM files. I also understand the BGZF and virtual file offset concepts.
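
For reference, the spec defines a virtual file offset as a 64-bit value. Its high 48 bits (`coffset`) hold the byte offset in the compressed BAM where a BGZF block starts. Its low 16 bits (`uoffset`) hold the offset within that block's uncompressed data. Below is a minimal Python sketch of packing and unpacking one. The function names are hypothetical and do not come from samtools or any library.

```python
from typing import Tuple


def make_voffset(coffset: int, uoffset: int) -> int:
    """Pack a virtual file offset: BGZF block start (high 48 bits), in-block offset (low 16 bits)."""
    if not 0 <= coffset < 1 << 48:
        raise ValueError("coffset must fit in 48 bits")
    if not 0 <= uoffset < 1 << 16:
        raise ValueError("uoffset must fit in 16 bits")
    return (coffset << 16) | uoffset


def split_voffset(voffset: int) -> Tuple[int, int]:
    """Unpack a virtual file offset into (coffset, uoffset)."""
    return voffset >> 16, voffset & 0xFFFF


if __name__ == "__main__":
    # Hypothetical record: 300 bytes into the uncompressed data of a BGZF block
    # that starts at byte 65000 of the compressed .bam file.
    v = make_voffset(65_000, 300)
    print(v, split_voffset(v))
```

Because `coffset` occupies the high bits, comparing two virtual offsets as plain integers also orders them by position in the file. This is what lets an index store start/end ranges as simple numbers.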

Example BAM file, printed as text:

@SQ    SN:CHR21    LN:1000   
@SQ    SN:CHR22    LN:2000  
read100    16    CHR21    33028084    255    50M    *    0    0    ATTTAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATT    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<    NM:i:0
read101    16    CHR21    33028087    255    50M    *    0    0    TAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATTATC    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<    NM:i:0
...many CHR21 segments
read200    16    CHR22    33028084    255    50M    *    0    0    ATTTAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATT    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<    NM:i:0
read201    16    CHR22    33028084    255    50M    *    0    0    ATTTAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATT    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<    NM:i:0
...many CHR22 segments

I am having trouble understanding what the entries in a BAI file point to. Could somebody explain this with respect to a specific file (for example, the one above)? What does the first index entry point to? What does the second one point to? And so on.

If there is a document that I should read, please point me to it.

Thanks in advance.

12.3 years ago
Gjain 5.8k

This previous post on Biostar should help you.


The "specific parts" are the parts the user asks for. For example, suppose you ask for all reads on chr22 between positions 1,000,000 and 2,000,000. The BAI file tells you (roughly) where inside the BAM file those reads will be found (think "byte offsets"). You do not have to start at the beginning of the BAM file and parse every read until you reach chr22 position 1,000,000. As for what an index is: ask yourself what a book's index is for, and you're close. A database index is also a close analogy.
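
Here is a toy Python sketch of that idea. It assumes a single reference and uses list positions as stand-ins for real virtual file offsets. It mimics only the BAI linear index, which, per the spec, records the smallest offset of any alignment overlapping each 16 kbp (2^14 bp) window. Real BAI files combine this with a binning scheme. `build_linear_index` and `fetch` are hypothetical names.

```python
from typing import List, Optional, Tuple

WINDOW_SHIFT = 14  # linear-index windows are 2**14 = 16384 bp wide

# Toy stand-in for the alignments of one reference in a coordinate-sorted BAM:
# each record is (0-based start, exclusive end), and a record's list position
# plays the role of its virtual file offset.
Record = Tuple[int, int]


def build_linear_index(records: List[Record]) -> List[int]:
    """For each 16 kbp window, the smallest offset of any record overlapping it."""
    n_windows = max((end - 1) >> WINDOW_SHIFT for _, end in records) + 1
    ioffset: List[Optional[int]] = [None] * n_windows
    for off, (beg, end) in enumerate(records):
        for w in range(beg >> WINDOW_SHIFT, ((end - 1) >> WINDOW_SHIFT) + 1):
            if ioffset[w] is None:  # records arrive in offset order, so the first one wins
                ioffset[w] = off
    # Empty windows borrow the previous window's offset (leading ones use 0):
    # starting a scan a little early is safe, starting late would miss reads.
    filled, prev = [], 0
    for off in ioffset:
        prev = off if off is not None else prev
        filled.append(prev)
    return filled


def fetch(records: List[Record], ioffset: List[int], beg: int, end: int):
    """Return (hits, n_examined) for records overlapping [beg, end)."""
    w = beg >> WINDOW_SHIFT
    if w >= len(ioffset):
        return [], 0  # region lies past the last alignment
    hits, examined = [], 0
    for beg_r, end_r in records[ioffset[w]:]:  # "seek" straight to the indexed offset
        examined += 1
        if beg_r >= end:
            break  # coordinate-sorted: no later record can overlap
        if end_r > beg:
            hits.append((beg_r, end_r))
    return hits, examined


if __name__ == "__main__":
    # Hypothetical data: one 50 bp read every 1 kbp across 2.1 Mbp.
    reads = [(p, p + 50) for p in range(0, 2_100_000, 1_000)]
    idx = build_linear_index(reads)
    hits, examined = fetch(reads, idx, 1_000_000, 2_000_000)
    print(len(hits), "hits after examining", examined, "of", len(reads), "records")
```

In this toy run, the query jumps directly to the first record near position 1,000,000. It does not read the roughly 1,000 records that come before it.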


No, it doesn't. It says that the BAI "allows programs to jump directly to specific parts of the bam file without reading through all of the sequences". But to which parts? What is an index?

12.3 years ago

If you want the hairy details of how BAM indexing works, Section 4 of the SAM specification document is what you're looking for.
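
The binning scheme is usually the hairiest part. According to the spec, each reference is covered by a fixed hierarchy of bins: one 512 Mbp bin, then 8 of 64 Mbp, 64 of 8 Mbp, 512 of 1 Mbp, 4,096 of 128 kbp and 32,768 of 16 kbp. Every alignment is assigned the smallest bin that fully contains it. For each bin, the BAI stores the chunks of virtual offsets where those alignments sit in the BAM. Below is a Python port of the `reg2bin` and `reg2bins` routines given in the spec. It is a sketch and was not taken from samtools.

```python
from typing import List


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kbp bins start at 4681
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kbp bins start at 585
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mbp bins start at 73
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mbp bins start at 9
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mbp bins start at 1
    return 0                                       # bin 0 spans 512 Mbp


def reg2bins(beg: int, end: int) -> List[int]:
    """Every bin that may hold alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for first, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(first + (beg >> shift), first + (end >> shift) + 1))
    return bins


if __name__ == "__main__":
    # read100 from the question: POS 33028084 (1-based), CIGAR 50M.
    beg = 33028084 - 1
    print(reg2bin(beg, beg + 50))
```

Take read100 in the question: POS 33028084 with CIGAR 50M gives the 0-based interval [33028083, 33028133). For this read, `reg2bin` returns bin 6696, which is 16 kbp bin number 2015 (4681 + 2015). A region query calls `reg2bins` and then reads only the chunks stored for those bins.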

If all you need is enough understanding to work with BAM files, it is sufficient to know that the index works much like a book's table of contents. It lets programs jump to the relevant reads much more quickly.
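
Here is the table-of-contents analogy made concrete. In the BAI layout described in the spec, the magic bytes `BAI\1` are followed by one index per `@SQ` line, in header order. For the example in the question, the first entry would therefore describe CHR21 and the second CHR22. Each entry lists bins, each holding chunks of virtual offsets into the BAM, followed by the 16 kbp linear index. Below is a minimal Python reader. It assumes an uncompressed, little-endian BAI as the spec describes. `parse_bai` is a hypothetical name, and the optional trailing `n_no_coor` field is ignored.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Tuple

Chunk = Tuple[int, int]  # (chunk_beg, chunk_end): virtual file offsets into the BAM


def _read(f: BinaryIO, fmt: str) -> tuple:
    """Read and unpack exactly struct.calcsize(fmt) bytes."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated BAI data")
    return struct.unpack(fmt, data)


def parse_bai(f: BinaryIO) -> List[dict]:
    """Return one {'bins': ..., 'ioffset': ...} dict per @SQ reference, in header order."""
    if f.read(4) != b"BAI\x01":
        raise ValueError("not a BAI file (bad magic)")
    (n_ref,) = _read(f, "<i")
    refs = []
    for _ in range(n_ref):
        bins: Dict[int, List[Chunk]] = {}
        (n_bin,) = _read(f, "<i")
        for _ in range(n_bin):
            bin_id, n_chunk = _read(f, "<Ii")
            bins[bin_id] = [_read(f, "<QQ") for _ in range(n_chunk)]
        (n_intv,) = _read(f, "<i")
        ioffset = list(_read(f, f"<{n_intv}Q"))  # linear index: one offset per 16 kbp window
        refs.append({"bins": bins, "ioffset": ioffset})
    return refs  # an optional trailing n_no_coor (uint64) is not read


if __name__ == "__main__":
    # Hand-built toy index with hypothetical offsets: two references (think CHR21
    # and CHR22), each with one bin holding one chunk and a one-window linear index.
    toy = b"BAI\x01" + struct.pack("<i", 2)
    for voff in (100 << 16, 900 << 16):
        toy += struct.pack("<i", 1)                    # n_bin
        toy += struct.pack("<Ii", 4681, 1)             # bin id, n_chunk
        toy += struct.pack("<QQ", voff, voff + 5000)   # chunk_beg, chunk_end
        toy += struct.pack("<iQ", 1, voff)             # n_intv, ioffset[0]
    for ref_id, ref in enumerate(parse_bai(io.BytesIO(toy))):
        print(ref_id, ref)
```

The spec also defines an optional pseudo-bin (37450) that carries summary counts. This sketch simply stores it like any other bin.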
