How to utilize the bai index file while parsing a BAM file
21 months ago
matejm469 • 0

This is maybe a noobish question, but I'm working on a program that visualizes data from a SAM/BAM file; it also has an option to look up reads by name and things like that. I've been tasked with comparing how much faster (if at all) some operations are when performed on an indexed BAM file than on a plain SAM file, but I don't know how to go about using a BAM index in my program. The program is written in Java and I've been using the htsjdk library to read SAM/BAM files. I don't expect anyone to write my code for me, but some general pointers on how to go about this would be much appreciated.

Thanks in advance

samtools htsjdk java bam

I haven't used htsjdk, but its documentation shows support for BAM index files through its BAM reader API. Someone with more knowledge might come along and provide more info.

21 months ago

The BAM index is only useful if you need to quickly get the reads in one or more defined regions of the BAM, e.g.:

How many reads are mapped to gene 'X' on chr22?

With the index, the file reader jumps straight to the right section of the BAM file without reading chr1, chr2, chr3, etc. See SamReader.query: https://javadoc.io/static/com.github.samtools/htsjdk/3.0.0/htsjdk/samtools/SamReader.html#query-htsjdk.samtools.QueryInterval:A-boolean-
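To give a feel for how the index lets the reader skip ahead: the BAI uses the hierarchical binning scheme described in the SAM format specification (SAMv1). Each read is assigned to the smallest bin that fully contains it, and the index records, per bin, where in the file the chunks holding those reads start and end. A region query therefore only has to visit the few bins that can overlap the region. The sketch below is a plain-Java port of the spec's reg2bin/reg2bins C functions (0-based, half-open coordinates); the class name is hypothetical, and htsjdk already does all of this inside query(), so you would not normally write it yourself.

```java
import java.util.ArrayList;
import java.util.List;

/** Port of reg2bin/reg2bins from the SAMv1 spec. Coordinates: 0-based, [beg, end). */
public class Bins {

    /** Smallest bin that fully contains [beg, end); this is the bin a read is filed under. */
    public static int reg2bin(int beg, int end) {
        --end; // work with the last covered position
        if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14); // 16 kb bins
        if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17); // 128 kb bins
        if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);  // 1 Mb bins
        if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);  // 8 Mb bins
        if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);  // 64 Mb bins
        return 0; // the single top-level bin
    }

    /** Every bin that may hold reads overlapping [beg, end): the only bins a query must visit. */
    public static List<Integer> reg2bins(int beg, int end) {
        List<Integer> bins = new ArrayList<>();
        --end;
        bins.add(0);
        for (int k = 1 + (beg >> 26); k <= 1 + (end >> 26); ++k) bins.add(k);
        for (int k = 9 + (beg >> 23); k <= 9 + (end >> 23); ++k) bins.add(k);
        for (int k = 73 + (beg >> 20); k <= 73 + (end >> 20); ++k) bins.add(k);
        for (int k = 585 + (beg >> 17); k <= 585 + (end >> 17); ++k) bins.add(k);
        for (int k = 4681 + (beg >> 14); k <= 4681 + (end >> 14); ++k) bins.add(k);
        return bins;
    }

    public static void main(String[] args) {
        // A 1 kb query at 0-based [1,000,000, 1,001,000)
        int beg = 1_000_000, end = 1_001_000;
        System.out.println("reg2bin  = " + reg2bin(beg, end));
        System.out.println("reg2bins = " + reg2bins(beg, end));
    }
}
```

For the 1 kb query in main this prints reg2bin = 4742 and reg2bins = [0, 1, 9, 73, 592, 4742]: six bins to look up in the index, and only the file chunks listed under those bins need to be read.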

For a question like

How many reads are there in my BAM file?

you don't need the BAI; just iterate over the whole BAM. See: https://javadoc.io/static/com.github.samtools/htsjdk/3.0.0/htsjdk/samtools/SamReader.html#iterator--
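For the SAM-vs-indexed-BAM comparison in the question, the unindexed baseline is a linear scan: every record is read and its RNAME, POS and CIGAR are checked. Here is a minimal plain-Java sketch of that scan over SAM text. The class and method names are hypothetical, it does not read BAM (BGZF-compressed binary, which is what htsjdk is for), and it counts a read as overlapping the region when its aligned span, from POS to POS plus the CIGAR's reference length minus one, intersects the region.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical unindexed baseline: scans every record of a SAM text stream. */
public class SamScan {

    /** Reference bases consumed by a CIGAR string (ops M, D, N, =, X). */
    public static int refLength(String cigar) {
        if (cigar.equals("*")) return 0;
        int len = 0, num = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                num = num * 10 + (c - '0');
            } else {
                if ("MDN=X".indexOf(c) >= 0) len += num; // S, H, I, P do not consume reference
                num = 0;
            }
        }
        return len;
    }

    /** Returns {total records, records overlapping contig:start-end} (1-based, inclusive). */
    public static long[] scan(BufferedReader in, String contig, int start, int end) throws IOException {
        long total = 0, inRegion = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("@")) continue; // skip header lines
            total++;
            String[] f = line.split("\t"); // QNAME FLAG RNAME POS MAPQ CIGAR ...
            int flag = Integer.parseInt(f[1]);
            if ((flag & 0x4) != 0 || !f[2].equals(contig)) continue; // unmapped or other contig
            int pos = Integer.parseInt(f[3]);
            // Treat a missing CIGAR on a mapped read as covering just POS.
            int alnEnd = pos + Math.max(refLength(f[5]), 1) - 1;
            if (pos <= end && alnEnd >= start) inRegion++;
        }
        return new long[] {total, inRegion};
    }

    public static void main(String[] args) throws IOException {
        String sam = String.join("\n",
            "@SQ\tSN:chr22\tLN:10000",
            "r1\t0\tchr22\t100\t60\t50M\t*\t0\t0\t*\t*",
            "r2\t0\tchr22\t5000\t60\t50M\t*\t0\t0\t*\t*",
            "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*");
        long[] counts = scan(new BufferedReader(new StringReader(sam)), "chr22", 120, 200);
        System.out.println("total=" + counts[0] + " inRegion=" + counts[1]);
    }
}
```

Running main prints total=3 inRegion=1. The cost of this approach grows with the whole file, whereas an indexed region query only touches the chunks the index points to.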

To create a SamReader, use SamReaderFactory: https://javadoc.io/static/com.github.samtools/htsjdk/3.0.0/htsjdk/samtools/SamReaderFactory.html

Edit: in htsjdk you often don't need to specify where the BAI file is; it is detected automatically from the BAM file's location.


Regarding the question of how many reads are in the BAM file: note that the BAI index does hold this information; see samtools idxstats ("index stats"): http://www.htslib.org/doc/samtools-idxstats.html


note that the BAI index does tell you this info, see samtools idxstats

Of course it does :-) and samtools will always be faster for most simple jobs like this.

My example was only meant to illustrate how to scan a BAM using Java.


This might be a dumb question, but what exactly do you mean by "defined region of the BAM"?


A genomic region, defined as "chromosome:start-end".
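A region string like that can be split into its parts with a small helper. This is a hypothetical sketch that follows the samtools conventions: 1-based inclusive coordinates, commas allowed inside numbers, "chr:start" meaning from start to the end of the contig, and a bare contig name meaning the whole contig. To pass it to htsjdk you would, as I understand the API, look up the contig's index with SAMFileHeader.getSequenceIndex(contig) and build a QueryInterval(index, start, end), which also takes 1-based inclusive coordinates with end <= 0 meaning the end of the reference.

```java
/** Hypothetical helper: parses samtools-style region strings. */
public class Region {
    public final String contig;
    public final int start; // 1-based, inclusive
    public final int end;   // 1-based, inclusive; 0 means "to the end of the contig"

    public Region(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    /** Parses "chr22", "chr22:1000" or "chr22:1,000-2,000". */
    public static Region parse(String s) {
        int colon = s.lastIndexOf(':');
        if (colon < 0) {
            return new Region(s, 1, 0); // bare contig name: the whole contig
        }
        String contig = s.substring(0, colon);
        String range = s.substring(colon + 1).replace(",", ""); // allow "1,000-2,000"
        int dash = range.indexOf('-');
        int start = Integer.parseInt(dash < 0 ? range : range.substring(0, dash));
        int end = dash < 0 ? 0 : Integer.parseInt(range.substring(dash + 1));
        if (start < 1 || (end != 0 && end < start)) {
            throw new IllegalArgumentException("invalid region: " + s);
        }
        return new Region(contig, start, end);
    }

    @Override
    public String toString() {
        return contig + ":" + start + "-" + (end == 0 ? "end" : String.valueOf(end));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"chr22", "chr22:1000", "chr22:1,000-2,000"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```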
