Does the absence of a BAI file just cause an algorithm to run slower, or can it also produce incorrect results? For example, why doesn't RSEM require a BAI file?
2
0
11 months ago
' ▴ 290

Pretty much what the title says.

1. What happens if I don't provide a BAI file when it's required? For example, when running RSeQC. The tool still runs, but does that mean it's just slower, or are my results wrong too?
2. It looks like rsem-calculate-expression does not require a BAI file. Am I correct? Or does it require one but simply not warn me when it's missing? I'm passing my transcriptome BAM from STAR to it.
bai alignment bam rsem gene-expression • 1.0k views
3
11 months ago

The index file is used when random access to the BAM file is required, for example to retrieve only the alignments overlapping the region 1,000,000-1,000,100.

If a tool reads the entire BAM file sequentially, then the BAI file is of no use to it.
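To make the difference concrete, here is a minimal toy sketch (not the real BAI binning scheme). It assumes alignments are plain `(start, end)` tuples on one reference, sorted by start, and that a maximum alignment length `max_len` is known; the function names are made up for illustration. Both functions return the same records; the "indexed" one just avoids reading everything.

```python
from bisect import bisect_left

def fetch_sequential(alignments, start, end):
    """Scan every record; needs no index and works on any record order."""
    return [a for a in alignments if a[0] < end and a[1] > start]

def fetch_indexed(alignments, starts, max_len, start, end):
    """Jump close to the region via the sorted start positions, then scan
    only the nearby records. Requires alignments sorted by start."""
    # No alignment starting before (start - max_len) can reach the region.
    i = bisect_left(starts, start - max_len)
    hits = []
    while i < len(alignments) and alignments[i][0] < end:
        if alignments[i][1] > start:
            hits.append(alignments[i])
        i += 1
    return hits

# Toy data: half-open (start, end) intervals, sorted by start.
alignments = [
    (100, 150),
    (400, 450),
    (999_990, 1_000_040),
    (1_000_050, 1_000_120),
    (2_000_000, 2_000_050),
]
starts = [a[0] for a in alignments]          # the toy "index"
max_len = max(e - s for s, e in alignments)  # longest alignment span

region = (1_000_000, 1_000_100)
by_scan = fetch_sequential(alignments, *region)
by_index = fetch_indexed(alignments, starts, max_len, *region)
```

The point of the sketch is that an index changes how much of the file is read, not which records match.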

0

Very helpful! Thanks a lot. Is there any way at all to confirm whether a tool like RSEM actually uses the BAI file? I have gone through the source code, but I can't really verify whether it uses the BAI file when I run rsem-calculate-expression.

1
11 months ago

In the vast majority of cases, if a tool needs the index and the index is not there, the tool will fail, hopefully with a meaningful message so that you know what went wrong.

Some tools may decide to sort and index the BAM file to temporary files, in which case the user will notice a slowdown but the results will be fine.

I believe that even fewer tools will opt for processing the BAM without an index (in theory, the index is not strictly necessary). That would probably be very slow, but again the results should be correct.

Finally, I doubt there is a tool that would run without an index when the index is needed, exit without errors, and give wrong results. That would be a serious bug.
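One empirical way to check this for a particular tool is to run it twice on the same BAM, once with the index present and once with it moved away, and compare the outputs byte for byte. The sketch below only does the comparison step; the file names in the usage note are hypothetical, and it assumes the tool's output is deterministic (outputs that embed timestamps or rely on random sampling can differ for unrelated reasons).

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def same_output(path_a, path_b):
    """True if the two result files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, `same_output("with_index/results.txt", "without_index/results.txt")` (hypothetical paths) returning `True` would suggest the index did not change the result for that run.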

0

The concepts of a sorted BAM file and an indexed BAM file get somewhat conflated in our minds, so I spent some time disentangling my own thinking as well.

Only coordinate-sorted files can be indexed, so the presence of an index is clear-cut evidence that the file was sorted ;-)

Many tools need sorted files, but only tools that need to jump around the BAM file make use of the index.

Importantly: if the file is not sorted, yet the tool expects a sorted alignment file, then the results will likely be incorrect.

If a file is not indexed, all operations can still be performed and will produce correct results, only much more slowly.

Sorting is the more important (and more time-consuming) operation; indexing is much faster. So for the OP: to avoid any second-guessing, sort and index your file :-)
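If you are unsure how a file is sorted, the `@HD` header line normally records it in the `SO:` tag (`unknown`, `unsorted`, `queryname` or `coordinate`). Here is a small sketch that reads the tag from header text; the function name is made up, and the header text is assumed to come from something like `samtools view -H file.bam`. Note that the tag is only what the writing tool declared, not a guarantee about the records themselves.

```python
def sort_order_from_header(header_text):
    """Return the SO: value from the @HD line of SAM/BAM header text.

    Falls back to "unknown" when there is no @HD line or no SO: tag.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return "unknown"
    return "unknown"

# Toy header text for illustration.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
order = sort_order_from_header(example_header)
```

A result of `coordinate` means the file claims to be sorted in the order that indexing requires.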

0

That's interesting! So what happens if the tool expects an unsorted BAM but we supply a sorted BAM to it?

0

Unsorted means any/arbitrary order. Thus a sorted BAM is included in the set of possible "unsorted" ones. :-)

Now, if you mean a tool that breaks when the input is sorted, then that is exactly what will happen ...