Question: BAM Index Files (BAI) and Picard Tools BuildBamIndex
6.8 years ago, Arun wrote:


I noticed something while building BAI files with Picard's BuildBamIndex. The command I normally use is:

java -jar -Xmx4g BuildBamIndex.jar I=foo.bam O=foo.bam.bai

I ran the same command on a BAM file of reads from a recently sequenced library, mapped with bwa 0.5.9, and got this error:

SAM validation error: ERROR: Record 48168030, Read name FOO, MAPQ should be 0 for unmapped read

From the SEQanswers and Biostars forums, I found that this can happen with BWA output and that one should pass VALIDATION_STRINGENCY=LENIENT to Picard. With that option, Picard issued a warning for those reads instead of an error and created the BAI file successfully. What concerns me is the size of the index files.

For my earlier libraries, the BAM files were around 4.5GB and the BAI files about 370KB. For this library, the BAM files are around 5.5GB but the BAI files are only about 206KB. Both data sets come from Arabidopsis thaliana, though the experiments are different. Both were processed with picard-1.65 using the same parameters, including the default COMPRESSION_LEVEL=5. Within the recently sequenced library, the bwa-mapped files all give BAI files of about the same size as one another. I am wondering whether the size difference has anything to do with the error/warning. Is this something to worry about? How can I verify that the index file is correct?

Thank you,

picard bwa error • 8.0k views
6.8 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

I could not find it again, but I remember a post on the samtools mailing list saying that a compressed BAM file can be bigger than the corresponding SAM file (although the content of the BAM file is read faster). A simple way to check that the BAM file is valid is to run a few simple samtools commands, such as samtools view my.bam chrX:1000-2000. A region query like this one also relies on the index.
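
The region check suggested above can be scripted. The following is a minimal sketch, not part of samtools or Picard: the function names `region_count_cmd` and `count_reads_in_region` are hypothetical, and it assumes samtools is on the PATH and that the index sits next to the BAM file. It uses `samtools view -c`, which prints the number of matching records; with a region argument this requires a readable index.

```python
import subprocess


def region_count_cmd(bam_path, region):
    """Build the samtools command that counts reads overlapping a region."""
    # -c prints only the number of matching records instead of the records.
    return ["samtools", "view", "-c", bam_path, region]


def count_reads_in_region(bam_path, region):
    """Run the region query; raises CalledProcessError if samtools fails,
    for example when the index is missing or unreadable."""
    result = subprocess.run(
        region_count_cmd(bam_path, region),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Example (file and region names are placeholders):
# print(count_reads_in_region("my.bam", "chrX:1000-2000"))
```

Running the same query on a few regions of each reference sequence, and comparing the counts with those from an index built by another tool, is one way to gain confidence in the BAI file.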


Thanks Pierre, but in my case it's not the BAM file that I am concerned about but the BAI file (370KB for a 4.5GB BAM file, versus 206KB for a 5.5GB BAM file). The index file seems too small for the size of the BAM file, and I was wondering whether this is something to be concerned about.

(reply written 6.8 years ago by Arun)
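
Rather than judging a BAI file by its size alone, one can look at what it contains. The sketch below walks the BAI layout as described in the SAM/BAM format specification (magic `BAI\1`, then per reference a list of bins with their chunks and a linear index of 16 kbp intervals, optionally followed by a count of unplaced reads). The function name `summarize_bai` and the keys it returns are illustrative, not part of any tool.

```python
import struct


def summarize_bai(data: bytes) -> dict:
    """Count references, bins, chunks and linear-index intervals in a BAI file.

    `data` is the whole file content; BAI files are small enough to read at once.
    All integers are little-endian, following the SAM/BAM specification.
    """
    if data[:4] != b"BAI\x01":
        raise ValueError("bad magic: not a BAI file")
    pos = 4
    (n_ref,) = struct.unpack_from("<i", data, pos)
    pos += 4
    bins = chunks = intervals = 0
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        for _ in range(n_bin):
            # Each bin: uint32 bin id, int32 n_chunk, then n_chunk pairs of uint64.
            # Note: some writers add a metadata pseudo-bin (id 37450), counted here too.
            _bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8 + 16 * n_chunk
            bins += 1
            chunks += n_chunk
        # Linear index: int32 n_intv, then n_intv uint64 offsets.
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4 + 8 * n_intv
        intervals += n_intv
    if pos > len(data):
        raise ValueError("truncated BAI file")
    n_no_coor = None
    if len(data) - pos == 8:
        # Optional trailing count of reads without coordinates.
        (n_no_coor,) = struct.unpack_from("<Q", data, pos)
    return {
        "references": n_ref,
        "bins": bins,
        "chunks": chunks,
        "intervals": intervals,
        "n_no_coor": n_no_coor,
    }


# Example (file name is a placeholder):
# with open("foo.bam.bai", "rb") as fh:
#     print(summarize_bai(fh.read()))
```

In this layout the file size follows from the number of bins, chunks and intervals, which depend on how the reads are spread over the reference rather than directly on the size of the BAM file. Comparing these counts between the 370KB and the 206KB index may therefore say more than the sizes do.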

I think I understand the reason for the difference. Correct me if I am wrong. Picard ignores reads that are unmapped according to the flag but are still reported with a mapping quality. This means that all of these reads are left out when the index file is created. So there must be quite a few reads in this library that are unmapped according to the flag.

(reply written 6.8 years ago by Arun)
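
The hypothesis above can be checked by counting how many records are flagged unmapped (FLAG bit 0x4) and how many of those carry a nonzero MAPQ, which is the condition the Picard validation message complains about. This is a minimal sketch that reads SAM text lines, for example the output of `samtools view foo.bam`; the function name `count_unmapped_with_mapq` is hypothetical.

```python
def count_unmapped_with_mapq(sam_lines):
    """Return (total, unmapped, unmapped_with_mapq) for an iterable of SAM lines.

    SAM columns used: FLAG is the 2nd field, MAPQ is the 5th field.
    Header lines (starting with '@') and empty lines are skipped.
    """
    total = unmapped = unmapped_with_mapq = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        total += 1
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
            if mapq != 0:
                unmapped_with_mapq += 1
    return total, unmapped, unmapped_with_mapq


# Example: pipe `samtools view foo.bam` into a script that calls
# count_unmapped_with_mapq(sys.stdin) and prints the three counts.
```

If the third count is small compared with the total, the ignored reads alone are unlikely to explain a large difference in index size.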