Question

BAM Index Files (BAI) and Picard Tools BuildBamIndex

1

12.0 years ago

Arun 2.4k

Hello,

I just noticed something while constructing bai files using picard tools' BuildBamIndex. The command I normally use is:

java -jar -Xmx4g BuildBamIndex.jar I=foo.bam O=foo.bam.bai

I did the same for a bam file from a recently sequenced library, mapped using bwa 0.5.9, and obtained this error:

SAM validation error: ERROR: Record 48168030, Read name FOO, MAPQ should be 0 for unmapped read

From the seqanswers and biostars forums, I found that this can happen when using BWA and that one should use VALIDATION_STRINGENCY=LENIENT with picard. This time picard threw a warning, ignored those reads, and successfully created the bai files. What concerns me is the size of the index files.

In the libraries I sequenced before, the bam files were around 4.5GB and the bai files were about 370KB. In this library, however, the bam files are around 5.5GB and the bai files are about 206KB. Both data sets come from Arabidopsis thaliana, though the experiments are different. I should mention that both were run using picard-1.65 with the same parameters, including the default COMPRESSION_LEVEL=5. Among the files from the recently sequenced library that were mapped with bwa, however, the bai files are all around the same size. I am wondering if this has anything to do with the error/warning. Is this something to worry about? How can I verify that this index file is right?

Thank you,
Arun.

picard bwa error • 10k views

ADD COMMENT • link updated 12.0 years ago by Pierre Lindenbaum 161k • written 12.0 years ago by Arun 2.4k

score 0 · Answer 1 · 2012-04-27

0

12.0 years ago

Pierre Lindenbaum 161k

I was not able to retrieve it, but I remember seeing a post on the samtools mailing list saying that a compressed BAM file could be bigger than a SAM file (but the content of the BAM file will be read faster). A simple way to check that the BAM file is valid would be to run some simple commands with samtools (like samtools view my.bam chrX:1000-2000).

0

Thanks Pierre, but in my case, it's not the BAM file that I am concerned about, but rather the BAI file (370KB for a 4.5GB BAM file, versus 206KB for a 5.5GB BAM file). The index file seems too small for the size of the BAM file, and I was wondering if this is something to be concerned about.

ADD REPLY • link 12.0 years ago by Arun 2.4k
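
One way to look inside the index itself, rather than only at its size, is to parse it. The following is a minimal sketch, assuming the BAI layout described in the SAM specification (magic `BAI\1`, then per-reference bins with chunks, followed by a linear index of file offsets, and an optional trailing count of reads without coordinates). The function name `summarize_bai` and the keys of the returned dictionaries are made up for illustration. Since the file stores only bins that contain alignments plus the linear index, its size should follow how reads are spread over the genome more than the size of the BAM file.

```python
import struct

BAI_MAGIC = b"BAI\x01"
METADATA_BIN = 37450  # pseudo-bin from the SAM spec holding mapped/unmapped counts


def summarize_bai(data):
    """Summarize raw BAI bytes per reference (hypothetical helper, not a Picard API)."""
    if data[:4] != BAI_MAGIC:
        raise ValueError("not a BAI file: bad magic")
    pos = 4
    (n_ref,) = struct.unpack_from("<i", data, pos)
    pos += 4
    refs = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        summary = {"bins": 0, "chunks": 0, "intervals": 0,
                   "mapped": None, "unmapped": None}
        for _ in range(n_bin):
            bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8
            # Each chunk is a pair of 64-bit virtual file offsets (begin, end).
            chunks = struct.unpack_from("<%dQ" % (2 * n_chunk), data, pos)
            pos += 16 * n_chunk
            if bin_id == METADATA_BIN and n_chunk == 2:
                # Second pseudo-chunk stores (n_mapped, n_unmapped), if the writer added it.
                summary["mapped"], summary["unmapped"] = chunks[2], chunks[3]
            else:
                summary["bins"] += 1
                summary["chunks"] += n_chunk
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4 + 8 * n_intv  # skip the linear index offsets, keep only their count
        if pos > len(data):
            raise ValueError("truncated BAI data")
        summary["intervals"] = n_intv
        refs.append(summary)
    n_no_coor = None
    if len(data) - pos >= 8:  # optional trailing count of reads without coordinates
        (n_no_coor,) = struct.unpack_from("<Q", data, pos)
    return {"references": refs, "n_no_coor": n_no_coor}
```

It could be called as `summarize_bai(open("foo.bam.bai", "rb").read())` on both the 370KB and the 206KB index, and the per-reference bin, chunk, and interval counts compared. If the metadata pseudo-bin is present, the `mapped` and `unmapped` values can also be checked against counts obtained from the BAM file directly.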

0

I think I understand the reason for the difference. Correct me if I am wrong. Picard tools ignores reads that are actually unmapped (according to the flag) but are reported to have a mapping quality. This means that all these reads are ignored when creating the index file. So, there must be quite a few reads that are unmapped according to the flag.

ADD REPLY • link 12.0 years ago by Arun 2.4k
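
To put a number on "quite a few", the records that trigger the validation message can be counted directly. This is a minimal sketch, assuming the input is SAM text (for example the output of `samtools view foo.bam`) and using the standard SAM columns: FLAG is the second field, MAPQ the fifth, and bit 0x4 of FLAG marks an unmapped read. The function name `count_unmapped_with_mapq` is made up for illustration. It only counts the records. Whether Picard really leaves them out of the index is the hypothesis above, which this sketch does not test.

```python
def count_unmapped_with_mapq(sam_lines):
    """Return (flagged, total): records with FLAG bit 0x4 set and MAPQ != 0, and all records."""
    total = 0
    flagged = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        total += 1
        if flag & 0x4 and mapq != 0:
            flagged += 1
    return flagged, total
```

Any iterable of lines works, so the function can be fed `sys.stdin` when piping from `samtools view`, or an open SAM file. Comparing the flagged count between the older and the newer library would show whether the two data sets really differ in this respect.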