Hello,
I just noticed something while constructing bai files using picard tools' BuildBamIndex. The command I normally use is:
java -jar -Xmx4g BuildBamIndex.jar I=foo.bam O=foo.bam.bai
I did the same for a bam file from a recently sequenced library, which I mapped using bwa 0.5.9, and obtained this error:
SAM validation error: ERROR: Record 48168030, Read name FOO, MAPQ should be 0 for unmapped read
From the seqanswers and biostars forums, I found that this can happen when using BWA and that one should use VALIDATION_STRINGENCY=LENIENT with picard. This time Picard threw a warning, ignoring those reads, and successfully created the bai files. What concerns me is the size of the index files.
In my previously sequenced data, the bam files were around 4.5GB and the bai files were about 370KB. However, in this library, the bam files are around 5.5GB and the bai files are about 206KB. Both data sets come from Arabidopsis thaliana, though the experiments are different. I should mention that both were run using picard-1.65 with the same parameters, including the default COMPRESSION_LEVEL=5. Across the files from the recently sequenced library mapped with bwa, however, the bai files are all around the same size. I am wondering if this has anything to do with the error/warning. Is this something to worry about? How can I verify that this index file is right?
Thank you,
Arun.
Thanks Pierre, but in my case, it's not the BAM file that I am concerned about, but rather the BAI file (370KB for a 4.5GB BAM file, versus 206KB for a 5.5GB BAM file). The index file seems too small for the size of the BAM file, and I was wondering if this is something to be concerned about.
I think I understand the reason for the difference. Correct me if I am wrong. Picard tools ignores reads that are actually unmapped (according to the flag) but are reported to have a non-zero mapping quality. This means that all these reads are ignored when creating the index file. So, there must be quite a few reads that are unmapped according to the flag info.
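One way to check this guess would be to count how many records have the unmapped flag (0x4) set and how many of those still carry a non-zero MAPQ. Below is a minimal sketch, assuming the records are available as SAM text lines (for example, piped from `samtools view foo.bam`). The function name `count_unmapped_with_mapq` and the three sample records are made up for illustration; they are not from my data.

```python
UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "segment unmapped"


def count_unmapped_with_mapq(sam_lines):
    """Return (total, unmapped, unmapped_with_nonzero_mapq) for SAM text lines."""
    total = unmapped = suspicious = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record; ignore it
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        total += 1
        if flag & UNMAPPED_FLAG:
            unmapped += 1
            if mapq != 0:
                # This is the case the validation error complains about.
                suspicious += 1
    return total, unmapped, suspicious


# Hypothetical records, only to show the call; real input would come from the BAM.
example_records = [
    "r1\t0\tChr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t4\tChr1\t200\t25\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_unmapped_with_mapq(example_records))
```

For a real file, the same function could be called with `sys.stdin` as its argument while piping in the output of `samtools view`. Comparing the unmapped counts between the old and new libraries would show whether the new one really has many more unmapped reads.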