Section 1.2 of the SAM specification says that BAM is 0-based. Yet some folks I know who work with sequencing data in that format say it is 1-based. Who is correct?
The spec says that the SAM format is 1-based and that the BAM format is 0-based.
But the latter only matters if you access the file directly. If you access a BAM file via a tool like samtools that turns BAM into SAM, the coordinates will be converted to 1-based.
Sneaky! Thanks for the explanation...
So far, I thought BAM was 0-based. But when I look at the BAM description at IGV,
it says BAM is 1-based, which is confusing.
Given a mapped BAM file from ENCODE, what is the best way to manually confirm whether it is 0-based or 1-based? For example, is the ENCODE CAGE data 0-based or 1-based?
# Download Encode BAM
It only needs to be treated as 0-based if you write a tool that opens the binary BAM file directly and you access and extract the field that contains the coordinate itself, for example through a programming API in Python, Java or C. Then you need to regard it as 0-based. In any other situation the conversion is done for you: when IGV shows you a BAM file, it has already converted it to the SAM representation, which is 1-based.
The IGV help file is misleading: the file is not 1-based; what they show is 1-based. Similarly, when you open a BED file in IGV (BED is also 0-based) you will see that it is drawn in 1-based coordinates. But the file is of course still 0-based.
They just convert all data onto the same coordinate system.
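As a rough illustration of what "the same coordinate system" means here, the sketch below puts a BAM position and a BED interval onto 1-based coordinates. The function names are invented for this example; the arithmetic follows from BAM pos being 0-based and BED intervals being 0-based and half-open.

```python
def bam_pos_to_sam_pos(pos):
    """0-based BAM pos -> 1-based SAM POS (an unmapped -1 becomes 0)."""
    return pos + 1


def bed_to_one_based(start, end):
    """0-based half-open BED [start, end) -> 1-based closed [start + 1, end]."""
    return start + 1, end


print(bam_pos_to_sam_pos(0))       # 1
print(bed_to_one_based(0, 100))    # (1, 100)
```

Note that only the BED start shifts: the exclusive 0-based end and the inclusive 1-based end are the same number.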
BAM is 0-based, SAM (which is what we can read on the screen) is 1-based; maybe that's the reason for the confusion.
As you point out, the SAM spec is unambiguous on this; BAM data are 0-based and SAM data are 1-based (sections 1.2 and 3.2). A feature at base 0 in a BAM file will be at base 1 when the data are exported as SAM.