Question: Is BAM One-Based or Zero-Based?
Alex Reynolds (Seattle, WA USA) wrote, 7.0 years ago:

Reading section 1.2 of the SAM specification, it says that BAM is 0-based. Yet some folks I know who work with sequencing data in that format say it is 1-based. Who is correct?

bam • 5.7k views
Istvan Albert (University Park, USA) wrote, 7.0 years ago:

The spec says that the SAM format is 1-based and that the BAM format is 0-based.

But the latter matters only if you access the file directly. If you access a BAM file via a tool like samtools, which turns BAM into SAM, the coordinates are converted to 1-based.


Sneaky! Thanks for the explanation...

Reply written 7.0 years ago by Alex Reynolds

So far, I thought BAM was 0-based. But when I look at the BAM description on the IGV site

http://www.broadinstitute.org/igv/BAM

it says BAM is 1-based, which is confusing.

Given a mapped BAM file from ENCODE, what is the best way to manually confirm whether the file is 0-based or 1-based? For example, is the ENCODE CAGE data below 0-based or 1-based?

# Download ENCODE BAM files
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRikenCage/wgEncodeRikenCageHelas3CellPapAlnRep1.bam
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRikenCage/wgEncodeRikenCageHelas3CellPapAlnRep2.bam

Reply written 4.4 years ago by Chirag Nepal

It only needs to be treated as 0-based if you write a tool that opens the binary BAM file directly and you access and extract the field that contains the coordinate itself, for example through a low-level programming API in Python, Java, or C that exposes the stored value. Then you need to regard it as 0-based. In any other case the conversion is done for you: when IGV shows you a BAM file, it has already converted it to the SAM representation, which is 1-based.

The IGV help file is misleading. The file is not 1-based; what they show is 1-based. Similarly, when you open a BED file in IGV (BED is also 0-based), you will see that it is drawn as a 1-based file. But the file is of course still 0-based.

They just convert all data onto the same coordinate system.
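To make "accessing the field directly" concrete, here is a minimal sketch in Python that uses only the standard library. It assumes the BAM layout described in the SAM specification (magic, header text, reference list, then alignment records whose first fields are block_size, refID, and pos) and reads only the first record. The function names and the toy builder are hypothetical helpers for illustration, not part of any library.

```python
import io
import struct


def read_i32(stream):
    """Read one little-endian 32-bit signed integer from a binary stream."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("unexpected end of BAM data")
    return struct.unpack("<i", raw)[0]


def first_alignment_pos(stream):
    """Return (stored 0-based pos, 1-based SAM POS) for the first record.

    `stream` must yield the *decompressed* BAM bytes, e.g. gzip.open(path, "rb").
    """
    if stream.read(4) != b"BAM\x01":
        raise ValueError("not a BAM stream")
    l_text = read_i32(stream)
    stream.read(l_text)                 # skip the SAM header text
    n_ref = read_i32(stream)
    for _ in range(n_ref):
        l_name = read_i32(stream)
        stream.read(l_name)             # reference name (NUL-terminated)
        read_i32(stream)                # reference length
    read_i32(stream)                    # block_size
    read_i32(stream)                    # refID
    pos0 = read_i32(stream)             # the coordinate as stored in the file
    return pos0, pos0 + 1               # SAM text shows pos + 1


def toy_bam_stream(pos0):
    """Hypothetical helper: an uncompressed, truncated one-record BAM body."""
    text = b"@HD\tVN:1.6\n"
    name = b"chr1\x00"
    body = b"BAM\x01" + struct.pack("<i", len(text)) + text
    body += struct.pack("<i", 1)
    body += struct.pack("<i", len(name)) + name + struct.pack("<i", 1000)
    body += struct.pack("<iii", 8, 0, pos0)  # block_size, refID, pos only
    return io.BytesIO(body)
```

On a real file, something like `with gzip.open("reads.bam", "rb") as fh: first_alignment_pos(fh)` should work, since BGZF blocks are ordinary gzip members; the file name here is a placeholder. Comparing the first value with the POS column printed by `samtools view` for the same read is one way to see the off-by-one for yourself.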

Reply written 4.4 years ago (modified 4.4 years ago) by Istvan Albert
JC (Mexico) wrote, 7.0 years ago:

BAM is 0-based; SAM (which is what we can read on the screen) is 1-based. Maybe that is the source of the confusion.

iw9oel_ad wrote, 7.0 years ago:

As you point out, the SAM spec is unambiguous on this; BAM data are 0-based and SAM data are 1-based (sections 1.2 and 3.2). A feature at base 0 in a BAM file will be at base 1 when the data are exported as SAM.
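The arithmetic is small enough to write down. The following is a sketch in Python with hypothetical function names; the BED helper assumes the usual BED convention of a 0-based start and an exclusive end, as BED is also mentioned in this thread.

```python
def bam_pos_to_sam_pos(pos0):
    """Convert a stored BAM position (0-based) to a SAM POS (1-based)."""
    return pos0 + 1


def sam_pos_to_bam_pos(pos1):
    """Convert a SAM POS (1-based) back to the stored BAM position."""
    return pos1 - 1


def bed_to_one_based(chrom_start, chrom_end):
    """Convert a BED interval (0-based, end exclusive) to 1-based inclusive."""
    return chrom_start + 1, chrom_end


# A feature at base 0 in BAM appears at base 1 in SAM.
print(bam_pos_to_sam_pos(0))
```

Note that an unmapped read is stored with position -1 in BAM and shown with POS 0 in SAM, so the same plus-one rule covers that case too.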
