8.0 years ago by
0-based, half-open systems allow cheap length calculations. That is, the length of an interval running from n to m is m-n, instead of (m-n)+1 as in a 1-based, closed system. Also, 0-based is convenient for programming; most widely used programming languages use 0-based arrays. Another example is calculating overlap. To calculate the degree of overlap between two 0-based, half-open intervals, you can use the following:
a = [start1, end1)
b = [start2, end2)
overlap(a,b) = min(end1,end2) - max(start1,start2)
whereas with a 1-based, closed system it is:
a = [start1, end1]
b = [start2, end2]
overlap(a,b) = min(end1,end2) - max(start1,start2) + 1
The beauty of the above approach with 0-based is that if two intervals do not overlap, the recipe returns zero when they are directly adjacent and otherwise a negative value whose absolute value is the distance (the number of bases in the gap) between the two features.
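As a minimal sketch of the two recipes above in Python (the function names are my own, purely for illustration, and are not taken from any library):

```python
def overlap_half_open(start1, end1, start2, end2):
    """Overlap of two 0-based, half-open intervals [start, end)."""
    return min(end1, end2) - max(start1, start2)


def overlap_closed(start1, end1, start2, end2):
    """Overlap of two 1-based, closed intervals [start, end]."""
    return min(end1, end2) - max(start1, start2) + 1


# The same pair of features expressed in each coordinate system.
print(overlap_half_open(10, 20, 15, 30))  # 5 bases shared
print(overlap_closed(11, 20, 16, 30))     # 5 bases shared

# Non-overlapping 0-based features: bases 20..24 lie in the gap.
print(overlap_half_open(10, 20, 25, 30))  # -5

# Directly adjacent 0-based features share nothing and have no gap.
print(overlap_half_open(10, 20, 20, 30))  # 0
```

In the 0-based version, a positive result is the number of shared bases, zero means the features abut, and a negative result gives the size of the gap.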
So, for programming, I much prefer 0-based, as it prevents tons of extra (ugly and more expensive) "-1" and "+1" operations in one's code.
The counterargument is that our brains are trained to think in 1-based, closed systems. I suspect the designers of various formats such as BED (0-based), BAM (0-based), VCF (1-based), and GFF (1-based) made conscious decisions regarding the coordinate system based on the intent of the format. For example, BED is a fundamental format in the UCSC browser and much of the underlying code depends on it. Thus, the coordinate system is 0-based for speed and code cleanliness. Similarly, BAM requires efficiency. In contrast, perhaps the designers of VCF and GFF were more concerned with "readability" of the format?
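Since both conventions are in use, it helps to see how little it takes to move between them. Here is a small sketch, assuming BED-style intervals are 0-based and half-open and GFF-style intervals are 1-based and closed; the helper names are hypothetical:

```python
def bed_to_gff(start, end):
    """0-based, half-open [start, end) -> 1-based, closed [start, end]."""
    return start + 1, end


def gff_to_bed(start, end):
    """1-based, closed [start, end] -> 0-based, half-open [start, end)."""
    return start - 1, end


bed = (10, 20)
gff = bed_to_gff(*bed)

print(gff)                  # (11, 20)
print(gff_to_bed(*gff))     # (10, 20)

# The feature length is the same either way; only the formula differs.
print(bed[1] - bed[0])      # 10
print(gff[1] - gff[0] + 1)  # 10
```

Only the start coordinate shifts by one; the end coordinate is numerically the same in both systems.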