I have a (probably stupid) question about how the chromosome positions are numbered in SAM files.

I used bowtie to align reads to hg19 and got SAM files as output. I pulled the columns indicating chromosome and position so that I have files that look like:

chr, position

chr1, {starting position for read 1}

chr1, {starting position for read 2}

chr1, {starting position for read 3}

... etc.

My question is: do the position indices reset at each chromosome (i.e., position 1 on chromosome 2 would read "chr2, 1"), or does the position indicate the position within the entire genome (i.e., if chromosome 1 has X positions, position 1 on chromosome 2 would read "chr2, X+1")?

It is a valid question, don't worry. Every chromosome (or contig, or whatever reference sequence you align against) is treated as a separate unit, so the numbering restarts on each one: positions run from 1 to the length of that chromosome. In your example, position 1 on chromosome 2 reads "chr2, 1". The start is 1 because SAM is a 1-based format; in a 0-based format such as BED the start would be 0. See Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems.
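
If it helps to see this concretely, here is a minimal Python sketch, not a definitive implementation. It assumes a standard tab-separated SAM file, where column 3 is the reference name and column 4 is the 1-based leftmost position. The function names, the sample reads, and the toy chromosome lengths are all made up for illustration. The last function shows how you could build a genome-wide coordinate yourself if you ever need one, since SAM does not store it.

```python
def sam_positions(lines):
    """Yield (chromosome, 1-based position) for each mapped read."""
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines such as @HD, @SQ, @PG
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*" or pos == 0:
            continue  # unmapped read: no reference name or position
        yield rname, pos


def to_zero_based(pos):
    """Convert a 1-based SAM position to a 0-based start (BED-style)."""
    return pos - 1


def to_genome_wide(chrom, pos, chrom_lengths):
    """Add the lengths of all preceding chromosomes to a per-chromosome position.

    chrom_lengths is an ordered list of (name, length) pairs; the order
    is a choice you make, it is not defined by the SAM format.
    """
    offset = 0
    for name, length in chrom_lengths:
        if name == chrom:
            return offset + pos
        offset += length
    raise KeyError(chrom)


# Hypothetical records and toy lengths, not real hg19 values.
sample = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr2\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
toy_lengths = [("chr1", 1000), ("chr2", 500)]

positions = list(sam_positions(sample))
genome_wide = [to_genome_wide(c, p, toy_lengths) for c, p in positions]
```

With these toy values, the read at "chr2, 1" ends up at genome-wide position 1001 (the X+1 from your question), but only because the script adds the offset itself; the SAM file only ever contains the per-chromosome value.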


