How does SNP position numeration work?
2
0
6.8 years ago
eyb ▴ 210

Does the counting of SNP positions start again at the beginning of each chromosome, or is it continuous across the genome?

For example

In my dataset, the first SNP on chromosome 2, rs2685230, is at position 437664, and the last SNP, rs10191556, is at position 242521405.

But the first SNP on chromosome 13, rs11617984, is at position 19622143.

Is this a mistake in my data, or does it mean that the first SNP on chromosome 13 really is at 19622143? Is chromosome 13 supposed to be shorter than chromosome 2?

position bp SNP • 1.4k views
3
6.8 years ago

Positions are counted within each chromosome: the numbering restarts at the beginning of every chromosome. A continuous, genome-wide numbering system would be a nightmare to work with. Have a look at the VCF spec for further details.
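To illustrate per-chromosome numbering, here is a minimal sketch that reports the first and last position seen on each chromosome. The three-column table (chromosome, rsID, position) is a toy built from the SNPs quoted in the question, and the function name `pos_range` is made up for this example.

```shell
# Toy table from the question: chromosome, rsID, position.
snps='2 rs2685230 437664
2 rs10191556 242521405
13 rs11617984 19622143'

# pos_range (hypothetical helper): print "chrom min max" per chromosome.
# Positions are only ever compared within one chromosome.
pos_range() {
  awk '{
    p = $3 + 0                                   # force numeric comparison
    if (!($1 in min) || p < min[$1]) min[$1] = p
    if (!($1 in max) || p > max[$1]) max[$1] = p
  }
  END { for (c in min) print c, min[c], max[c] }' | sort -n
}

printf '%s\n' "$snps" | pos_range
```

Each chromosome gets its own range, so a position such as 19622143 only has meaning together with the chromosome name.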

0

So on a chromosome, where is nucleotide number 0 or number 1? Is it at the very tip of the telomere? Is it the p-arm telomere or the q-arm telomere? And does the chromosomal position simply step up by 1 for each nucleotide, marching through the centromere to the other arm?

1

The first base is at one of the ends (assuming non-circular DNA). Which of the two ends isn't always known. For mouse/human/etc. (i.e., high quality reference genomes), the first base is the tip of the p-arm. Yes, each subsequent base is one position higher.
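As a small illustration of "each subsequent base is one position higher", the sketch below looks up a base by its position, assuming the 1-based convention that VCF uses for POS (so there is no base 0). The sequence and the helper name `base_at` are invented for this example.

```shell
# Toy sequence (hypothetical, not a real chromosome).
seq='GATTACA'

# base_at SEQ POS (hypothetical helper): print the base at 1-based position POS.
# cut -c counts characters from 1, which matches 1-based coordinates.
base_at() {
  printf '%s\n' "$1" | cut -c "$2"
}

base_at "$seq" 1   # first base of the sequence
base_at "$seq" 7   # last base of the sequence
```

Other formats count differently (BED, for instance, uses 0-based starts), so it is worth checking the convention of whichever file the positions came from.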

2
6.8 years ago

The IDs of the SNPs are just primary keys in the dbSNP database (e.g. http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html). SNPs are reported only where regions have been studied; there is no relationship with the length of the chromosomes. Furthermore, chr13 is acrocentric, and its hg19 sequence starts with a long run of unsequenced Ns:

$ curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr2.fa.gz" | gunzip -c | uniq -c | head -n 10
1 >chr2
200 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
1 CGTATCCcacacaccacacccacacaccacacccacacacacccacaccc
1 acacccacacacaccacacccacacaccacacccacacccacacaccaca
1 cccacaccacacccacacaccacacaccacacccacacccacacacacca
1 cacccacacaccacacccacacacaccctaaccctaacccctaaccccta
1 accctaaccctacccgaaccctaaccctaaccctaacccctaaccctaac
1 ccctaaccctaaccctaaccgtaaccctaaccctttaccctaacccgaac
1 ccctaacccctaacccctaacccttaaccctaacccttaaccctgaccct
1 gaccctgaccgtgaccctgaccctaacccgaacccgaacccgaaccccga

$ curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr13.fa.gz" | gunzip -c | uniq -c | head -n 10
1 >chr13
380400 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
1 GAATTCAACTGCCTCCCTGGCCTTTCCCTGCCAATCTATGTGCCCCAGCA
1 GCCAACTTACATAGCACTGAGTGCAGACTTGTAAATAGACCTTCCAGTTC
1 TGCTATAATCAATACCTTATTGTCCATAACTCAATTTGGAGAAGGTTTAG
1 CTGTCTACCAACTCTTGTGGAGAGTTTCTGTGAAGTTTTGTTTTGGGTTG
1 CAAGAATCTGGAAAACAGATGCAGATGTTTTTGAGGAAGATTTTGAAATT
1 TCTACTTATAAGGTACCCAAAATGGGATCCAAACTCTTGAATTTGGTTGA
1 TCTTCTGAAATACATACCTGTGTTTTAAGATTTGCTTGAGCAAACCTTTA
1 ACCATGGAAATTTTAACCAATGATTTCCAGGTTGAAACAATTCCAGTTTT


So you won't find any SNPs in the 5' region of this chromosome: the reference sequence there is nothing but Ns.
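The size of that gap can be worked out from the output above. The following is a rough sketch: `leading_n` is a made-up helper that counts the N bases before the first real base in a FASTA stream, shown here on a toy record, and the arithmetic assumes 50 bases per line, as in the lines printed above.

```shell
# leading_n (hypothetical helper): read FASTA on stdin and print how many
# N bases precede the first non-N base.
leading_n() {
  awk '
    /^>/ { next }                                    # skip the header line
    {
      line = toupper($0)
      if (line ~ /^N+$/) { n += length(line); next } # line made only of Ns
      rest = line
      sub(/^N+/, "", rest)                           # strip a partial leading run
      n += length(line) - length(rest)
      exit                                           # first real base reached
    }
    END { print n + 0 }'
}

# Toy record: two full lines of Ns, then a line starting with two Ns.
printf '>toy\nNNNNN\nNNNNN\nNNACG\nTTNNA\n' | leading_n   # prints 12

# chr13 in the output above: 380400 identical lines of 50 Ns each.
echo $((380400 * 50))                                     # prints 19020000
```

So roughly the first 19.02 Mb of chr13 in hg19 are Ns, which fits a first SNP at 19622143. On the real file, the same helper could be fed from the `curl ... | gunzip -c` pipeline shown above.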