Hi, I have a very simple question. There are 4 variants like below:
1 161003087 161003087 C T comments: rs1000050, a SNP in Illumina SNP arrays
1 84647761 84647761 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
1 13133880 13133881 TC - comments: rs59770105, a 2-bp deletion
1 11326183 11326183 - AT comments: rs35561142, a 2-bp insertion
Why is the start position the same as the end position for some of them, but not for others?
For SNPs the start and end are the same. For indels it's different. AFAIK the numbering of the reference genome is kept intact so that the positions of all other SNPs stay consistent.
So a DELETION of 2 bases directly reflects the TWO reference positions that are removed. A 2-base insertion has only a SINGLE point of entry; otherwise every coordinate downstream of it would shift by the length of the insertion, making things a mess when comparing.
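To make the rule concrete, here is a minimal Python sketch of how start and end could be derived from a position plus the ref/alt alleles. It assumes 1-based, inclusive reference coordinates and "-" as the placeholder for an empty allele, as in the lines from the question; the function name `variant_span` is made up for illustration and is not part of any tool.

```python
def variant_span(pos, ref, alt):
    """Return (start, end) in reference coordinates (1-based, inclusive).

    Assumption: '-' marks an empty allele, as in the question's examples.
    `alt` is accepted for completeness but does not influence the span,
    because only reference bases occupy reference coordinates.
    """
    if ref == "-":
        # Insertion: no reference base is consumed, so there is only a
        # single anchor position and start == end.
        return pos, pos
    # Substitution or deletion: the span covers every reference base named.
    return pos, pos + len(ref) - 1


print(variant_span(161003087, "C", "T"))   # single-base substitution
print(variant_span(13133880, "TC", "-"))   # 2-bp deletion
print(variant_span(11326183, "-", "AT"))   # 2-bp insertion
```

Only the deletion ends up with an end that differs from its start, because it is the only one of the three that touches more than one reference base.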
You have to keep in mind that this nomenclature tries to deal with several types of variants, not only single-base ones, and that the chromosome positions refer to the standard reference genome. That lets you easily describe anything that happens relative to that genome template, such as base changes, insertions or deletions.
The common way of describing the location of a SNP, taking a SNP to be a single-base substitution, is to give its chromosome position as both start and end. The same goes for insertions: you can tell where an insertion starts, but relative to the reference you can't tell where it ends, since the inserted bases are not part of the reference. With deletions, though, you know exactly which reference bases are removed, which is why you may see different start and end points: the start is the first base removed and the end is the last one. Strictly speaking, among the variant types shown here, only deletion entries (of two or more bases) can have different values in their start and end fields.
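As a rough check of that reasoning, the sketch below parses the four rows from the question and verifies that start and end agree with the alleles. It assumes whitespace-separated columns in the order chromosome, start, end, ref, alt, with "-" for an empty allele; `classify` and its labels are hypothetical names chosen for this example only.

```python
ROWS = """\
1 161003087 161003087 C T
1 84647761 84647761 C T
1 13133880 13133881 TC -
1 11326183 11326183 - AT
"""


def classify(row):
    """Return (kind, reference_bases_affected, coordinates_consistent)."""
    # Only the first five columns are used; trailing comments are ignored.
    chrom, start, end, ref, alt = row.split()[:5]
    start, end = int(start), int(end)

    if ref == "-":
        kind, ref_bases = "insertion", 0
    elif alt == "-":
        kind, ref_bases = "deletion", len(ref)
    elif len(ref) == 1 and len(alt) == 1:
        kind, ref_bases = "substitution", 1
    else:
        kind, ref_bases = "other", len(ref)

    if ref_bases == 0:
        # An insertion has a single anchor point on the reference.
        consistent = start == end
    else:
        # Otherwise the span must cover exactly the reference bases named.
        consistent = (end - start + 1) == ref_bases
    return kind, ref_bases, consistent


results = [classify(row) for row in ROWS.splitlines()]
for row, result in zip(ROWS.splitlines(), results):
    print(row, "->", result)
```

The deletion is the only row where `end - start + 1` is greater than 1, matching the explanation above.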