Does dbSNP have different conventions regarding right-shift vs left-shift of variant positions when it reports for GRCh37 versus GRCh38?
Two examples are rs367896724 and rs1424506967. Both:
- are small insertions
- occur at the beginning of chr1, where the reference sequence has not yet diverged between the two genome references
- have a repeated base in the reference sequence, so there are two positions that could be reported and still yield the same resulting sequence.
But on their dbSNP web pages, the positions reported for GRCh38 are left-shifted, whereas those reported for GRCh37 are right-shifted.
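To make the ambiguity concrete, here is a minimal sketch on a made-up toy sequence (not the real chr1 reference). The helper `apply_insertion` and the 0-based gap index are hypothetical, chosen only for illustration; they are not dbSNP or VCF coordinates.

```python
def apply_insertion(ref: str, gap: int, ins: str) -> str:
    """Insert `ins` into `ref` at 0-based gap index `gap`
    (i.e. between ref[gap - 1] and ref[gap])."""
    return ref[:gap] + ins + ref[gap:]


ref = "GATCAG"  # toy sequence; the single C sits at index 3

# Insert an extra C either just before or just after the existing C.
left_placed = apply_insertion(ref, 3, "C")
right_placed = apply_insertion(ref, 4, "C")

# Two different reported positions, one and the same resulting sequence.
assert left_placed == right_placed
```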
I speculated that perhaps this was somehow related to execution of a liftOver, but I tried liftOvers in both directions (using chain files from UCSC) and those positions get mapped unchanged.
Is left-shift versus right-shift an explicit choice of dbSNP which is documented somewhere (I couldn't find anything)?
If there is a systematic difference in left-versus-right shifting in dbSNP, then I would think it could seriously impair attempts to assign RS#s to variants from, e.g., VCF files, which I believe generally follow a left-shifted convention.
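One way to guard against this when matching variants is to shift both sides to the same convention before comparing. The following is a simplified sketch of that idea for pure insertions only, using hypothetical helper names and 0-based gap indices. It is not dbSNP's actual procedure and it ignores the anchor base that VCF records carry; for real data, a tool such as `bcftools norm` is the usual way to left-align indels.

```python
from typing import Tuple


def left_shift_insertion(ref: str, gap: int, ins: str) -> Tuple[int, str]:
    """Move an insertion as far left as possible without changing the result.

    A one-base left shift is valid when the base left of the gap equals the
    last inserted base; the inserted sequence is rotated accordingly.
    """
    while gap > 0 and ref[gap - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]
        gap -= 1
    return gap, ins


def right_shift_insertion(ref: str, gap: int, ins: str) -> Tuple[int, str]:
    """Mirror image: move an insertion as far right as possible."""
    while gap < len(ref) and ref[gap] == ins[0]:
        ins = ins[1:] + ins[0]
        gap += 1
    return gap, ins


ref = "GATCAG"  # toy sequence, not real chr1

# The same insertion reported in two ways (right- and left-shifted).
reported_a = (4, "C")
reported_b = (3, "C")

# After left-shifting, both records produce the same lookup key.
key_a = left_shift_insertion(ref, *reported_a)
key_b = left_shift_insertion(ref, *reported_b)
assert key_a == key_b
```

Comparing the normalized `(gap, ins)` keys rather than the raw reported positions makes the match independent of which shifting convention each source used.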
It is somewhat unclear why they are reported differently on that page. On the "dbSNP page" they are reported identically in the variant details and HGVS tabs: https://www.ncbi.nlm.nih.gov/snp/rs367896724#variant_details
Are they given different coordinates in other data files?
Thank you for that observation.
I looked into the actual full downloadable VCF files in the different builds and in those cases the locations agree.
So I guess I'll conclude that it's a bug/error specific to their "front page" summary (e.g. my hyperlinks above), where they present the alleles with a blank reference ("->C,CC"), as opposed to the VCF or HGVS presentation they show elsewhere.
Thanks again
You may also want to look at this: https://pmc.ncbi.nlm.nih.gov/articles/PMC7523648 Perhaps it describes the normalization/projection that is currently in use.