I would like to use the LD data in HapMap in combination with the latest genome annotation data in Ensembl 59. Unfortunately, if I am not wrong, HapMap rel. 27 is based on NCBI 36 coordinates, while the current Ensembl release uses the newer genome build (GRCh37). As far as I understand, the latest Ensembl version with compatible coordinates would be 54, correct?
Core question: how many SNPs (rs-ids) are re-annotated (deleted, renamed) between releases of dbSNP?
It seems that nobody has undertaken a full liftover of the HapMap bulk data to update all coordinates; at least I couldn't find any information about this. So I was thinking about trying to do this.
This question is somewhat related: http://biostar.stackexchange.com/questions/916/how-do-you-manage-moving-existing-projects-to-a-new-genome-build, where the LiftOver tool was presented as a solution.
So here are my questions:
- Has anybody already tried this, or would anybody else like to have this data, too?
- What would be the best approach for the bulk conversion? For example, running liftOver on the genomic coordinates, or would it be better to convert based on matching rs-ids?
- Is that a valid approach at all?
Any suggestions welcome.
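To make the coordinate-based option concrete, here is a minimal sketch of how I imagine preparing the input for liftOver. It assumes the SNPs are available as (chromosome, 1-based position, rs-id) tuples; the function names `to_bed_lines` and `from_bed_lines` are made up for illustration. BED intervals are 0-based and half-open, so a SNP at position p becomes the interval [p-1, p). Keeping the rs-id in the name column means the lifted positions can still be matched back by id afterwards.

```python
def to_bed_lines(snps):
    """Turn (chrom, pos_1based, rs_id) tuples into BED lines for liftOver.

    BED is 0-based, half-open, so a SNP at 1-based position p
    is written as start = p - 1, end = p.
    """
    lines = []
    for chrom, pos, rs_id in snps:
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{rs_id}")
    return lines


def from_bed_lines(lines):
    """Read lifted BED lines back into {rs_id: (chrom, pos_1based)}.

    Blank lines and '#' comment lines (as found in liftOver's
    unmapped output) are skipped.
    """
    lifted = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, rs_id = line.rstrip("\n").split("\t")[:4]
        lifted[rs_id] = (chrom, int(end))
    return lifted
```

The resulting file would then go through something like `liftOver input.bed hg18ToHg19.over.chain.gz output.bed unmapped.bed` (NCBI 36 corresponds to UCSC hg18, GRCh37 to hg19), and the SNPs that end up in the unmapped file would need to be handled separately.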
Edit: One of the main concerns I should mention is that SNPs get renamed or deleted, and their positions change. So I increasingly get the impression that neither approach is enough on its own: mapping only the coordinates can fail because the coordinates may be fine while the SNP has disappeared from or been renamed in dbSNP, and mapping only the ids is not sufficient either, even though it might be safer. I think Jorge's answer points in a good direction.
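A rough sketch of how the two approaches could be cross-checked against each other, rather than trusting either one alone. Everything here is an assumption for illustration: `lifted` would hold the liftOver result per original rs-id, `dbsnp_pos` the GRCh37 positions from a current dbSNP release, and `merged_into` a mapping from retired rs-ids to the ids they were merged into (dbSNP distributes merge history, in a table called RsMergeArch if I remember the name correctly). The function name `classify_snp` and the status labels are mine.

```python
def classify_snp(rs_id, lifted, dbsnp_pos, merged_into):
    """Compare a lifted coordinate with the position dbSNP reports.

    lifted:      {old_rs_id: (chrom, pos)} from the coordinate liftover
    dbsnp_pos:   {current_rs_id: (chrom, pos)} on the new build
    merged_into: {retired_rs_id: rs_id_it_was_merged_into}
    Returns (status, current_rs_id).
    """
    # Follow the merge chain to the current id, guarding against cycles.
    current_id = rs_id
    seen = set()
    while current_id in merged_into and current_id not in seen:
        seen.add(current_id)
        current_id = merged_into[current_id]

    if current_id not in dbsnp_pos:
        return ("deleted", current_id)      # no longer present in dbSNP
    if rs_id not in lifted:
        return ("unlifted", current_id)     # liftOver could not map it
    if lifted[rs_id] != dbsnp_pos[current_id]:
        return ("discordant", current_id)   # the two approaches disagree
    if current_id != rs_id:
        return ("renamed", current_id)      # same position, new id
    return ("concordant", current_id)


# Toy data, invented purely to exercise each branch.
lifted = {"rs1": ("chr1", 100), "rs2": ("chr1", 200), "rs3": ("chr1", 300)}
dbsnp_pos = {"rs1": ("chr1", 100), "rs20": ("chr1", 200), "rs3": ("chr1", 301)}
merged_into = {"rs2": "rs20"}

for rs in ["rs1", "rs2", "rs3", "rs4"]:
    print(rs, classify_snp(rs, lifted, dbsnp_pos, merged_into))
```

Only SNPs classified as concordant or renamed would be carried over automatically; the discordant, unlifted and deleted ones are exactly the cases I am worried about, and counting them would also give a rough answer to the core question above.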