I would like to use the LD data in HapMap in combination with the latest genome annotation data in Ensembl 59. Unfortunately, if I am not mistaken, HapMap rel. 27 is based on NCBI 36 coordinates, while the latest Ensembl release uses the latest genome build (GRCh37). As far as I understand, the latest Ensembl version with compatible coordinates would be 54, correct?
Edit:
Core question: how many SNPs (rs-ids) are re-annotated (deleted, renamed) between releases of dbSNP?
It seems like nobody has undertaken a full lift over of the HapMap bulk data to update all coordinates, at least I didn't find any information about this. So I was thinking about trying to do this.
This question is somewhat related: http://biostar.stackexchange.com/questions/916/how-do-you-manage-moving-existing-projects-to-a-new-genome-build, where the LiftOver tool was presented as a solution.
So here are my questions:
- Did anybody already try this, or would like to have this data, too?
- What would be the best approach to do the bulk conversion? For example, running liftOver on the genomic coordinates, or would it be better to convert based on matching rs-ids?
- Is that a valid approach at all?
Any suggestions welcome.
Edit: One of the main concerns that I should mention is that SNPs are renamed, deleted, or have their positions changed. So I increasingly get the impression that neither approach is enough on its own: mapping only the coordinates (the coordinates could be fine, but the SNP could have disappeared or been renamed in dbSNP) or mapping only the ids, even though the latter may be safer. I think Jorge's answer points in a good direction.
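To make the concern concrete, here is a minimal sketch of how the two approaches could be cross-checked against each other. It assumes the results are already loaded into plain dictionaries: `lifted` holds coordinates obtained by lifting over the old positions, and `current` holds positions looked up by rs-id in the newer annotation. The function name, the input layout, and the toy values are all hypothetical and only for illustration.

```python
def cross_check(hapmap_ids, lifted, current):
    """Compare lifted-over coordinates with positions looked up by rs-id.

    hapmap_ids: iterable of rs-ids taken from the HapMap data
    lifted:     {rs_id: (chrom, pos)} after coordinate lift-over;
                ids whose position could not be lifted are absent
    current:    {rs_id: (chrom, pos)} from the newer annotation;
                ids that were deleted or renamed are absent
    """
    report = {"concordant": [], "discordant": [], "missing_id": [], "unlifted": []}
    for rs_id in hapmap_ids:
        if rs_id not in current:
            # id no longer exists under this name in the newer annotation
            report["missing_id"].append(rs_id)
        elif rs_id not in lifted:
            # id still exists, but the old coordinate could not be lifted
            report["unlifted"].append(rs_id)
        elif lifted[rs_id] == current[rs_id]:
            report["concordant"].append(rs_id)
        else:
            # both approaches give an answer, but the answers disagree
            report["discordant"].append(rs_id)
    return report


# Toy data, not real SNP positions
ids = ["rsA", "rsB", "rsC", "rsD"]
lifted = {"rsA": ("1", 100), "rsB": ("1", 200), "rsC": ("2", 300)}
current = {"rsA": ("1", 100), "rsB": ("1", 250), "rsD": ("3", 400)}
print(cross_check(ids, lifted, current))
```

Only the SNPs in the `concordant` group would be safe to carry over without a closer look; the other groups are the ones affected by renaming, deletion, or position changes.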
The easiest way to be completely sure about that is to have a look at the merge data from dbSNP. For several reasons we are waiting to update to dbSNP 131, but I can tell you that we maintain a list of such merge information for HapMap using dbSNP 130, and the number of rs codes merged (since dbSNP 126, which is HapMap's template) is ~1200. Add to that figure a small number of rs codes that have been removed (unfortunately we do not keep track of that information), and you will be able to make your own decisions.
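As a rough illustration of how such merge information could be applied, the sketch below assumes it has been reduced to a simple old-id to current-id mapping, plus an optional set of removed ids. This is not the actual layout of the dbSNP merge files, and the function names are made up for the example. Merges are followed transitively, since an id may have been merged more than once across releases.

```python
def resolve_rs(rs_id, merged):
    """Follow merge records until an id that was not merged further is reached."""
    seen = set()
    while rs_id in merged:
        if rs_id in seen:
            # guard against a malformed merge table containing a cycle
            raise ValueError(f"cyclic merge history at {rs_id}")
        seen.add(rs_id)
        rs_id = merged[rs_id]
    return rs_id


def update_ids(hapmap_ids, merged, removed=frozenset()):
    """Return ({old_id: current_id}, [ids that were removed])."""
    updated, dropped = {}, []
    for rs_id in hapmap_ids:
        new_id = resolve_rs(rs_id, merged)
        if new_id in removed:
            dropped.append(rs_id)
        else:
            updated[rs_id] = new_id
    return updated, dropped


# Toy merge table, not real dbSNP records
merged = {"rs30": "rs20", "rs20": "rs10"}
removed = {"rs99"}
print(update_ids(["rs30", "rs10", "rs99", "rs5"], merged, removed))
```

Ids that come back unchanged were neither merged nor removed, so comparing the sizes of the two results against the input gives a direct count of how many SNPs were affected.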
Thanks, Pierre. That looks like a viable solution. But what if rs-ids were changed, SNPs renamed, deleted, or whatever? I heard that happened, rarely, but still.
Michael, Pierre: Is this SNP re-annotation (as in renaming, deletion, etc.) as common as protein sequence revisions? Is there a way to track the changes in SNP annotations?
@Khader I don't know how frequent this re-annotation of SNPs is; that was one of the reasons for asking this. If it were only a few dozen I would not bother, but I am not sure. Possibly Jorge's answer contains the clue.