Updating rsids from one version of dbSNP to another
3.3 years ago
BAGeno ▴ 180

Hi, I am working with PharmGKB data, which is based on dbSNP build 147, while my VCFs are annotated with dbSNP build 150.

I am searching for the PharmGKB rsIDs in my files, but I think some of them are not found because of the different dbSNP versions.

I want to convert the PharmGKB rsIDs from build 147 to build 150, but PharmGKB does not provide genomic coordinates. Looking up the coordinates first and then using them to update the rsIDs would be a long process. I searched and found the dbSNP Batch Query service, but its site announces that it will be retired in June 2018.

Can anyone please tell me how I can update the rsIDs without looking up their coordinates?

dbsnp rsids • 1.6k views

Please give an example of the data you have. If you have a VCF file, you could simply download the latest VCF from dbSNP and use Tabix or the Ensembl Variant Effect Predictor to annotate each variant directly with the current dbSNP build (151). I would strongly recommend using dbSNP 151 instead of 150 because there was a major update recently, including lots of variants from GRCh37. In my recent case, 151 flagged a lot of variants as common (based on 1000 Genomes and TOPMed) that I had previously considered rare, simply because dbSNP 150 did not include them with any AF from 1KG or TOPMed.
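For the VCF case, the idea of re-annotating IDs can be sketched with plain awk. This is only an illustration under several assumptions: both files are uncompressed, use the same assembly and the same chromosome naming, and your VCF has one ALT allele per line. The function name `annotate_ids` is made up for this example. For real data, a dedicated tool (Tabix, VEP or similar) will be faster and more careful.

```shell
# Hypothetical helper: rewrite the ID column of a VCF with the rsID that a
# dbSNP VCF lists for the same CHROM, POS, REF and ALT.
# $1 = your VCF, $2 = dbSNP VCF. Your VCF is read twice (first and last),
# so only the keys present in your VCF are kept in memory.
annotate_ids() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 { pass++ }                      # count which input is being read
        /^#/ { if (pass == 3) print; next }      # keep header lines of your VCF only
        pass == 1 { want[$1 FS $2 FS $4 FS $5] = 1; next }
        pass == 2 {
            n = split($5, alt, ",")              # dbSNP may list several ALT alleles
            for (i = 1; i <= n; i++) {
                key = $1 FS $2 FS $4 FS alt[i]
                if (key in want) id[key] = $3
            }
            next
        }
        {
            key = $1 FS $2 FS $4 FS $5
            if (key in id) $3 = id[key]          # variants without a match keep their ID
            print
        }
    ' "$1" "$2" "$1"
}
```

Usage would be something like `annotate_ids my.vcf dbsnp151.vcf > my.dbsnp151.vcf`, where both file names are placeholders.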


I do not have a VCF. I have data in this format:

| Annotation ID | Variant | Gene | Chemical | Literature Id | Phenotype Category | Significance | Notes | Sentence | StudyParameters | Alleles | Chromosome |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 608431768 | rs1131873 | EPHX1 (PA27829) | warfarin (PA451906) | 19794411 | dosage | yes | | Allele A is associated with decreased dose of warfarin. | 608431770 | A | chr1 |
| 827690618 | rs1131873 | EPHX1 (PA27829) | warfarin (PA451906) | 21593757 | dosage | no | in italian patients. | Allele A is not associated with dose of warfarin. | 827690625 | A | chr1 |
| 608431789 | rs28371685 | CYP2C9 (PA126) | warfarin (PA451906) | 20072124 | dosage | yes | | Allele T is associated with decreased dose of warfarin as compared to allele C. | 608431791 | T | chr10 |
3.3 years ago
ATpoint 55k

What a terrible file format^^ (no offense). It does not even contain the genomic coordinate (or is the coordinate $7 ?). Anyway, this is what I would do, given you only have the old rs-ID:

  1. Use the rs-IDs in your file to query the dbSNP VCF version 147 for the genomic coordinates. Here is a post on how to possibly do this.
  2. Get the latest dbSNP VCF and its tabix index (tbi). Also get Tabix if you do not already have it.
  3. Use tabix to retrieve the current ID from the dbSNP151 VCF. Tabix works like this, e.g. for your first variant assuming the coordinate would be 445426:

    tabix dbSNP151.vcf.gz chr1:445426-445426
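If you have many variants, steps 1 and 3 can also be done in bulk. The following is a rough awk sketch, not a tested pipeline: it assumes the rsID is in column 2 of a tab-separated PharmGKB file, that both dbSNP VCFs are on the same assembly with the same chromosome naming, and that they are uncompressed. The function names `rsid_to_coords` and `map_old_to_new` and all file names are placeholders I made up. Note that matching by position alone can hit more than one dbSNP record (e.g. a SNV and an indel at the same site), so ambiguous positions should be checked by allele.

```shell
# Hypothetical helper for step 1: print "rsID<TAB>chrom<TAB>pos" for every
# rsID in column 2 of the annotation file that occurs in the older dbSNP VCF.
# $1 = annotation TSV, $2 = older dbSNP VCF
rsid_to_coords() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { if ($2 ~ /^rs[0-9]+$/) want[$2] = 1; next }  # collect rsIDs
        /^#/                { next }                                        # skip VCF header
        $3 in want          { print $3, $1, $2 }      # VCF columns: CHROM=$1 POS=$2 ID=$3
    ' "$1" "$2"
}

# Hypothetical helper for step 3: print "oldID<TAB>newID" for every old
# coordinate that has a record at the same position in the newer dbSNP VCF.
# $1 = coords file from rsid_to_coords, $2 = newer dbSNP VCF
map_old_to_new() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { old[$2 FS $3] = $1; next }   # key: chrom + pos
        /^#/                { next }
        { key = $1 FS $2 }
        key in old          { print old[key], $3 }
    ' "$1" "$2"
}
```

Usage would look like `rsid_to_coords pharmgkb.tsv dbsnp147.vcf > coords.tsv` followed by `map_old_to_new coords.tsv dbsnp151.vcf > old_to_new.tsv`. For a bgzipped VCF, decompressing on the fly with `zcat dbsnp147.vcf.gz | rsid_to_coords pharmgkb.tsv -` should work, since awk reads standard input for `-`.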

Once you have a parsable list from step 3, you'll have to awk around a bit to replace the old IDs in your file with the ones retrieved from the 151 VCF. Hope this helps.
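The replacement itself could look roughly like this. It is only a sketch: it assumes a two-column, tab-separated map of old and new rsIDs (built, for example, from the tabix output), and that the rsID sits in column 2 of the tab-separated PharmGKB file. `update_rsids` and the file names are made up for illustration.

```shell
# Hypothetical helper: replace the rsID in column 2 of the annotation file
# with its new ID wherever the map has an entry; other lines pass through.
# $1 = map file (oldID<TAB>newID), $2 = annotation TSV
update_rsids() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { new[$1] = $2; next }   # load the old -> new map
        $2 in new           { $2 = new[$2] }         # swap in the new rsID
        { print }
    ' "$1" "$2"
}
```

Usage would be something like `update_rsids old_to_new.tsv pharmgkb.tsv > pharmgkb.dbsnp151.tsv`. If an old ID appears more than once in the map, the last entry wins, so it is worth checking the map for duplicates first.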

