Updating rsids from one version of dbSNP to another
4.4 years ago
BAGeno ▴ 190

Hi, I am working with PharmGKB data, which is based on dbSNP version 147, while my VCFs are annotated with dbSNP version 150.

I am searching for the PharmGKB rsIDs in my files, but I think some rsIDs are not found because of the different dbSNP versions.

I want to convert the PharmGKB rsIDs from version 147 to 150, but PharmGKB does not provide genomic coordinates. Looking up the coordinates first and then using them to update the rsIDs would be a long process. I searched and found the dbSNP Batch Query service, but its site announces that it will be retired in June 2018.

Can anyone please tell me how I can update the rsIDs without looking up their coordinates?

dbsnp rsids • 2.2k views

Please give an example of the data you have. If you have a VCF file, you could simply download the latest VCF from dbSNP and use Tabix or the Ensembl Variant Effect Predictor to annotate each variant directly with the current dbSNP release (151). I would strongly recommend using dbSNP 151 instead of 150 because there was a major update recently, including lots of variants from GRCh37. In my recent case, 151 flagged as common (based on 1000 Genomes and TOPMed) a lot of variants that I had previously considered rare, simply because dbSNP 150 did not include any allele frequency (AF) for them from 1KG or TOPMed.


I do not have a VCF. My data is in this format:

| Annotation ID | Variant | Gene | Chemical | Literature Id | Phenotype Category | Significance | Notes | Sentence | StudyParameters | Alleles | Chromosome |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 608431768 | rs1131873 | EPHX1 (PA27829) | warfarin (PA451906) | 19794411 | dosage | yes | | Allele A is associated with decreased dose of warfarin. | 608431770 | A | chr1 |
| 827690618 | rs1131873 | EPHX1 (PA27829) | warfarin (PA451906) | 21593757 | dosage | no | in italian patients. | Allele A is not associated with dose of warfarin. | 827690625 | A | chr1 |
| 608431789 | rs28371685 | CYP2C9 (PA126) | warfarin (PA451906) | 20072124 | dosage | yes | | Allele T is associated with decreased dose of warfarin as compared to allele C. | 608431791 | T | chr10 |

4.4 years ago
ATpoint 66k

What a terrible file format^^ (no offense). It does not even contain the genomic coordinate (or is the coordinate in $7?). Anyway, this is what I would do, given that you only have the old rs-ID:

1. Use the rs-IDs in your file to query the dbSNP VCF version 147 for the genomic coordinates. Here is a post on how to possibly do this.
2. Get the latest dbSNP VCF and its tabix index (tbi). Also get Tabix if you do not already have it.
3. Use tabix to retrieve the current ID from the dbSNP151 VCF. Tabix works like this, e.g. for your first variant assuming the coordinate would be 445426:

tabix dbSNP151.vcf.gz chr1:445426-445426

Once you have a parsable list from step 3, you will have to awk around a bit to replace the old IDs in your file with the ones retrieved from the 151 VCF. Hope this helps.
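For illustration, the three steps can be sketched in Python. This is only a rough sketch under several assumptions: both dbSNP VCFs are on the same genome assembly and use the same chromosome naming, records are matched on chromosome, position and REF allele only, and the function names (`parse_vcf`, `build_id_map`, `update_table`) are made up for this example. The toy records are invented: 445426 is the placeholder coordinate from above, not the real position, and `rs0000001`/`rs0000002` are fake IDs.

```python
def parse_vcf(lines):
    """Yield (chrom, pos, rsid, ref) from VCF text lines, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, rsid, ref = line.rstrip("\n").split("\t")[:4]
        yield chrom, int(pos), rsid, ref


def build_id_map(old_vcf_lines, new_vcf_lines, wanted_ids):
    """Return {old rsID: new rsID} by joining two VCFs on (chrom, pos, ref)."""
    wanted = set(wanted_ids)
    # Step 1: look up the coordinate of each old rsID in the old dbSNP VCF.
    old_keys = {
        rsid: (chrom, pos, ref)
        for chrom, pos, rsid, ref in parse_vcf(old_vcf_lines)
        if rsid in wanted
    }
    # Step 3: find the rsID that the new dbSNP VCF reports at each coordinate.
    # (Assumption: the first record seen at a coordinate is the one to keep.)
    targets = set(old_keys.values())
    new_ids = {}
    for chrom, pos, rsid, ref in parse_vcf(new_vcf_lines):
        key = (chrom, pos, ref)
        if key in targets:
            new_ids.setdefault(key, rsid)
    return {old: new_ids[key] for old, key in old_keys.items() if key in new_ids}


def update_table(rows, id_map, column=1):
    """Replace the rsID in `column` of each row; unmapped IDs stay unchanged."""
    return [
        row[:column] + [id_map.get(row[column], row[column])] + row[column + 1:]
        for row in rows
    ]


# Toy data, invented for this sketch (not real dbSNP content).
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
OLD_VCF = [
    HEADER,
    "chr1\t445426\trs1131873\tG\tA\t.\t.\t.",
    "chr10\t500\trs0000001\tC\tT\t.\t.\t.",
]
NEW_VCF = [
    HEADER,
    "chr1\t445426\trs1131873\tG\tA\t.\t.\t.",
    "chr10\t500\trs0000002\tC\tT\t.\t.\t.",
]
ROWS = [
    ["608431768", "rs1131873", "EPHX1 (PA27829)"],
    ["000000000", "rs0000001", "DEMO"],
]

ID_MAP = build_id_map(OLD_VCF, NEW_VCF, [row[1] for row in ROWS])
UPDATED = update_table(ROWS, ID_MAP)
for row in UPDATED:
    print("\t".join(row))
```

With real files, the line lists could be replaced by file handles, e.g. `gzip.open(path, "rt")` for a bgzipped VCF. Scanning the whole dbSNP VCF this way is slow, which is why tabix is the better choice for step 3 in practice.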

5 months ago
Sophia • 0

The "rsnps" package (an R programming language package) is very helpful for conversion (and gives other details on SNPs in addition to their most up-to-date rsID): https://cran.r-project.org/web/packages/rsnps/rsnps.pdf

The function ncbi_snp_query() would likely be the one you want to use for this action. Just read in your table to R and execute this function (see linked documentation for package installation instructions and more details on this function).