Find all rsid synonyms for a list of rsids
3.4 years ago
Jab • 0

Hello,

I have two lists of several thousand rsids. I'd like to compare the two lists to see if there are any common SNPs between them. I am aware that many SNPs have multiple rsids, and this could result in missing matches if I don't have the correct rsid synonym in each list. Is there a way that I can simultaneously download all rsid synonyms for several thousand SNPs using the most recent version of Ensembl without typing each SNP into the search bar individually? I know that similar questions have been asked in the past, but all answers are outdated or the links provided no longer work.

Thank you.

Ensembl SNPs rsids • 1.8k views
3.4 years ago
Mike Smith ★ 1.7k

You can do this using R and biomaRt. Here's an example.

First, let's create two example sets of SNPs. Between them, the first pair are identical, the second pair are synonyms of each other, and the third pair are distinct.

snps1 <- c('rs4844600', 'rs4266886', 'rs6656401')
snps2 <- c('rs4844600', 'rs61737012', 'rs386638846')


Next we load the biomaRt package, and query the variation mart to return all the synonyms and their sources for our first set of rsIDs.

library(biomaRt)
## use the Ensembl variation mart
snp_mart <- useMart(biomart = "ENSEMBL_MART_SNP",
                    dataset = "hsapiens_snp")

## get the synonyms and their source for our SNPs
results <- getBM(filters = c('snp_filter'),
                 attributes = c('refsnp_id', 'synonym_name', 'synonym_source'),
                 values = snps1,
                 mart = snp_mart)
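
Since the question mentions several thousand rsIDs, it may help to send them in batches. The following is only a sketch: `query_in_batches`, its `query_fun` argument and the batch size of 500 are hypothetical choices for illustration, not part of biomaRt, and recent biomaRt versions may already split long value vectors internally. Explicit batching mainly gives you control over retrying or caching each chunk.

```r
## Hypothetical helper: split a long ID vector into batches, run `query_fun`
## on each batch, and row-bind the resulting data frames.
## `query_fun` is any function that takes a character vector of IDs and
## returns a data.frame; the default batch size of 500 is an assumption.
query_in_batches <- function(ids, query_fun, batch_size = 500) {
  ids <- unique(ids)
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  do.call(rbind, lapply(unname(batches), query_fun))
}

## Intended use with the mart defined above (needs network access, not run here):
## results <- query_in_batches(snps1, function(ids) {
##   getBM(filters = "snp_filter",
##         attributes = c("refsnp_id", "synonym_name", "synonym_source"),
##         values = ids,
##         mart = snp_mart)
## })
```

Passing the query as a function keeps the batching logic independent of the mart, so it can be tried offline with a stand-in function first.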


For reference, the first few rows of results look like this. You can filter at this stage if you know you only have synonyms from a certain source.

> head(results)
  refsnp_id             synonym_name synonym_source
1 rs4266886               rs61198255  Archive dbSNP
2 rs4266886 NM_000651.4:c.487+787T>C     dbSNP HGVS
3 rs4266886 NM_000573.3:c.487+787T>C     dbSNP HGVS
4 rs4844600               rs58362463  Archive dbSNP
5 rs4844600               rs61737012  Archive dbSNP
6 rs4844600     NP_000564.2:p.Glu60=     dbSNP HGVS
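
The filtering step mentioned above could look roughly like this. To keep the sketch runnable offline, the six rows shown in the output are copied into a data frame called `results_head` (a name introduced here for illustration); with a live query you would filter `results` directly.

```r
## Rows copied from the head(results) output above, so this runs offline
results_head <- data.frame(
  refsnp_id = c("rs4266886", "rs4266886", "rs4266886",
                "rs4844600", "rs4844600", "rs4844600"),
  synonym_name = c("rs61198255", "NM_000651.4:c.487+787T>C",
                   "NM_000573.3:c.487+787T>C", "rs58362463",
                   "rs61737012", "NP_000564.2:p.Glu60="),
  synonym_source = c("Archive dbSNP", "dbSNP HGVS", "dbSNP HGVS",
                     "Archive dbSNP", "Archive dbSNP", "dbSNP HGVS"),
  stringsAsFactors = FALSE
)

## Keep only synonyms that come from the archived dbSNP source
rs_only <- results_head[results_head$synonym_source == "Archive dbSNP", ]

## Alternative that does not rely on the source label:
## rs_only <- results_head[grepl("^rs[0-9]+$", results_head$synonym_name), ]

print(rs_only$synonym_name)
```

This drops the HGVS names, which can never match an rsID in the second list anyway.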


We can now combine our original set of rsIDs with their synonyms.

snps1_complete <- c(snps1, unique(results$synonym_name))


We then ask which of our second list of IDs are in this expanded list. It finds two entries, as expected.

> snps2[snps2 %in% snps1_complete]
[1] "rs4844600"  "rs61737012"
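
A possible variation, sketched under assumptions: instead of expanding one list, map every ID in both lists to its `refsnp_id` and intersect the results. This counts each variant once even when two different IDs in a list refer to it. In the rows shown earlier, rs61737012 is listed as a synonym of rs4844600, so the two matches above would collapse to a single variant. The helper `to_current` and the small `syn` table are hypothetical, and the approach assumes the synonym table covers IDs from both lists.

```r
snps1 <- c("rs4844600", "rs4266886", "rs6656401")
snps2 <- c("rs4844600", "rs61737012", "rs386638846")

## rsID synonyms copied from the head(results) rows shown earlier
syn <- data.frame(
  refsnp_id    = c("rs4266886", "rs4844600", "rs4844600"),
  synonym_name = c("rs61198255", "rs58362463", "rs61737012"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: replace any known synonym with its refsnp_id,
## leaving IDs that are not in the synonym table untouched
to_current <- function(ids, syn) {
  hit <- match(ids, syn$synonym_name)
  ifelse(is.na(hit), ids, syn$refsnp_id[hit])
}

## Variants present in both lists, reported by their refsnp_id
shared <- intersect(to_current(snps1, syn), to_current(snps2, syn))
print(shared)
```

With real data, `syn` would be built from the rsID rows of a synonym query covering both lists.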


Worked perfectly. Thanks!

3.4 years ago
GenoMax 110k


Thank you. However, the file seems to just be symbols. Is there a special program I need to read and use the file?