Question: biomaRt search for a list of values [dplyr + column(s)]
7 months ago by
deepue130 wrote:


I would like to query the biomaRt databases to retrieve Ensembl Gene IDs (ensembl_gene_stable_id) for a list of SNPs (snp_filter) taken from the user input testData$rsNum, in a tidyverse way.

testData <- readr::read_tsv("rs1467475747\t8\t148357
rs1378018226\t8\t148383
rs546813474\t8\t148402
rs1175049916\t8\t148522
rs1187272067\t8\t148523
rs1427441701\t8\t148553
rs201635470\t8\t148556
rs1483428031\t8\t148608
rs1251102826\t8\t148610", 
                     col_names = c("rsNum", "chrNum", "pos"), 
                     col_types = "cii")

I attempted to pass the filters as column names as below:

grch37.snp = useMart(biomart="ENSEMBL_MART_SNP", host="", dataset="hsapiens_snp")
testData %>% getBM(attributes=c("refsnp_id", "chr_name", "chrom_start", "chrom_end", 
                                "ensembl_gene_stable_id", "associated_gene"), 
                   filters=c("snp_filter", "chr_name", "start", "end"), 
                   values=list(rsNum, chrNum, pos, pos), 
                   mart=grch37.snp, uniqueRows=TRUE)

which resulted in the error:

Error in getBM(., attributes = c("refsnp_id", "chr_name", "chrom_start", : object 'rsNum' not found

Is there any error in this approach of querying the marts?

However, I have also found a workaround that achieves the purpose in another way (source):

getBM(attributes=c("refsnp_id", "chr_name", "chrom_start", "chrom_end", 
                                "ensembl_gene_stable_id", "associated_gene"), 
                   filters=c("snp_filter", "chr_name", "start", "end"), 
                   values=list(testData$rsNum, testData$chrNum, testData$pos, testData$pos), 
                   mart=grch37.snp, uniqueRows=TRUE)

Though the latter command gives the expected output, I am looking for a way to make the former approach work by passing only the column names (rsNum, chrNum, pos, pos). Are you aware of any possibilities?

Thanks for your interest to answer the question.

7 months ago by
Bloomington, IN
rpolicastro4.0k wrote:

Your workaround is actually the correct way to do this. When you read in the data, it is read as a tibble (a fancy data.frame). To extract the values in a column, you need to reference the column with either a dollar sign (testData$pos) or double square brackets (testData[["pos"]]). If you wanted to literally pass pos to the getBM function, you would first need to define that variable: pos <- testData$pos. However, you can take a little shortcut, since the values argument takes a list of vectors.


testData <- testData %>%
  rename("start" = pos) %>%
  mutate("end" = start) %>%
  as.list()

This gives you the list of values you need as input.

> testData
$rsNum
[1] "rs1467475747" "rs1378018226" "rs546813474"  "rs1175049916" "rs1187272067"
[6] "rs1427441701" "rs201635470"  "rs1483428031" "rs1251102826"

$chrNum
[1] 8 8 8 8 8 8 8 8 8

$start
[1] 148357 148383 148402 148522 148523 148553 148556 148608 148610

$end
[1] 148357 148383 148402 148522 148523 148553 148556 148608 148610
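
To see why this works, here is a minimal base R sketch on a shortened toy copy of the data (only two rows, typed in by hand for illustration). It assumes nothing beyond base R: a data frame is a list of columns, so as.list() simply returns one named vector per column, in column order.

```r
# Toy data frame with the same column names as in the question (two rows only).
testData <- data.frame(
  rsNum  = c("rs1467475747", "rs1378018226"),
  chrNum = c(8L, 8L),
  pos    = c(148357L, 148383L),
  stringsAsFactors = FALSE
)

# `$` and `[[` both return the column as a plain vector.
identical(testData$pos, testData[["pos"]])  # TRUE

# as.list() keeps the column names and their order, one vector per column.
valueList <- as.list(testData)
names(valueList)  # "rsNum" "chrNum" "pos"
```

Because getBM matches values to filters by position, the column order of the data frame has to line up with the order of the filters vector.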

You can now just pass testData to the values argument.

getBM(
  attributes=c("refsnp_id", "chr_name", "chrom_start", "chrom_end", 
                      "ensembl_gene_stable_id", "associated_gene"), 
  filters=c("snp_filter", "chr_name", "start", "end"), 
  values=testData,  mart=grch37.snp, uniqueRows=TRUE)
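
If you would still rather write bare column names as in your first attempt, one option that may be worth trying is base R's with(), which evaluates an expression with the columns of a data frame visible as variables. The sketch below is only an illustration: fake_query is a hypothetical stand-in for getBM so that the example runs without a network connection, and the toy data has two rows.

```r
# Toy data frame with the same column names as in the question (two rows only).
testData <- data.frame(
  rsNum  = c("rs1467475747", "rs1378018226"),
  chrNum = c(8L, 8L),
  pos    = c(148357L, 148383L),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for getBM(): it only pairs each filter name with the
# vector supplied for it, so we can check what would be sent to the mart.
fake_query <- function(filters, values) {
  stopifnot(length(filters) == length(values))
  setNames(values, filters)
}

# Inside with(), rsNum, chrNum and pos are looked up as columns of testData.
res <- with(
  testData,
  fake_query(
    filters = c("snp_filter", "chr_name", "start", "end"),
    values  = list(rsNum, chrNum, pos, pos)
  )
)

str(res)
```

Since with() takes the data as its first argument, the same call can also sit on the right-hand side of a pipe, for example testData %>% with(getBM(...)). I have not run that against the SNP mart, so treat it as a suggestion rather than a tested recipe.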