biomaRt human paralog timeout
5.7 years ago

I am using the GitHub version of biomaRt (2.37.5) and am able to successfully pull orthologs across a number of species. However, when I switch to pulling paralogs, I get a curl timeout error (see the code below). I have tried different hosts. I have even limited the attributes to just "ensembl_gene_id" and "hsapiens_paralog_ensembl_gene". I have also tried using the "values" parameter and limiting the number of genes to fewer than 100. It does not appear to be an actual latency or data-volume issue.

Thanks for any pointers!

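library(biomaRt)
library(data.table)  # needed for as.data.table() and the [grepl(...), name] syntax

human <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                 dataset = "hsapiens_gene_ensembl",
                 host    = "useast.ensembl.org")

# All available attributes, as a data.table so they can be filtered by name
attr <- as.data.table(listAttributes(human))

# Gene ID plus the first paralog-related attribute
attributes <- c("ensembl_gene_id", attr[grepl("paralog", name), name][1])

# Boolean filter: keep only genes that have at least one human paralog
para <- getBM(attributes = attributes,
              filters    = "with_hsapiens_paralog",
              values     = TRUE,
              mart       = human)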
5.7 years ago

I suspect that the original request was formatted in such a way that the query blows up (a Cartesian product).
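To illustrate what a Cartesian product means in this context, here is a generic base-R sketch with made-up IDs. It is not a reconstruction of what the BioMart server actually does internally; it only shows how joining two one-to-many tables on the same gene key multiplies the number of rows.

```r
# Toy data with hypothetical IDs: one gene with 3 paralogs and 4 GO terms
paralogs <- data.frame(
  ensembl_gene_id = "GENE1",
  paralog_id      = c("PARA1", "PARA2", "PARA3"),
  stringsAsFactors = FALSE
)
go_terms <- data.frame(
  ensembl_gene_id = "GENE1",
  go_id           = c("GO:A", "GO:B", "GO:C", "GO:D"),
  stringsAsFactors = FALSE
)

# Joining two one-to-many tables on the same key returns every
# paralog/GO combination: 3 x 4 = 12 rows for a single gene
joined <- merge(paralogs, go_terms, by = "ensembl_gene_id")
nrow(joined)
```

Across thousands of genes, this kind of row multiplication can make a single query far larger than the inputs suggest.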

I therefore first identified the genes that have paralogs and then requested the paralog information only for those genes (essentially two steps instead of one).

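# `human` and `attr` are created as in the question above
para.attr <- c("ensembl_gene_id", attr[grepl("paralog", name), name])

# Step 1: IDs of all human genes that have at least one paralog
hgid <- getBM(attributes = "ensembl_gene_id",
              filters    = "with_hsapiens_paralog",
              values     = TRUE,
              mart       = human)$ensembl_gene_id

# Step 2: paralog attributes, restricted to the genes found in step 1
para <- getBM(attributes = para.attr,
              filters    = "ensembl_gene_id",
              values     = hgid,
              mart       = human)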
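If the second step still times out, one option is to submit `hgid` in smaller batches and combine the results. The helper below is a minimal sketch under that assumption: `query_in_chunks()` and `fake_query()` are hypothetical names, the chunk size of 100 is arbitrary, and the batching has not been tested against the Ensembl server. The query function is passed in, so the batching logic can be checked offline with a mock.

```r
# Hypothetical helper: run `query_fun` over `ids` in batches of `chunk_size`
# and row-bind the results. `query_fun` takes a vector of IDs and returns
# a data.frame (for example, a wrapper around getBM()).
query_in_chunks <- function(ids, query_fun, chunk_size = 100) {
  if (length(ids) == 0) return(data.frame())
  # Batch index per ID: first chunk_size IDs -> 1, next chunk_size -> 2, ...
  batches <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- lapply(batches, query_fun)
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Possible use with the objects above (requires a live connection):
# para <- query_in_chunks(hgid, function(ids) {
#   getBM(attributes = para.attr, filters = "ensembl_gene_id",
#         values = ids, mart = human)
# })

# Offline check with a mock query function (hypothetical IDs)
fake_query <- function(ids) {
  data.frame(ensembl_gene_id = ids, n_chars = nchar(ids),
             stringsAsFactors = FALSE)
}
res <- query_in_chunks(sprintf("G%03d", 1:250), fake_query, chunk_size = 100)
nrow(res)
```

With 250 IDs and `chunk_size = 100`, the mock is called three times (100, 100 and 50 IDs) and the combined result keeps the original ID order.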

Hope this is helpful for someone in the future.
