biomaRt crashes RStudio
14 months ago
gernophil ▴ 80

Hey everyone,

I just wanted to run a script that worked before. However, every time I try to run it now, RStudio becomes unresponsive. I didn't change anything. Is anyone else experiencing this?

This is an extract from my script:

library(biomaRt)
...
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
data_table[, symbol := getBM(mart = mart,
                             attributes = "hgnc_symbol",
                             filters = "ensembl_gene_id",
                             values = `Ensembl Gene ID`)]

Best

biomaRt • 699 views

I haven't worked much with data.table, but it looks like your call might send a separate HTTP request for each gene ID? If that is truly the case, I think you might be running into some sort of denial-of-service protection or rate limit on the API, which you would be flooding with a few thousand requests.


Are you sure it sends a request for every symbol individually? Shouldn't data.table only do this if you define a function(x) when assigning a new column? I'll try it with the list of symbols and check whether it makes a difference.

The problem does not seem to be the getBM() function, but the assignment of the new column to the data.table:

mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
symbol <- getBM(mart = mart,
                attributes = "hgnc_symbol",
                filters = "ensembl_gene_id",
                values = sgrna_table$`Ensembl Gene ID`)

sgrna_table[, symbol := ..symbol]

also crashes my RStudio. That's weird. I never had that before.
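
One thing that may be worth checking here: getBM() returns a data.frame rather than a plain vector, and as far as I know its rows are not guaranteed to match the input values in order or in number (duplicated and unmatched IDs do not get a row of their own). Assigning that result by position can therefore misalign or fail. The sketch below joins by ID instead. It is only an illustration: the IDs and symbols are made up, and `lookup` is a hypothetical stand-in for the getBM() result so that the example runs without a network connection.

```r
library(data.table)

# Toy stand-in for sgrna_table (IDs are made up for illustration).
sgrna_table <- data.table(
  `Ensembl Gene ID` = c("ENSG_A", "ENSG_B", "ENSG_A", "ENSG_C")
)

# Hypothetical stand-in for the getBM() result. With a live connection it
# would come from a call along these lines, requesting the ID as an
# attribute as well so there is a key to join on:
#   lookup <- getBM(mart = mart,
#                   attributes = c("ensembl_gene_id", "hgnc_symbol"),
#                   filters = "ensembl_gene_id",
#                   values = unique(sgrna_table$`Ensembl Gene ID`))
lookup <- data.frame(
  ensembl_gene_id = c("ENSG_B", "ENSG_A"),
  hgnc_symbol = c("SYM_B", "SYM_A"),
  stringsAsFactors = FALSE
)

# Look each ID up by value, not by position; IDs without a match become NA.
sgrna_table[, symbol := lookup$hgnc_symbol[
  match(`Ensembl Gene ID`, lookup$ensembl_gene_id)
]]
```

If one Ensembl ID maps to several symbols, match() keeps only the first hit, so a merge() on the ID column may be the better choice in that case.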


No, I am not sure, since I have no knowledge regarding the internal workings of data.table.

But this quick (and probably not authoritative) test indicates it might be the case:

library(data.table)

mydata <- as.data.table(mtcars)

mydata[, len := length('mpg')]

#testfunction that errors with length(x) > 1
testfunction <- function(x){if(x > 1){FALSE} else {NA}}

mydata[, vectorized := testfunction('mpg')]

The function length() always returns 1, which suggests that the column is broken up into separate invocations. To be sure, I also wrote a custom testfunction() which on purpose does not accept a vectorized parameter.

If you run

testfunction(c(1:10))
Error in if (x > 1) { : the condition has length > 1

you get an error. Since the same function runs like a charm in the data.table, I don't think that more than one value at a time is passed to the function. It might be that data.table employs some clever logic to distinguish between functions that can operate on vectors and those that can't, but I'd be surprised if that was the case.
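
A caveat on the test above: it passes the quoted string 'mpg', which is a length-1 character vector, rather than the column mpg itself, so it cannot show how data.table hands columns to a function. A minimal sketch of the same check with the unquoted column name (the column names `len_quoted` and `len_column` are just illustrative):

```r
library(data.table)

mydata <- as.data.table(mtcars)

# Quoted: 'mpg' is a single string, so length() is 1 regardless of the table.
mydata[, len_quoted := length('mpg')]

# Unquoted: mpg refers to the column, and j is evaluated once on the
# whole vector, so length() equals the number of rows.
mydata[, len_column := length(mpg)]

unique(mydata$len_quoted)  # 1
unique(mydata$len_column)  # 32, i.e. nrow(mtcars)
```

Without a `by` grouping, the expression is evaluated once on the full column, which suggests the original getBM() call was sent as a single request with all IDs rather than one request per row.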
