getBM results are not in the same order as the input in the "values" argument
4 months ago
petrandent ▴ 20

Hello, I have noticed that the rows returned by the getBM function are not in the same order as the vector passed to the "values" argument. How can I put the final results in the same order as that vector?

Reproducible code follows:

library(GEOquery)
library(biomaRt)

# download the series and extract the expression matrix
df <- getGEO("GSE138206")
df1 <- df[[1]]
df2 <- exprs(df1)
df3 <- df2[, -c(4, 10, 16)]
df4 <- df3[, 1:10] # first 5 cancer, last 5 contra

# biomart: a single useMart() call that selects the dataset is enough
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
affyids <- rownames(df4)
gene_names <- getBM(
attributes = c('affy_hg_u133_plus_2', 'entrezgene_id'),
filters = 'affy_hg_u133_plus_2',
values = affyids,
mart = ensembl)

#probe to gene
result1
result2  #OMG what the h... is that??!??! they are not the same!!!

4 months ago

This is expected behaviour for biomaRt. The package is at the mercy of Ensembl's internal servers and databases, and of how they receive requests and return data, so the order of the results does not have to match the order of the input. I guess that we are dealing with differences of milliseconds here, and that things such as Internet traffic play a role. biomaRt also does not return NA for values that do not match; non-matches are simply left out of the result.

It may help to retrieve the entire table (can ironically be quicker) and do filtering and matching locally:

getBM(
attributes = c('affy_hg_u133_plus_2', 'entrezgene_id'),
mart = ensembl)
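
Once the table is available locally, putting it back into the input order can be sketched with base R's match(). This is only a minimal sketch under stated assumptions: `bm` is a small mock data frame standing in for a real getBM() result, and the probe IDs and Entrez IDs in it are illustrative values, not output from Ensembl. Non-matching IDs come back as NA, and when a probe maps to several genes only its first hit is kept.

```r
# Mock stand-ins (illustrative values only) for the objects in this thread.
affyids <- c("1007_s_at", "1053_at", "117_at", "121_at", "not_a_probe")

# Pretend this came back from getBM(): rows shuffled, one probe mapped twice,
# and the unmatched ID missing entirely.
bm <- data.frame(
  affy_hg_u133_plus_2 = c("121_at", "1007_s_at", "117_at", "1007_s_at", "1053_at"),
  entrezgene_id       = c(7849L, 780L, 3310L, 100616237L, 5982L),
  stringsAsFactors    = FALSE
)

# For each input ID, match() gives the row of its first hit in bm (NA if none).
idx <- match(affyids, bm$affy_hg_u133_plus_2)

# One row per input ID, in the input order; unmatched IDs get NA.
ordered <- data.frame(
  affy_hg_u133_plus_2 = affyids,
  entrezgene_id       = bm$entrezgene_id[idx],
  stringsAsFactors    = FALSE
)
print(ordered)
```

If every mapping of a multi-mapped probe has to be kept, merge() with all.x = TRUE followed by re-sorting on the input order is an alternative, at the cost of more than one row per probe.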


Note that the other Bioconductor annotation packages return their results in the same order as the input keys. For the U133 Plus 2.0, you'd need:

require(hgu133plus2.db)
# probes is a character vector of probe IDs, e.g. the rownames of the expression matrix
select(
hgu133plus2.db,
keys = probes,
columns = c('PROBEID', 'SYMBOL', 'GENENAME', 'ENSEMBL'),
keytype = 'PROBEID')


A few more examples at the bottom, here: https://support.bioconductor.org/p/130727/#130733


#probe to gene

require(hgu133plus2.db)
gene_names2=select(
hgu133plus2.db,
keys = affyids,
columns = c('PROBEID', 'SYMBOL'),
keytype = 'PROBEID')
results=gene_names2[!duplicated(gene_names2$PROBEID),]
real_results
biomart_results #OMG what is this??
result3 #YES finally, this is an accepted outcome


Thank you for the answer, I will try that. I cannot understand the point of using biomaRt if it does not work as expected.


Not working to => your <= expectation