getBM results are not in the same order as the input in the "values" argument
4 months ago
petrandent ▴ 20

Hello, I have noticed that when I use the getBM function, the rows of its result are not in the same order as the vector passed to the "values" argument. How can I get the final results in the same order as the "values" vector?

Reproducible code follows:

library(GEOquery)
library(biomaRt)

# Download GSE138206 and extract the expression matrix
df  <- getGEO("GSE138206")
df1 <- df[[1]]
df2 <- exprs(df1)
df3 <- df2[, -c(4, 10, 16)]   # drop three samples
df4 <- df3[, 1:10]            # first 5 cancer, last 5 contra

# biomart: map Affymetrix HG-U133 Plus 2.0 probe IDs to Entrez gene IDs
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
affyids <- rownames(df4)
gene_names <- getBM(attributes = c('affy_hg_u133_plus_2', 'entrezgene_id'),
                    filters    = 'affy_hg_u133_plus_2',
                    values     = affyids,
                    mart       = ensembl)

# probe to gene: compare the first input IDs with the first rows returned
result1 <- head(affyids)
result2 <- head(gene_names)
result1
result2  # OMG, what is that?! They are not in the same order!
4 months ago

This is expected behaviour for biomaRt. The package is at the 'behest' of Ensembl's servers and databases: the order of the returned rows depends on how they receive and process the request, not on the order of the `values` vector. I guess we are dealing with differences of milliseconds here, and things such as Internet traffic may play a role. Also, biomaRt will not return NA for an ID that doesn't match; it simply returns no row for it. Conversely, a probe that maps to more than one Entrez gene ID gets one row per mapping, so the result can have more rows than the input.
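A quick way to see both effects on your own result is to compare it against the input IDs. The sketch below uses a small toy data frame standing in for a getBM() result; the probe IDs and gene IDs are made up for illustration, only the column names mirror the attributes used above.

```r
# Toy stand-in for a getBM() result (hypothetical IDs, not real annotation):
# rows come back in server order, "probe_C" has no match, "probe_B" has two.
input <- c("probe_A", "probe_B", "probe_C", "probe_D")
bm <- data.frame(
  affy_hg_u133_plus_2 = c("probe_D", "probe_B", "probe_A", "probe_B"),
  entrezgene_id       = c(400L, 201L, 100L, 202L)
)

# Input IDs with no row at all in the result (no NA row is returned for them)
missing_ids <- setdiff(input, bm$affy_hg_u133_plus_2)

# Input IDs that appear on more than one row (one-to-many mappings)
multi_ids <- unique(bm$affy_hg_u133_plus_2[duplicated(bm$affy_hg_u133_plus_2)])

missing_ids
multi_ids
```

Here `missing_ids` is `"probe_C"` and `multi_ids` is `"probe_B"`, which is why `head()` of the input and `head()` of the result cannot be expected to line up.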

It may help to retrieve the entire table (which, ironically, can be quicker) and do the filtering and matching locally:

# No 'filters'/'values': download the whole probe-to-Entrez table once
all_probes <- getBM(
  attributes = c('affy_hg_u133_plus_2', 'entrezgene_id'),
  mart = ensembl)
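The local matching step can be done with base R's `match()`, which returns, for each input ID, the row of its first hit in the table (or NA when there is none). A minimal sketch, assuming a toy table in place of the real download; the IDs and gene IDs are hypothetical.

```r
# Toy stand-in for the full getBM() table (hypothetical IDs and gene IDs)
full_table <- data.frame(
  affy_hg_u133_plus_2 = c("probe_D", "probe_B", "probe_A", "probe_B", "probe_X"),
  entrezgene_id       = c(400L, 201L, 100L, 202L, 999L)
)
affyids <- c("probe_A", "probe_B", "probe_C", "probe_D")

# Position of the FIRST matching row for each input ID (NA if none),
# so indexing by it gives exactly one value per input ID, in input order
idx <- match(affyids, full_table$affy_hg_u133_plus_2)
ordered <- data.frame(
  affy_hg_u133_plus_2 = affyids,
  entrezgene_id       = full_table$entrezgene_id[idx]
)
ordered
```

`ordered$entrezgene_id` comes out as 100, 201, NA, 400: unmatched probes get an explicit NA, and for a probe with several mappings only the first one in the table is kept. If you need all mappings, collapse them per probe before matching.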

#####

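Note that the other Bioconductor annotation packages return results in the same order as the input keys (if a key maps to several values, select() adds extra rows for it and prints a message about a 1:many mapping). For the HG-U133 Plus 2.0 array, you'd need: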

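library(hgu133plus2.db)
probes <- affyids   # the probe IDs to annotate
# AnnotationDbi:: avoids a clash with dplyr::select if dplyr is loaded
AnnotationDbi::select(
  hgu133plus2.db,
  keys = probes,
  columns = c('PROBEID', 'SYMBOL', 'GENENAME', 'ENSEMBL'),
  keytype = 'PROBEID')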
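When a probe maps to several symbols, one option besides dropping duplicates is to collapse all its mappings into a single string, giving exactly one value per probe in the input order. The sketch below uses a toy data frame standing in for a select() result; the IDs and symbols are hypothetical, and the ";" separator is an arbitrary choice.

```r
# Toy stand-in for a select() result (hypothetical IDs and symbols):
# rows follow the key order, but "probe_B" maps to two symbols.
probes <- c("probe_A", "probe_B", "probe_C")
annot <- data.frame(
  PROBEID = c("probe_A", "probe_B", "probe_B", "probe_C"),
  SYMBOL  = c("GENE1",   "GENE2",   "GENE3",   NA)
)

# Group symbols by probe, then build one string per input probe, in input order
groups <- split(annot$SYMBOL, annot$PROBEID)
collapsed <- vapply(probes, function(p) {
  s <- groups[[p]]                      # NULL if the probe is absent entirely
  s <- unique(s[!is.na(s)])             # drop NA and repeated symbols
  if (length(s) == 0) NA_character_ else paste(s, collapse = ";")
}, character(1))
collapsed
```

The result is a named character vector (`probe_A = "GENE1"`, `probe_B = "GENE2;GENE3"`, `probe_C = NA`), so it can be attached directly to rows of the expression matrix that share those probe IDs.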

A few more examples at the bottom, here: https://support.bioconductor.org/p/130727/#130733

Your answer was correct. Here is the code I used to work around this biomaRt dysfunction, with your help:

#probe to gene
real_results <- head(affyids)
biomart_results <- head(gene_names)

library(hgu133plus2.db)
gene_names2 <- AnnotationDbi::select(
  hgu133plus2.db,
  keys = affyids,
  columns = c('PROBEID', 'SYMBOL'),
  keytype = 'PROBEID')
# keep only the first symbol for probes that map to several genes
results <- gene_names2[!duplicated(gene_names2$PROBEID), ]
result3 <- head(results)
real_results
biomart_results  # OMG what is this??
result3          # YES finally, this is an acceptable outcome
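Before binding `results` onto the expression matrix, it may be worth checking that it really has one row per probe in the same order as `affyids`, rather than relying on `head()`. A minimal sketch with a toy data frame in place of the select() output (hypothetical IDs and symbols):

```r
# Toy stand-in for the select() output (hypothetical IDs and symbols)
affyids <- c("probe_A", "probe_B", "probe_C")
gene_names2 <- data.frame(
  PROBEID = c("probe_A", "probe_B", "probe_B", "probe_C"),
  SYMBOL  = c("GENE1",   "GENE2",   "GENE3",   NA)
)

# Same deduplication as above: keep the first row seen for each probe
results <- gene_names2[!duplicated(gene_names2$PROBEID), ]

# Stop with an error if the rows do not line up with the input IDs
stopifnot(identical(results$PROBEID, affyids))
results$SYMBOL
```

With this toy input the check passes and `results$SYMBOL` is "GENE1", "GENE2", NA. If it ever fails on real data, reorder with `match()` instead of assuming the order.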
Thank you for the answer, I will try that. I don't see the point of using biomaRt if it doesn't work as expected.

Not working to => your <= expectation
