How to deal with `'select()' returned 1:many mapping between keys and columns` warning when retrieving Entrez IDs using Gene Symbols in R?
2.1 years ago

I want to replace the row names of my data frame (gene symbols) with identifiers of the form GENE_ID:GENE_VALUE, so that it can be used as the data matrix input for the netgsa R package (https://cran.r-project.org/web/packages/netgsa/vignettes/netgsa.html).

First, I retrieve the Entrez IDs:

library(org.Hs.eg.db)
library(AnnotationDbi)

# gene_value holds the Entrez IDs
gene_value <- as.data.frame(mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL"))

Output:

'select()' returned 1:many mapping between keys and columns

Then, I want to prepend the string ENTREZID: to each value in gene_value.

rownames(meth_df) <- paste0("ENTREZID:", gene_value)

Error:

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

Expected rownames output (example):

## [1] "ENTREZID:127550" "ENTREZID:53947"  "ENTREZID:65985"  "ENTREZID:51166" 
## [5] "ENTREZID:15"     "ENTREZID:60496"

Example data:

> dput(meth_df[1:5,1:5])
structure(list(`TCGA-2K-A9WE-01A` = c(0.611033076810465, 0.786837244239289, 
0.531054614303851, 0.711916183761331, 0.758443223998425), `TCGA-2Z-A9J1-01A` = c(0.468013052647261, 
0.386177267500376, 0.508623627469028, 0.403601275088479, 0.754642399207848
), `TCGA-2Z-A9J2-01A` = c(0.593559707995411, 0.54983504208745, 
0.535207192925841, 0.613971903755576, 0.717278085189431), `TCGA-2Z-A9J3-01A` = c(0.638211007873003, 
0.319561448644096, 0.526699541432941, 0.450002172806716, 0.736440001203422
), `TCGA-2Z-A9J5-01A` = c(0.603998109440889, 0.638039512259872, 
0.584328151056768, 0.594021097192165, 0.818583455926719)), row.names = c("A1BG", 
"A1CF", "A2BP1", "A2LD1", "A2M"), class = "data.frame")
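
Regarding the `invalid 'row.names' length` error above, one likely cause (assuming gene_value is the one-column data frame created with as.data.frame()) is that paste0() treats a data frame as a list of columns and returns a single string. The sketch below uses a toy named vector in place of the mapIds() result; the three IDs are the ones that appear later in this thread, and the NA entries are assumed for illustration.

```r
# Toy stand-in for the named character vector that mapIds() returns
gene_vec <- c(A1BG = "1", A1CF = "29974", A2BP1 = NA, A2LD1 = NA, A2M = "2")

# Wrapping the vector in a data frame, as in the question
gene_df <- as.data.frame(gene_vec)

# A data frame is a list of columns, so paste0() collapses the single
# column into one string instead of producing one string per gene
bad <- paste0("ENTREZID:", gene_df)
length(bad)

# Pasting the plain vector gives one string per gene; NA values turn into
# the literal "ENTREZID:NA", so unmapped genes should be dropped first
good <- paste0("ENTREZID:", gene_vec)
length(good)
```

Assigning `bad` as row names fails because its length does not match the number of rows, whereas `good` has one entry per gene.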
mapids R netgsa entrez • 9.9k views

I get no error with your code and your example. What does packageVersion("AnnotationDbi") return?


The package version for AnnotationDbi is "1.58.0"


You changed the post, and now you have no error. Because some genes will have NA as their ENTREZID, you will need to drop those genes from both gene_value and meth_df.


It still gives the same error. meth_df has no missing genes in its index (row names).


You changed the error you had for mapIds; now you have gene_value, which is a vector containing the ENTREZID for each of your input genes. What I would do is merge the two, remove genes with no ENTREZID, and change the row names as expected:

# Merge on row names (by=0), then drop genes whose Entrez ID is NA
meth_df <- na.omit(merge(meth_df, data.frame(gene_value), by=0))
rownames(meth_df) <- paste0("ENTREZID:", meth_df$gene_value)
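
To make the merge approach concrete, here is a minimal sketch on toy data. It assumes gene_value is a plain named vector (not a data frame), so that the merged column is actually called gene_value; the sample columns s1 and s2 are made up, and the IDs are the ones shown later in this thread.

```r
# Toy methylation-like table with gene symbols as row names
meth_df <- data.frame(s1 = c(0.61, 0.79, 0.53, 0.71, 0.76),
                      s2 = c(0.47, 0.39, 0.51, 0.40, 0.75),
                      row.names = c("A1BG", "A1CF", "A2BP1", "A2LD1", "A2M"))

# Named vector shaped like the result of mapIds(); NA marks unmapped symbols
gene_value <- c(A1BG = "1", A1CF = "29974", A2BP1 = NA, A2LD1 = NA, A2M = "2")

# by = 0 merges on row names and adds a "Row.names" column to the result
merged <- merge(meth_df, data.frame(gene_value), by = 0)

# Drop genes without an Entrez ID, then build the new row names
merged <- na.omit(merged)
rownames(merged) <- paste0("ENTREZID:", merged$gene_value)

# Remove the helper columns so only the sample columns remain
merged <- merged[, setdiff(names(merged), c("Row.names", "gene_value")), drop = FALSE]
```

Note that the row-name assignment will also fail if two symbols map to the same Entrez ID, because data frame row names must be unique.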

Computing gene_value produces the message 'select()' returned 1:many mapping between keys and columns, so I need to resolve that first before I can follow your recommendation.

gene_value = mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL") # gene_value is the Entrez ID

'select()' returned 1:many mapping between keys and columns

meth_df=na.omit(merge(meth_df,data.frame(gene_value),by=0))
rownames(meth_df) <- paste0("ENTREZID:", meth_df$gene_value)

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length


I don't understand. You first said that you had this error: "Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds must have at least one key to match against."

After you modified your post, it worked and produced the "'select()' returned 1:many mapping between keys and columns" message, which is possible to deal with.

Now the first error is back in your code, so why does it keep changing?
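
For reference, the 1:many message is informational: it reports that at least one symbol matched more than one Entrez ID. A minimal sketch of how mapIds() can be told what to do in that case through its multiVals argument, assuming AnnotationDbi and org.Hs.eg.db are installed; the symbols vector simply reuses the example row names from the question.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

symbols <- c("A1BG", "A1CF", "A2BP1", "A2LD1", "A2M")

# multiVals = "first" (the default) keeps one Entrez ID per symbol and
# returns a named character vector; unmapped symbols come back as NA
first_id <- mapIds(org.Hs.eg.db, keys = symbols, column = "ENTREZID",
                   keytype = "SYMBOL", multiVals = "first")

# multiVals = "list" keeps every match, which makes it possible to see
# which symbols are ambiguous
all_ids <- mapIds(org.Hs.eg.db, keys = symbols, column = "ENTREZID",
                  keytype = "SYMBOL", multiVals = "list")
ambiguous <- names(all_ids)[lengths(all_ids) > 1]
```

Whether "first" is acceptable depends on the analysis; inspecting `ambiguous` before committing to it is a cheap sanity check.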


Sorry for the confusion. I edited my comment above. The code and error are as follows:

gene_value = mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL") # gene_value is the Entrez ID

'select()' returned 1:many mapping between keys and columns

meth_df=na.omit(merge(meth_df,data.frame(gene_value),by=0))
rownames(meth_df) <- paste0("ENTREZID:", meth_df$gene_value)

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

2.1 years ago
LChart 4.3k

If I'm using gene symbols for primary analysis, I prefer to use HGNC-provided mappings:

http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt

The 'symbol' and 'entrez_id' columns are what you're looking for.


Would you mind providing the code? Thanks.

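# Read the HGNC table, keeping only the symbol and Entrez ID columns
mapping <- read.csv('hgnc_complete_set.txt', sep='\t')[,c('symbol', 'entrez_id')]
# Drop symbols with no Entrez ID so they are not turned into "ENTREZ:NA"
mapping <- mapping[!is.na(mapping$entrez_id),]
sym2ent <- sprintf('ENTREZ:%d', mapping$entrez_id)
names(sym2ent) <- mapping$symbol
# Rename rows that have a mapping; leave all other row names unchanged
rownames(thing) <- ifelse(rownames(thing) %in% names(sym2ent), sym2ent[rownames(thing)], rownames(thing))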

Thanks! However, the code only edits some of the row names. For example, "ENTREZ:1", "ENTREZ:29974" and "ENTREZ:2" are new row names produced by the code, but "A2BP1" and "A2LD1" are still the gene symbols from the original meth_df data frame.

Is it because some genes do not have entrez ID? If so, how should I deal with it?

rownames(meth_df) <- ifelse(rownames(meth_df) %in% names(sym2ent), sym2ent[rownames(meth_df)], rownames(meth_df))

rownames(meth_df)

[1] "ENTREZ:1"     "ENTREZ:29974" "A2BP1"        "A2LD1"
[5] "ENTREZ:2"


Is it because some genes do not have entrez ID?

Yes.

If so, how should I deal with it?

If you strictly require entrez IDs, then you have to subset to those genes:

meth_df_entrez <- meth_df[grepl('^ENTREZ', rownames(meth_df)),]

Otherwise you have to live with a combination of HGNC and Entrez.
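
A small sketch of the subsetting step on toy data, with two optional additions that are my own suggestions rather than part of the answer above: recording which symbols get dropped, and using drop = FALSE so the result stays a data frame even when there is only one sample column. The column s1 is made up.

```r
# Toy table whose row names mix converted and unconverted genes,
# mirroring the output shown above
meth_df <- data.frame(s1 = c(0.61, 0.79, 0.53, 0.71, 0.76),
                      row.names = c("ENTREZ:1", "ENTREZ:29974", "A2BP1",
                                    "A2LD1", "ENTREZ:2"))

has_entrez <- grepl("^ENTREZ", rownames(meth_df))

# Record which symbols are about to be dropped, e.g. to check them by hand
unmapped <- rownames(meth_df)[!has_entrez]

# drop = FALSE keeps a data frame even if only one column is present
meth_df_entrez <- meth_df[has_entrez, , drop = FALSE]
```

Keeping `unmapped` around makes it easy to review whether the dropped genes matter for the downstream analysis.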
