Question

How to deal with `'select()' returned 1:many mapping between keys and columns` warning when retrieving Entrez IDs using Gene Symbols in R?

20 months ago

melissachua90 ▴ 60

I want to replace the row names of my data frame (gene symbols) with identifiers of the form GENE_ID:GENE_VALUE, so that it can be used as the data matrix input for the netgsa R package (https://cran.r-project.org/web/packages/netgsa/vignettes/netgsa.html).

First, I retrieve the Entrez IDs:

library(org.Hs.eg.db)
library(AnnotationDbi)

# gene_value is the Entrez ID
gene_value <- as.data.frame(mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL"))

Traceback:

'select()' returned 1:many mapping between keys and columns

Then I want to prepend the string ENTREZID: to each value in gene_value.

rownames(meth_df) <- paste0("ENTREZID:", gene_value)

Traceback:

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
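
A likely cause of this particular error, given the code above: gene_value is a data frame (it came from as.data.frame()), and paste0() treats a data frame as a list of columns, so each column collapses into a single string. A minimal base-R sketch with a toy data frame (df and its values are made up for illustration):

```r
# Toy one-column data frame standing in for as.data.frame(mapIds(...))
df <- data.frame(id = c("1", "29974", "2"), stringsAsFactors = FALSE)

whole   <- paste0("ENTREZID:", df)     # one string per column, not per row
per_row <- paste0("ENTREZID:", df$id)  # one string per row

length(whole)    # 1
length(per_row)  # 3
```

A vector of length 1 cannot be assigned as row names of a data frame with several rows, which matches the "invalid 'row.names' length" message.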

Expected rownames output (example):

## [1] "ENTREZID:127550" "ENTREZID:53947"  "ENTREZID:65985"  "ENTREZID:51166" 
## [5] "ENTREZID:15"     "ENTREZID:60496"

Example data:

> dput(meth_df[1:5,1:5])
structure(list(`TCGA-2K-A9WE-01A` = c(0.611033076810465, 0.786837244239289, 
0.531054614303851, 0.711916183761331, 0.758443223998425), `TCGA-2Z-A9J1-01A` = c(0.468013052647261, 
0.386177267500376, 0.508623627469028, 0.403601275088479, 0.754642399207848
), `TCGA-2Z-A9J2-01A` = c(0.593559707995411, 0.54983504208745, 
0.535207192925841, 0.613971903755576, 0.717278085189431), `TCGA-2Z-A9J3-01A` = c(0.638211007873003, 
0.319561448644096, 0.526699541432941, 0.450002172806716, 0.736440001203422
), `TCGA-2Z-A9J5-01A` = c(0.603998109440889, 0.638039512259872, 
0.584328151056768, 0.594021097192165, 0.818583455926719)), row.names = c("A1BG", 
"A1CF", "A2BP1", "A2LD1", "A2M"), class = "data.frame")

mapids R netgsa entrez • 8.3k views

ADD COMMENT • link updated 20 months ago by LChart 3.9k • written 20 months ago by melissachua90 ▴ 60

I get no error with your code and your example data. What does package.version("AnnotationDbi") return?

ADD REPLY • link 20 months ago by Basti ★ 2.0k

The package version for AnnotationDbi is "1.58.0"

ADD REPLY • link 20 months ago by melissachua90 ▴ 60

You changed the post, so now you have no error. Because some genes will have NA as their ENTREZID, you will need to drop those genes from both gene_value and meth_df.

ADD REPLY • link 20 months ago by Basti ★ 2.0k

It still gives the same error. meth_df has no missing gene symbols in its row names.

ADD REPLY • link 20 months ago by melissachua90 ▴ 60

You changed the error you had with mapIds; now you have gene_value, which is a vector containing the ENTREZID for each of your input genes. What I would do is merge the two, remove genes with no ENTREZID, and change the row names as expected:

# by=0 joins on row names; na.omit() then drops genes without an ENTREZID
meth_df <- na.omit(merge(meth_df, data.frame(gene_value), by=0))
rownames(meth_df) <- paste0("ENTREZID:", meth_df$gene_value)
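
To make the merge step easier to check, here is a small base-R sketch of the same idea on toy data. The sample columns s1 and s2 and the NA pattern in gene_value are illustrative assumptions; gene_value is assumed to be a named character vector, as mapIds() returns by default, not a data frame.

```r
# Toy stand-ins (values are illustrative): 'meth_df' mimics the example data,
# 'gene_value' mimics a named character vector of Entrez IDs with NA for
# symbols that could not be mapped.
meth_df <- data.frame(
  s1 = c(0.61, 0.79, 0.53, 0.71, 0.76),
  s2 = c(0.47, 0.39, 0.51, 0.40, 0.75),
  row.names = c("A1BG", "A1CF", "A2BP1", "A2LD1", "A2M")
)
gene_value <- c(A1BG = "1", A1CF = "29974", A2BP1 = NA, A2LD1 = NA, A2M = "2")

# Join on row names; the symbols move into a "Row.names" column
merged <- na.omit(merge(meth_df, data.frame(gene_value), by = 0))

# The new row names must have exactly one entry per remaining row
stopifnot(length(merged$gene_value) == nrow(merged))
rownames(merged) <- paste0("ENTREZID:", merged$gene_value)

# Drop the helper columns so only the sample columns remain
merged <- merged[, setdiff(names(merged), c("Row.names", "gene_value")), drop = FALSE]
```

One possible cause of "invalid 'row.names' length" at this step: if the merged data frame has no column literally named gene_value, then `$gene_value` is NULL and paste0() returns a single string. Checking names() of the merged result before assigning row names would show whether that is the case.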

ADD REPLY • link 20 months ago by Basti ★ 2.0k

Computing gene_value produces the message 'select()' returned 1:many mapping between keys and columns, so I need to resolve that before I can proceed with your recommendation.

gene_value = mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL") # gene_value is the Entrez ID

'select()' returned 1:many mapping between keys and columns

meth_df=na.omit(merge(meth_df,data.frame(gene_value),by=0))
rownames(meth_df) <- paste0("ENTREZID:", meth_df$gene_value)

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

ADD REPLY • link 20 months ago by melissachua90 ▴ 60

I don't understand. You first said that you had this error: "Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds must have at least one key to match against."

After you modified your post, the code worked and produced the "'select()' returned 1:many mapping between keys and columns" message, which is only a warning and can be dealt with.

Now your code shows the first error again, so why does it keep changing?
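
On the message itself: it is informational and says that at least one symbol matched more than one Entrez ID. As far as I know, mapIds() then keeps the first match by default (multiVals = "first"). A sketch of making that choice explicit; map_symbols is a hypothetical helper name, and it assumes the org.Hs.eg.db and AnnotationDbi packages are installed:

```r
# Hypothetical helper: map gene symbols to Entrez IDs with an explicit
# policy for symbols that match several IDs.
map_symbols <- function(symbols, multi = "first") {
  # multi = "first": keep the first match (the default behaviour)
  # multi = "asNA":  return NA for ambiguous symbols
  # multi = "list":  return every match, as a list
  suppressMessages(
    AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                          keys = symbols,
                          column = "ENTREZID",
                          keytype = "SYMBOL",
                          multiVals = multi)
  )
}

# Usage (needs the Bioconductor packages installed):
# gene_value <- map_symbols(rownames(meth_df), multi = "asNA")
```

With "first" or "asNA" the result is a named character vector with one entry per input symbol, so it lines up with the row names of the input.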

ADD REPLY • link 20 months ago by Basti ★ 2.0k

Sorry for the confusion. I edited my comment above. Error code is as follows:

gene_value = mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL") # gene_value is the Entrez ID

'select()' returned 1:many mapping between keys and columns

meth_df=na.omit(merge(meth_df,data.frame(gene_value),by=0))
rownames(meth_df) <- paste0("ENTREZID:", meth_df$gene_value)

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

ADD REPLY • link 20 months ago by melissachua90 ▴ 60

score 0 · Answer 1 · 2022-08-17

20 months ago

LChart 3.9k

If I'm using gene symbols for primary analysis, I prefer to use HGNC-provided mappings:

http://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt

The 'symbol' and 'entrez_id' columns are what you're looking for.

ADD COMMENT • link 20 months ago by LChart 3.9k

Would you mind providing the code? Thanks.

ADD REPLY • link 20 months ago by melissachua90 ▴ 60

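# Read the tab-separated HGNC table; keep only the symbol and Entrez ID columns
mapping <- read.csv('hgnc_complete_set.txt', sep='\t')[,c('symbol', 'entrez_id')]
# Drop symbols without an Entrez ID so they do not become "ENTREZ:NA"
mapping <- mapping[!is.na(mapping$entrez_id),]
sym2ent <- sprintf('ENTREZ:%d', mapping$entrez_id)
names(sym2ent) <- mapping$symbol
# 'thing' is your data frame; unmapped symbols keep their original row names
rownames(thing) <- ifelse(rownames(thing) %in% names(sym2ent), sym2ent[rownames(thing)], rownames(thing))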

ADD REPLY • link 20 months ago by LChart 3.9k

Thanks! However, the code only edits some of the row names. For example, "ENTREZ:1", "ENTREZ:29974" and "ENTREZ:2" are new row names produced by the code, but "A2BP1" and "A2LD1" are still the gene symbols from the original meth_df data frame.

Is it because some genes do not have entrez ID? If so, how should I deal with it?

rownames(meth_df) <- ifelse(rownames(meth_df) %in% names(sym2ent), sym2ent[rownames(meth_df)], rownames(meth_df))

rownames(meth_df)

[1] "ENTREZ:1"     "ENTREZ:29974" "A2BP1"        "A2LD1"        "ENTREZ:2"

ADD REPLY • link 20 months ago by melissachua90 ▴ 60

"Is it because some genes do not have entrez ID?"

Yes.

"If so, how should I deal with it?"

If you strictly require entrez IDs, then you have to subset to those genes:

meth_df_entrez <- meth_df[grepl('^ENTREZ', rownames(meth_df)),]

Otherwise you have to live with a mix of HGNC symbols and Entrez IDs as row names.
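
A third option, sketched here under assumptions: symbols that fail to map may simply be outdated, and the HGNC file should also carry a prev_symbol column with "|"-separated former symbols. The toy table below stands in for hgnc_complete_set.txt; its rows are illustrative and should be verified against the real file, and an old symbol can in principle point to more than one current gene.

```r
# Toy stand-in for hgnc_complete_set.txt (the real file has many more
# columns and rows; these rows are illustrative only).
hgnc <- data.frame(
  symbol      = c("A1BG", "A1CF", "A2M", "RBFOX1", "GGACT"),
  entrez_id   = c(1L, 29974L, 2L, 54715L, 87769L),
  prev_symbol = c("", "", "", "A2BP1", "A2LD1"),
  stringsAsFactors = FALSE
)

# Current symbol -> Entrez ID
lookup <- setNames(hgnc$entrez_id, hgnc$symbol)

# Previous symbol -> Entrez ID; a gene can list several old names
prev <- strsplit(hgnc$prev_symbol, "|", fixed = TRUE)
prev_lookup <- setNames(rep(hgnc$entrez_id, lengths(prev)), unlist(prev))

# Try the current symbols first, then fall back to previous symbols
symbols <- c("A1BG", "A1CF", "A2BP1", "A2LD1", "A2M")
entrez <- lookup[symbols]
missing <- is.na(entrez)
entrez[missing] <- prev_lookup[symbols[missing]]

# Anything still NA has no Entrez ID in the table and would be dropped
keep <- !is.na(entrez)
new_names <- paste0("ENTREZ:", entrez[keep])
```

Applied to a data frame, this would mean subsetting its rows with keep and then assigning new_names as row names, after checking that new_names contains no duplicates.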

ADD REPLY • link 20 months ago by LChart 3.9k