Question

Ensembl gene ID with Characters


17 months ago

Jakpa ▴ 50

Hi,

I have a df of gene expression that looks like this:

[Image: gene expression data frame]

I want to map the Ensembl IDs to gene names/symbols using org.Hs.eg.db with this code:

res_df$symbol = mapIds(org.Hs.eg.db, keys = rownames(res_df),
                   keytype = "ENSEMBL", column = "SYMBOL")

I got this error:

Error in .testForValidKeys(x, keys, keytype, fks): None of 
the keys entered are valid keys for 'ENSEMBL'. Please use the keys 
method to see a listing of valid arguments

I saw a similar post that attributes this to the decimal suffix at the end of the Ensembl ID, and I tried fixing it with:

res_df=gsub("\\..*","",row.names(res_df))

but it did not give the required output. Then I realized that the Ensembl ID column does not have a name. I tried to name it with names(res_df)[0] <- "EsemblId", but the output remained the same.

I have more than 50,000 rows. How do I write code in R to remove the decimal point and the digits after it from each Ensembl ID?

I think that if I am able to do that, my first code will work, based on the previous post that I read.

Regards,

Esembl annotation GeneExpression R • 762 views

ADD COMMENT • link updated 17 months ago by i.sudbery 19k • written 17 months ago by Jakpa ▴ 50

score 1 · Answer 1 · 2022-11-05


17 months ago

rpolicastro 13k

rownames(res_df) <- gsub("\\.[0-9]+$", "", rownames(res_df))

Or, if you prefer the tidyverse:

library("stringr")

rownames(res_df) <- str_remove(rownames(res_df), "\\.[0-9]+$")
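As a minimal sketch of why the assignment target matters (the data frame below is a made-up toy, not the poster's data): assigning the gsub() result to res_df itself replaces the whole data frame with a character vector, whereas assigning to rownames(res_df) only renames the rows and leaves the columns alone.

```r
# Toy stand-in for the real results table; IDs and values are illustrative only.
res_df <- data.frame(
  baseMean = c(10.5, 20.1, 3.7),
  row.names = c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14")
)

# Assigning the gsub() result to a plain variable yields a character vector,
# so doing `res_df <- gsub(...)` would discard every column of the data frame.
ids_only <- gsub("\\..*", "", row.names(res_df))
is.data.frame(ids_only)  # FALSE

# Assigning to rownames() strips the version suffix and keeps the data intact.
rownames(res_df) <- gsub("\\.[0-9]+$", "", rownames(res_df))
rownames(res_df)
```

The anchored pattern "\\.[0-9]+$" only touches a trailing dot followed by digits, so IDs without a version suffix pass through unchanged.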

ADD COMMENT • link 17 months ago by rpolicastro 13k


rpolicastro, thank you for your response. Your code seems to work, but it is as if the output has a limited validity period: a few minutes after getting the output that I want, if I run it again, it gives an error like this:

res= mapIds(org.Hs.eg.db, keys = rownames(res),
                   keytype = "ENSEMBL", column = "SYMBOL",
                   multiVals = "first")

'select()' returned 1:many mapping between keys and columns

followed by this output:

ENSG00000000003'TSPAN6'ENSG00000000005'TNMD'ENSG00000000419'DPM1'ENSG00000000457'SCYL3'ENSG00000000460'C1orf112'ENSG00000000938'FGR'ENSG00000000971'CFH'ENSG00000001036'FUCA2'ENSG00000001084'GCLC'ENSG00000001167'NFYA'ENSG00000001460'STPG1'ENSG00000001461'NIPAL3'ENSG00000001497'LAS1L'ENSG00000001561'ENPP4'ENSG00000001617'SEMA3F'ENSG00000001626'CFTR'ENSG00000001629'ANKIB1'

instead of a gene symbol column alongside the other variables such as pvalue, log2FoldChange, etc. Also, the majority of the gene symbols are NA.

Any idea how to resolve this?

ADD REPLY • link 17 months ago by Jakpa ▴ 50


Most genes as annotated by Ensembl do not have gene symbols, so when you fetch them, the NAs effectively mean "this gene does not have a name".
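Separately from the NAs, the output pasted above looks like a named character vector, which is what mapIds() returns. That would be consistent with the result having been assigned back to res, replacing the data frame, so that a second run no longer finds any row names to use as keys. This is a guess from the pasted code, not something confirmed in the thread. A hedged sketch of storing the symbols in a new column instead (the toy data frame is made up, and the mapping step only runs if the org.Hs.eg.db package is installed):

```r
# Toy results table with version-free Ensembl IDs as row names;
# the numeric values are invented for illustration.
res_df <- data.frame(
  log2FoldChange = c(1.2, -0.8, 0.3),
  pvalue = c(0.01, 0.20, 0.04),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
)

# Only attempt the lookup when the annotation package is available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Store the symbols in a NEW column; do not overwrite res_df itself.
  res_df$symbol <- AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = rownames(res_df),
    keytype = "ENSEMBL",
    column = "SYMBOL",
    multiVals = "first"  # keep one symbol when an ID maps to several
  )
}

# The original columns and row names are still there after the lookup.
head(res_df)
```

The "'select()' returned 1:many mapping" line is an informational message rather than an error; multiVals = "first" already decides which symbol to keep in that case.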

ADD REPLY • link 17 months ago by i.sudbery 19k