Ensembl geneID with Characters
4 months ago
Jakpa ▴ 50

Hi,

I have a df of gene expression that looks like this:

I want to map the Ensembl IDs to gene names/symbols using org.Hs.eg.db with this code:

res_df$symbol = mapIds(org.Hs.eg.db, keys = rownames(res_df), keytype = "ENSEMBL", column = "SYMBOL")

I got this error:

Error in .testForValidKeys(x, keys, keytype, fks): None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments

I saw a similar post that relates this error to the decimal at the end of the Ensembl ID, and I tried fixing it with:

res_df=gsub("\\..*","",row.names(res_df))

It did not give the required output. Then I realized that the Ensembl ID column does not have a name. I tried to name it like this:

names(res_df)[0] <- "EsemblId"

but the output remained the same. I have more than 50,000 rows. How do I write code in R to remove the decimal point and the numbers after it from the Ensembl IDs? I think that if I am able to do that, my first code will work, based on the previous post that I read.

Regards,

4 months ago

rownames(res_df) <- gsub("\\.[0-9]+$", "", rownames(res_df))


Or if you prefer the tidyverse

library("stringr")

rownames(res_df) <- str_remove(rownames(res_df), "\\.[0-9]+$")
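For anyone comparing this with the attempt in the question, here is a minimal sketch in base R. It assumes the Ensembl IDs are stored as row names, as described above. The toy `res_df`, its values, and the version suffixes are made up for illustration. Assigning the `gsub()` result to `res_df` itself replaces the whole data frame with a character vector, which is probably why the earlier attempt did not give the expected output. Assigning to `rownames(res_df)` keeps the table intact.

```r
# Toy data frame standing in for res_df; values and version suffixes are invented.
res_df <- data.frame(
  log2FoldChange = c(1.2, -0.5, 0.1),
  pvalue = c(0.01, 0.20, 0.75),
  row.names = c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14")
)

# Strip a trailing ".<digits>" version suffix from the row names only.
# Writing res_df <- gsub(...) instead would overwrite the data frame
# with a plain character vector.
rownames(res_df) <- gsub("\\.[0-9]+$", "", rownames(res_df))

print(rownames(res_df))
print(is.data.frame(res_df))
```

After this step the row names are unversioned IDs, which is the form the "ENSEMBL" keytype in the question expects.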


rpolicastro, thank you for your response. Your code seems to work, but the result does not seem to last. A few minutes after getting the output that I want, if I run it again, I get a message like this:

res= mapIds(org.Hs.eg.db, keys = rownames(res),
keytype = "ENSEMBL", column = "SYMBOL",
multiVals = "first")


'select()' returned 1:many mapping between keys and columns

then, this output:

ENSG00000000003 'TSPAN6'
ENSG00000000005 'TNMD'
ENSG00000000419 'DPM1'
ENSG00000000457 'SCYL3'
ENSG00000000460 'C1orf112'
ENSG00000000938 'FGR'
ENSG00000000971 'CFH'
ENSG00000001036 'FUCA2'
ENSG00000001084 'GCLC'
ENSG00000001167 'NFYA'
ENSG00000001460 'STPG1'
ENSG00000001461 'NIPAL3'
ENSG00000001497 'LAS1L'
ENSG00000001561 'ENPP4'
ENSG00000001617 'SEMA3F'
ENSG00000001626 'CFTR'
ENSG00000001629 'ANKIB1'

instead of a gene symbol column alongside the other variables such as p-value, log2 fold change, etc. Also, the majority of the gene symbols are NA.

Please, any idea on how to resolve this?


Most genes as annotated by Ensembl do not have gene symbols, so when you fetch them, the NAs effectively mean "this gene does not have a name".
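As for the output format, it looks like the `mapIds()` call above assigns its result back to `res`, so the results table is replaced by a named character vector. That would also explain why running it a second time fails. A minimal sketch of storing the symbols in a new column instead is below. To keep it runnable without Bioconductor, the toy `symbols` vector stands in for what `mapIds()` returns (names are the Ensembl IDs, values are the symbols). The table values, the `label` column, and the ID `ENSG00000999999` are made up for illustration.

```r
# Toy results table; values are invented for illustration.
res <- data.frame(
  log2FoldChange = c(1.2, -0.5, 0.1),
  pvalue = c(0.01, 0.20, 0.75),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG00000999999")
)

# Stand-in for the named character vector returned by mapIds():
# names are Ensembl IDs, values are symbols, NA where no symbol is found.
# ENSG00000999999 is a made-up ID used only to show the NA case.
symbols <- c(ENSG00000000003 = "TSPAN6",
             ENSG00000000005 = "TNMD",
             ENSG00000999999 = NA)

# Store the symbols in a new column rather than overwriting res.
res$symbol <- unname(symbols[rownames(res)])

# Count unmapped genes, and optionally fall back to the Ensembl ID as a label.
n_missing <- sum(is.na(res$symbol))
res$label <- ifelse(is.na(res$symbol), rownames(res), res$symbol)

print(res)
print(n_missing)
```

With the real data, the equivalent would be assigning the `mapIds()` result to `res$symbol`, as in the original question, so the p-value and fold-change columns stay in place.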