Question

NAs using org.Hs.eg.db for Ensembl ID to Gene Symbol annotation

3.0 years ago

Jakpa ▴ 50

Hi Everyone,

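I mapped Ensembl IDs to gene symbols using org.Hs.eg.db with this code:

library(AnnotationDbi)
library(org.Hs.eg.db)
# Map each Ensembl ID to a gene symbol, keeping the first symbol when several match
annotn = mapIds(org.Hs.eg.db, keys = rownames(res),
                keytype = "ENSEMBL", column = "SYMBOL",
                multiVals = "first")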

res is a gene expression data frame with more than 50,000 Ensembl IDs as row names. After running the annotation, about 45% of the results are NA, i.e., those IDs were not assigned a gene symbol.

Is it that org.Hs.eg.db could not map these IDs because of the dataset, or is my syntax not quite correct?

How do I fix this? Are there other options?

I don't want to delete the Ensembl IDs with NAs.

Regards,

RNASeq annotation expression Gene R • 1.9k views

updated 3.0 years ago by rpolicastro 13k • written 3.0 years ago by Jakpa ▴ 50

Hi! Can you show some IDs (rownames(res))?

3.0 years ago by iraun 6.2k

iraun,

'ENSG00000288663' 'ENSG00000288667' 'ENSG00000288669' 'ENSG00000288670' 'ENSG00000288674' 'ENSG00000288675'

3.0 years ago by Jakpa ▴ 50

Not all Ensembl IDs have an associated gene name, and some genes have a single gene name but multiple gene IDs (the joys of gene identifiers).

To give yourself the best chance of mapping your gene IDs to gene symbols, make sure you are using the same Ensembl release that the gene expression data was generated from. If there isn't an org database associated with that release, you could use biomaRt to convert the IDs instead.
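One way to keep every row even when no symbol exists, sketched under assumptions: fall back to the Ensembl ID itself as the label wherever the mapping is NA. The vector below is a toy stand-in for the named character vector that mapIds() returns; the IDs and symbols are invented for illustration, and `label` and `na_fraction` are hypothetical names.

```r
# Toy stand-in for the output of mapIds(): names are Ensembl IDs,
# values are gene symbols, NA where no symbol was found.
# (IDs and symbols are made up for illustration.)
annotn <- c(ENSG_A = "GENE1", ENSG_B = NA, ENSG_C = "GENE3", ENSG_D = NA)

# Keep every ID: use the symbol when present, otherwise the Ensembl ID itself.
label <- ifelse(is.na(annotn), names(annotn), annotn)

# Fraction of IDs without a symbol, handy for comparing annotation sources.
na_fraction <- mean(is.na(annotn))

print(label)
print(na_fraction)
```

With real data, `annotn` would be the result of the mapIds() call from the question, and `label` could be added to the results table as a display column while the Ensembl IDs stay as row names.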

3.0 years ago by rpolicastro 13k

rpolicastro, thanks for your response. Initially, I did this using biomaRt:

library(biomaRt)

# Look up HGNC symbols for a vector of Ensembl gene IDs
ensid_symbol <- function(ids) {
  mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
  genes <- getBM(filters = "ensembl_gene_id",
                 attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 values = ids, mart = mart)
  return(genes)
}

df <- ensid_symbol(row.names(res_output))

# Attach the symbols to the results table (merge() keeps only IDs found in both)
result_df <- as.data.frame(res_output)
result_df$ensembl_gene_id <- row.names(result_df)
result_df <- merge(df, result_df, by = "ensembl_gene_id")
# Sort by absolute log2 fold change, then padj, both decreasing
resOrdered <- result_df[with(result_df, order(abs(log2FoldChange), padj, decreasing = TRUE)), ]

But I got an empty data frame: zero observations, with only the column names.

I tried to solve the problem, but I couldn't.

Can you spot any error in the code?

updated 3.0 years ago by rpolicastro 13k • written 3.0 years ago by Jakpa ▴ 50
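A possible way to debug the empty result, offered as a sketch and not a confirmed diagnosis: merge() without all.x or all.y keeps only IDs present in both tables, so if the lookup returns nothing that matches the row names, the merged table is empty. One common cause (an assumption here, since the IDs shown earlier in the thread carry no version) is version suffixes such as ".1" on the row names. The toy data below stands in for res_output and for the getBM() result; all values are invented, and the empty string for a missing symbol is assumed to mirror what biomaRt returns.

```r
# Toy results table with Ensembl IDs as row names (values are invented).
res_output <- data.frame(
  log2FoldChange = c(2.5, -3.1, 0.4),
  padj = c(0.01, 0.001, 0.8),
  row.names = c("ENSG_A.1", "ENSG_B.2", "ENSG_C.1")
)

# Toy stand-in for the getBM() result: only some IDs come back,
# and a missing symbol is assumed to be an empty string.
df <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_C"),
  hgnc_symbol = c("GENE1", ""),
  stringsAsFactors = FALSE
)

# 1. Strip a trailing ".<version>" so IDs are comparable (no-op if absent).
ids <- sub("\\.[0-9]+$", "", rownames(res_output))

# 2. Check how many IDs the lookup actually covers before joining.
n_matched <- sum(ids %in% df$ensembl_gene_id)

# 3. Left join via match(): every row of the results is kept,
#    and IDs without a lookup entry get NA.
result_df <- res_output
result_df$ensembl_gene_id <- ids
result_df$hgnc_symbol <- df$hgnc_symbol[match(ids, df$ensembl_gene_id)]

# Treat empty symbols the same as missing ones.
result_df$hgnc_symbol[result_df$hgnc_symbol %in% ""] <- NA

# Sort by absolute log2 fold change, largest first.
resOrdered <- result_df[order(abs(result_df$log2FoldChange), decreasing = TRUE), ]
print(resOrdered)
```

If `n_matched` is zero on the real data, the problem lies in the IDs or the lookup itself, not in the merge step.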