I'm analyzing some RNA-seq data with edgeR. According to the edgeR manual, we can use the org.Hs.egENSEMBL database in the org.Hs.eg.db package (version 3.7.0) to convert Ensembl gene IDs (ENSGxxxxxx) to Entrez IDs. However, I found that many Ensembl gene IDs cannot be found in the egENSEMBL database: it contains 30292 Ensembl ID records, while the GENCODE GRCh38 annotation file contains 58721 Ensembl gene IDs. Should I exclude the genes that are not in the egENSEMBL database from the downstream differential expression analysis, as the edgeR manual does?
Code from the edgeR manual (I use egENSEMBL instead of egREFSEQ in my pipeline):
# y is a DGEList object
idfound <- y$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
y <- y[idfound,]
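To make the question concrete, here is a minimal sketch of what the same filter does when keyed on Ensembl IDs. It uses toy data in base R so that it runs without Bioconductor. `mapped_ensembl` is a stand-in for `mappedRkeys(org.Hs.egENSEMBL)`, the `EnsemblID` column name is an assumption about how the annotation is stored, and the two unmapped IDs are made up for illustration.

```r
# Toy stand-in for mappedRkeys(org.Hs.egENSEMBL): Ensembl IDs with an Entrez mapping.
# In the real pipeline this vector would come from org.Hs.eg.db.
mapped_ensembl <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

# Toy stand-in for y$genes of a DGEList; the last two IDs are hypothetical
# and represent GENCODE genes that have no Entrez mapping.
genes <- data.frame(
  EnsemblID = c("ENSG00000000003", "ENSG00000000419",
                "ENSG00000999991", "ENSG00000999992"),
  stringsAsFactors = FALSE
)

# Toy stand-in for y$counts: 4 genes x 2 samples.
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(genes$EnsemblID, c("s1", "s2")))

# Same logic as the manual's filter, with Ensembl IDs instead of RefSeq IDs.
idfound <- genes$EnsemblID %in% mapped_ensembl

# Number of genes the filter would keep and drop.
n_kept <- sum(idfound)
n_dropped <- sum(!idfound)

# Subset annotation and counts by row, as y[idfound, ] would do on a DGEList.
genes_kept <- genes[idfound, , drop = FALSE]
counts_kept <- counts[idfound, , drop = FALSE]
```

In my real data, `n_dropped` would correspond to the roughly 28000 GENCODE genes that are absent from egENSEMBL, which is why I am unsure whether dropping them is appropriate.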