Dear Seniors,
I am looking to perform GSEA, KEGG and over-representation analysis. I found clusterProfiler interesting and had a go at "GO classification", including groupGO (gene classification based on GO distribution at a specific level), enrichGO (over-representation analysis) and gseGO (GO gene set enrichment analysis). clusterProfiler needs Entrez IDs. Unfortunately, when I did the alignment, my count matrix was generated with gene symbols (I have attached the DESeq2 output). I have 28951 genes, which were fitted by DESeq2 and appear in its output.
clusterProfiler briefly mentions the bitr function, but it seems to be applicable only to enrichGO / over-representation analysis. However, I still had a go with bitr, which converts gene symbols to Entrez IDs or Ensembl IDs. The following is my code, adapted from the manual.
# Test with my data set
library(tidyverse)
res.profile <- read.csv(file = "res.csv", header = TRUE, stringsAsFactors = FALSE)
res.profile.na <- na.omit(res.profile)
# Select the log fold changes and gene symbols (from the NA-free table)
Res.Select <- res.profile.na %>% dplyr::select(Gene.Symbol, log2FoldChange)
# Create geneList
library(clusterProfiler)
# feature 1: numeric vector
geneList <- Res.Select[, 2]
# feature 2: named vector
names(geneList) <- as.character(Res.Select[, 1])
# feature 3: decreasing order
geneList <- sort(geneList, decreasing = TRUE)
# Make gene: keep only genes with |log2FoldChange| > 2
gene <- names(geneList)[abs(geneList) > 2]
# gene.df: convert the selected symbols to Ensembl and Entrez IDs
library(org.Mm.eg.db)
gene.df <- bitr(gene, fromType = "SYMBOL", toType = c("ENSEMBL", "ENTREZID"), OrgDb = org.Mm.eg.db)
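A small base-R sketch of the same steps may help when checking the numbers. The gene symbols and fold changes below are invented for illustration, and `toy`, `toyList` and `toyGene` are hypothetical names. It only shows that the `abs(geneList) > 2` filter shortens the list before `bitr` is called, so the number of symbols passed to `bitr` is worth comparing with the number of IDs it returns.

```r
# Toy data: symbols and fold changes are invented for illustration only.
toy <- data.frame(
  Gene.Symbol    = c("Actb", "Gapdh", "Trp53", "Myc", "Il6"),
  log2FoldChange = c(0.4, -2.7, 3.1, 1.2, -0.3),
  stringsAsFactors = FALSE
)

# Named numeric vector sorted in decreasing order, as in the steps above.
toyList <- setNames(toy$log2FoldChange, toy$Gene.Symbol)
toyList <- sort(toyList, decreasing = TRUE)

# Only genes passing the cutoff would be handed to bitr().
toyGene <- names(toyList)[abs(toyList) > 2]

length(toyList)  # number of genes in the full ranked list
length(toyGene)  # number of genes passing |log2FC| > 2
```

On the real data, `length(gene)` and `length(unique(gene.df$SYMBOL))` would give the corresponding before and after counts for the conversion itself.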
However, once I used bitr to convert gene symbols to Entrez IDs, I got only 962 Entrez IDs as output, which I took to mean that many gene symbols have no ENTREZID (a reduction from 28951 to only 962). What do you think? I am a bit reluctant to proceed, so I am looking for suggestions from all seniors and members, or anyone who has used clusterProfiler before.
Or, if any member knows whether clusterProfiler can be run directly with gene symbols, or knows another way to convert gene symbols to Ensembl IDs or Entrez IDs, I would be happy to look through your code if you could provide it.
Regards,
Synat,
I think I have solved the problem using biomaRt
library(biomaRt)
gene_id <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "entrezgene_id"), mart = useDataset("mmusculus_gene_ensembl", useMart("ensembl")))
view(gene_id)
# Merge the data frames (columns 2 and 3 are external_gene_name and entrezgene_id)
DEseq2output <- merge(res.profile, gene_id[, c(2, 3)], by.x = "Gene", by.y = "external_gene_name")
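One thing that may be worth checking with this merge: by default `merge` drops genes that have no match, and a symbol that appears several times in the annotation table produces several rows. The sketch below uses invented toy tables (`res.toy` and `ids.toy` are hypothetical stand-ins for `res.profile` and the `getBM` result, and the IDs are placeholders, not real Entrez IDs) to show one possible way to keep unmatched genes and count them.

```r
# Toy stand-ins for res.profile and the getBM() result; all values are invented.
res.toy <- data.frame(
  Gene           = c("Actb", "Trp53", "Gm00000"),
  log2FoldChange = c(0.4, 3.1, -2.2),
  stringsAsFactors = FALSE
)
ids.toy <- data.frame(
  external_gene_name = c("Actb", "Trp53", "Trp53"),
  entrezgene_id      = c(101L, 102L, NA),  # placeholder IDs
  stringsAsFactors = FALSE
)

# Drop annotation rows without an Entrez ID, then keep one row per symbol.
ids.toy <- ids.toy[!is.na(ids.toy$entrezgene_id), ]
ids.toy <- ids.toy[!duplicated(ids.toy$external_gene_name), ]

# all.x = TRUE keeps genes with no match; their entrezgene_id becomes NA.
merged.toy <- merge(res.toy, ids.toy,
                    by.x = "Gene", by.y = "external_gene_name",
                    all.x = TRUE)

nrow(merged.toy)                      # same number of genes as res.toy
sum(is.na(merged.toy$entrezgene_id))  # genes left without an Entrez ID
```

Keeping the first row per symbol is only one way to resolve duplicates; which mapping to keep is a judgment call that depends on the data.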