Gene Set Enrichment Analysis, KEGG and Over-Representation Analysis

2.6 years ago

synat.keam ▴ 100

Dear Seniors,

I am looking to perform GSEA, KEGG and over-representation analysis (ORA). I found clusterProfiler interesting and had a go at its "GO classification" functions, including groupGO (gene classification based on GO distribution at a specific level), enrichGO (over-representation analysis) and gseGO (GO gene set enrichment analysis). clusterProfiler needs Entrez IDs. Unfortunately, the count matrix I generated at the alignment step already uses gene symbols (I have attached the DESeq2 output). The DESeq2 output contains 28951 fitted genes.

The clusterProfiler manual briefly mentions the bitr function, but only in the context of enrichGO / over-representation analysis. I still had a go with bitr, which converts gene symbols to Entrez IDs or Ensembl IDs. Below is the code I adapted from the manual.

# Test with my data set

library(tidyverse)

res.profile <- read.csv(file = "res.csv", header = TRUE, stringsAsFactors = FALSE)

# Drop rows containing NA values (e.g. genes without a log2FoldChange)

res.profile.na <- na.omit(res.profile)

# Select the log fold changes and gene symbols from the NA-free table
# (dplyr::select is spelled out because AnnotationDbi also exports select)

Res.Select <- res.profile.na %>% dplyr::select(Gene.Symbol, log2FoldChange)

# Create geneList

library(clusterProfiler)

# Feature 1: numeric vector

geneList <- Res.Select[, 2]

# Feature 2: named vector

names(geneList) <- as.character(Res.Select[, 1])

# Feature 3: decreasing order

geneList <- sort(geneList, decreasing = TRUE)

# Keep only the genes with |log2FoldChange| > 2

gene <- names(geneList)[abs(geneList) > 2]

# gene.df: convert the selected gene symbols to Ensembl and Entrez IDs

library(org.Mm.eg.db)

gene.df <- bitr(gene, fromType = "SYMBOL", toType = c("ENSEMBL", "ENTREZID"), OrgDb = org.Mm.eg.db)
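The three geneList features above can be sketched on a small invented table using base R only. The data values, the `toy.*` names and the rule for resolving duplicated symbols (keep the row with the largest absolute fold change) are assumptions for illustration, not part of the clusterProfiler manual.

```r
# Toy stand-in for the DESeq2 results table (values are invented for illustration)
toy.res <- data.frame(
  Gene.Symbol    = c("Actb", "Gapdh", "Trp53", "Actb", "Myc"),
  log2FoldChange = c(0.4, -2.5, 3.1, 1.2, NA),
  stringsAsFactors = FALSE
)

# Drop rows without a fold change, then keep one row per symbol
# (assumed rule: the row with the largest absolute fold change wins)
toy.res <- toy.res[!is.na(toy.res$log2FoldChange), ]
toy.res <- toy.res[order(-abs(toy.res$log2FoldChange)), ]
toy.res <- toy.res[!duplicated(toy.res$Gene.Symbol), ]

# Features 1 to 3: numeric vector, named by gene, sorted in decreasing order
toy.geneList <- setNames(toy.res$log2FoldChange, toy.res$Gene.Symbol)
toy.geneList <- sort(toy.geneList, decreasing = TRUE)

# Genes passing the |log2FC| > 2 cut-off
toy.gene <- names(toy.geneList)[abs(toy.geneList) > 2]

print(toy.geneList)
print(toy.gene)
```

Removing NA values and duplicated names before sorting avoids NA entries and repeated symbols in the vector handed to the downstream functions.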

However, once I used bitr to convert the gene symbols to Entrez IDs, I got only 962 Entrez IDs as output, which seems to mean that lots of gene symbols did not have an ENTREZID (reduced from 28951 to only 962). What do you think? I am a bit reluctant to proceed, so I am looking for suggestions from all seniors and members, or anyone who has used clusterProfiler before.

Or, if any member knows whether clusterProfiler can start from gene symbols, or knows a way to convert gene symbols to Ensembl IDs or Entrez IDs, I would be happy to look through your code if you could provide it.
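One thing that may be worth checking before concluding that most symbols lack an Entrez ID: in the code above, `gene` holds only the symbols with |log2FoldChange| > 2, so bitr receives that subset and not all 28951 genes. A minimal base R sketch of the bookkeeping, using invented toy data; the lookup table `toy.map` is a hypothetical stand-in for the bitr output and its IDs are placeholders, not real Entrez IDs.

```r
# Hypothetical symbols and fold changes (invented for illustration)
toy.geneList <- c(Trp53 = 3.1, Myc = 2.4, Actb = 0.4, Gm12345 = -0.2, Gapdh = -2.5)

# Only genes with |log2FC| > 2 are handed to the ID conversion step
toy.gene <- names(toy.geneList)[abs(toy.geneList) > 2]

# Hypothetical symbol-to-ID lookup standing in for the bitr() output;
# the IDs are placeholders, not real Entrez IDs
toy.map <- data.frame(
  SYMBOL   = c("Trp53", "Gapdh", "Actb"),
  ENTREZID = c("id_1", "id_2", "id_3"),
  stringsAsFactors = FALSE
)
toy.gene.df <- toy.map[toy.map$SYMBOL %in% toy.gene, ]

n.total  <- length(toy.geneList)                 # all genes in the results table
n.input  <- length(toy.gene)                     # genes actually passed to the conversion
n.mapped <- length(unique(toy.gene.df$SYMBOL))   # input genes that received an ID
unmapped <- setdiff(toy.gene, toy.gene.df$SYMBOL)

cat("total:", n.total, " input:", n.input, " mapped:", n.mapped, "\n")
cat("unmapped symbols:", unmapped, "\n")
```

On the real data, comparing `length(geneList)`, `length(gene)` and the number of unique symbols in `gene.df` in the same way would show whether the drop happens at the fold-change filter or at the ID conversion.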

Regards,

Synat

ClusterProfiler • 1.1k views

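I think I have solved the problem using biomaRt.

library(biomaRt)

# Retrieve the Ensembl gene ID, gene symbol and Entrez ID for all mouse genes

gene_id <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "entrezgene_id"), mart = useDataset("mmusculus_gene_ensembl", useMart("ensembl")))

view(gene_id)

# Merge the data frames on gene symbol
# (columns 2 and 3 of gene_id are external_gene_name and entrezgene_id;
# by.x must be the name of the gene symbol column in res.profile)

DEseq2output <- merge(res.profile, gene_id[, c(2, 3)], by.x = "Gene", by.y = "external_gene_name")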
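Note that merge as called above performs an inner join, so genes without a matching symbol are dropped, and a symbol that maps to several Entrez IDs produces several rows. A minimal base R sketch on invented toy tables shows how both cases could be inspected; the `toy.*` names and all ID values are placeholders, and `all.x = TRUE` is a suggested variation, not part of the merge call above.

```r
# Toy DESeq2-style table; the column name "Gene" mirrors by.x in the merge above
toy.res <- data.frame(
  Gene           = c("Trp53", "Gapdh", "Gm12345"),
  log2FoldChange = c(3.1, -2.5, 0.7),
  stringsAsFactors = FALSE
)

# Toy annotation table standing in for the getBM() result (IDs are placeholders)
toy.ids <- data.frame(
  external_gene_name = c("Trp53", "Gapdh", "Gapdh"),
  entrezgene_id      = c(101, 202, 203),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps genes without a match (their entrezgene_id becomes NA)
toy.merged <- merge(toy.res, toy.ids,
                    by.x = "Gene", by.y = "external_gene_name", all.x = TRUE)

# Genes with no Entrez ID, and genes that map to more than one ID
toy.unmapped <- toy.merged$Gene[is.na(toy.merged$entrezgene_id)]
toy.multi    <- unique(toy.merged$Gene[duplicated(toy.merged$Gene)])

print(toy.merged)
```

Inspecting the unmapped and multi-mapped genes before running the enrichment functions makes it easier to decide how to handle them (for example, dropping NA rows and keeping one ID per symbol).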

2.6 years ago by synat.keam ▴ 100