Question: Simplest way to perform GSEA from a weighted list of entrez gene ids
1
gravatar for rachel.cavill
2.5 years ago by
rachel.cavill30 wrote:

As part of a larger project I've implemented GSEA in Matlab. I want to test my code by comparing my output - p-values for KEGG pathways say, with another implementation. In my implementation the GSEA algorithm starts from a ranked list of gene ids with real valued weights. So I want to upload the same list to the tool used for comparison. However, as I look round tools for GSEA most of them seem to start at an earlier point, requiring the original gene data, which would mean that there would be scope for differences in preprocessing to affect the output, whereas I want a pure comparison of the GSEA part.

What would you say is the simplest way to perform GSEA (not another enrichment algorithm) on a weighted list of entrez-gene ids?

I'm fine if it involves some coding, but don't want to get bogged down editing large amounts of other people's code to perform what should be a straightforward test to check that my results are comparable with the expected results.

ADD COMMENTlink modified 8 months ago by pshubhamoy20 • written 2.5 years ago by rachel.cavill30
2
gravatar for vodka
2.5 years ago by
vodka80
Torino
vodka80 wrote:

I believe that http://www.broadinstitute.org/cancer/software/genepattern/modules/docs/GSEAPreranked/1 should work for you.

ADD COMMENTlink written 2.5 years ago by vodka80

Perfect, just what I needed.

ADD REPLYlink written 2.5 years ago by rachel.cavill30
2
gravatar for assaron
2.5 years ago by
assaron170
assaron170 wrote:

If you want to compare you implementation of GSEA with others, I'd recommend as @vodka suggested to use the Broad's program: it's developed by the original authors and works pretty fast compared to other implentations. Just beware it results in zero p-value when there was no permutation with more extreme statistic, which is biased and is not a recommended method (http://www.statsci.org/smyth/pubs/permp.pdf).

However, I would recommend to check out an R-package https://github.com/ctlab/fgsea (disclaimer: I'm the author of this package) and not to implement the GSEA method yourself: likely it will be slow if you're testing many pathways. You'd probably be interested in looking at this package as it implements a special algorithm to simulthaneously build background distribution for all the gene set sizes and thus it works much faster (up to hundred times) than all the versions I'm aware of, but giving the same p-values.

ADD COMMENTlink written 2.5 years ago by assaron170

I have compared clusterProfiler with broad institute's GSEA-P, https://guangchuangyu.github.io/2015/11/comparison-of-clusterprofiler-and-gsea-p/.

clusterProfiler and GSEA-P output identical pvalues. I also test fgsea with clusterProfiler, and they also output identical pvalues.

fgsea is indeed very fast and will be the back engine of next release of clusterProfiler.

ADD REPLYlink written 2.5 years ago by Guangchuang Yu2.0k

Thanks for the warnings. In this case the speed of the enrichment algorithm isn't the limiting factor, but I'll keep that in mind for future projects.

ADD REPLYlink written 2.5 years ago by rachel.cavill30

I am also trying fgsea & looks very easy and fast compare to GSEA, but how can I use my own geneset.gmt file and ranked file, I have tried with example data,

library(fgsea)
library(ggplot2)

data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(pathways = examplePathways, stats = exampleRanks, minSize=15, maxSize=500, nperm=10000)
plotEnrichment(examplePathways[["5991130_Programmed_Cell_Death"]], exampleRanks) + labs(title="Programmed Cell Death")

so how to read my .gmt file and rank file in R, I tried following command but got errors:

gset <- GSA.read.gmt("genesets.gmt")
ranked <- read.table("rankedlist.txt",header=F, sep="\t")
fgseaRes <- fgsea(pathways = gset, stats = ranked, minSize=15,maxSize=500, nperm=10000)

Error in [.data.frame(x, order(x, na.last = na.last, decreasing = decreasing)) : undefined columns selected

ADD REPLYlink written 2.5 years ago by Mike1.1k

Mike, there is an example for that in the vignette: http://bioconductor.org/packages/devel/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html, see "Starting from files" section. Hope that helps.

ADD REPLYlink written 2.5 years ago by assaron170

Thanks, assaron,

I tried following command, but rnk.file & gmt.file are empty...

> rnk.file <- system.file("ranked.rnk")
> rnk.file
[1] ""
> gmt.file <- system.file("geneset.gmt")
> gmt.file
[1] ""
ADD REPLYlink written 2.5 years ago by Mike1.1k
1

Assuming you have downloaded files https://raw.githubusercontent.com/ctlab/fgsea/master/inst/extdata/naive.vs.th1.rnk and https://raw.githubusercontent.com/ctlab/fgsea/master/inst/extdata/mouse.reactome.gmt into your working directory, this should work

rnk.file <- "naive.vs.th1.rnk"
gmt.file <- "mouse.reactome.gmt"
ranks <- read.table(rnk.file,
                    header=TRUE, colClasses = c("character", "numeric"))
ranks <- setNames(ranks$t, ranks$ID)    
pathways <- gmtPathways(gmt.file)    
fgseaRes <- fgsea(pathways, ranks, minSize=15, maxSize=500, nperm=1000)

The ranks variable should be just a vector of gene wieights with names being gene ids and pathways is a list of vectors with gene ids.

If it still doesn't work, I suggest switching to e-mail. Mine is alsergbox@gmail.com

ADD REPLYlink written 2.5 years ago by assaron170

yes it works now!!!

Thanks for your quick response.

ADD REPLYlink written 2.5 years ago by Mike1.1k

When you say gene weights here, do you mean fold changes? P-values? Something else?

ADD REPLYlink written 19 months ago by lauren.fitch0
0
gravatar for pshubhamoy
8 months ago by
pshubhamoy20
pshubhamoy20 wrote:

how do I subset a formal class gseaResults?

I have the following

Formal class 'gseaResult' [package "DOSE"] with 10 slots ..@ result :'data.frame': 20 obs. of 11 variables: .. ..$ ID : chr [1:20] "rno05200" "rno04010" "rno04151" "rno05166" ... .. ..$ Description : chr [1:20] "Pathways in cancer" "MAPK signaling pathway" "PI3K-Akt signaling pathway" "HTLV-I infection" ... .. ..$ setSize : int [1:20] 370 224 231 203 152 133 118 108 102 136 ... .. ..$ enrichmentScore: num [1:20] -0.384 -0.401 -0.388 -0.425 -0.432 ... .. ..$ NES : num [1:20] -1.54 -1.52 -1.48 -1.59 -1.56 ... .. ..$ pvalue : num [1:20] 0.00164 0.00167 0.00169 0.00173 0.00174 ... .. ..$ p.adjust : num [1:20] 0.0145 0.0145 0.0145 0.0145 0.0145 ... .. ..$ qvalues : num [1:20] 0.00836 0.00836 0.00836 0.00836 0.00836 ... .. ..$ rank : num [1:20] 2544 1610 1928 1804 2600 ... .. ..$ leading_edge : chr [1:20] "tags=34%, list=23%, signal=27%" "tags=23%, list=15%, signal=20%" "tags=29%, list=18%, signal=24%" "tags=23%, list=17%, signal=20%" ... .. ..$ core_enrichment: chr [1:20] "171140/294962/25729/314384/29560/367072/288264/399489/116502/24426/24654/170668/501110/81685/29492/287357/11448"| __truncated__ "25266/25054/360640/59323/79114/24446/25267/29496/24674/114495/117269/25597/78965/116683/24516/292763/24329/2571"| __truncated__ "65248/361696/292406/24514/89805/78975/25513/300253/29302/25155/310553/64033/25266/25054/59323/79114/25634/11666"| __truncated__ "308995/84420/303539/414788/313050/365527/114212/64033/25266/360640/84426/299611/25267/84027/24674/24516/25289/5"| __truncated__ ... ..@ organism : chr "rno" ..@ setType : chr "KEGG" ..@ geneSets :List of 324 .. ..$ rno00010: chr [1:72] "100145871" "100364027" "100364062" "100911515" ... .. ..$ rno00020: chr [1:33] "100125384" "103690168" "103693780" "114096" ... .. ..$ rno00030: chr [1:31] "100360180" "108348261" "114508" "24189" ... .. ..$ rno00040: chr [1:34] "113992" "116463" "154516" "171408" ... .. ..$ rno00051: chr [1:39] "100364027" "100911515" "100911725" "114508" ... .. ..$ rno00052: chr [1:32] "100364027" "103690059" "114860" "116463" ... .. ..$ rno00053: chr [1:27] "113992" "154516" "24861" "24862" ... .. ..$ rno00061: chr [1:14] "113976" "114024" "116719" "117243" ... .. ..$ rno00062: chr [1:31] "100911186" "102549542" "113965" "140547" ... .. ..$ rno00071: chr [1:47] "100145871" "100911186" "113965" "113976" ... .. ..$ rno00072: chr [1:11] "100361036" "117099" "24450" "25014" ... .. ..$ rno00100: chr [1:19] "114100" "114700" "117278" "140910" ... .. ..$ rno00120: chr [1:16] "170588" "192242" "246211" "25284" ... .. ..$ rno00130: chr [1:12] "103693015" "24314" "24813" "25249" ... .. ..$ rno00140: chr [1:84] "100362350" "100910877" "108348086" "108348266" ... .. ..$ rno00190: chr [1:143] "100188937" "100361126" "100361960" "100362331" ... .. ..$ rno00220: chr [1:20] "192268" "24398" "24399" "24401" ... .. ..$ rno00230: chr [1:182] "100360582" "100361574" "100362333" "100363253" ... .. ..$ rno00232: chr [1:6] "114768" "116631" "116632" "24297" ... .. ..$ rno00240: chr [1:104] "100360582" "100361574" "100362333" "100363253" ... .. ..$ rno00250: chr [1:35] "100360621" "117544" "192268" "24240" ... .. ..$ rno00260: chr [1:40] "103691744" "114027" "114123" "171133" ... .. ..$ rno00270: chr [1:47] "100360621" "100912604" "103691744" "171347" ... .. ..$ rno00280: chr [1:56] "100360621" "100361036" "100911186" "113965" ... .. ..$ rno00290: chr [1:4] "25044" "29592" "360816" "64203" .. ..$ rno00310: chr [1:61] "100169747" "100359816" "100361710" "100362634" ... .. ..$ rno00330: chr [1:52] "100912604" "108348083" "114027" "24264" ... .. ..$ rno00340: chr [1:24] "24443" "25375" "25750" "266603" ... .. ..$ rno00350: chr [1:40] "100145871" "100360621" "103694877" "171178" ... .. ..$ rno00360: chr [1:22] "100360621" "103694877" "171179" "24311" ... .. ..$ rno00380: chr [1:47] "100360621" "100911186" "103693780" "113965" ... .. .. [list output truncated] ..@ geneList : Named num [1:10844] 8.15 6.39 5.33 4.82 4.78 ... .. ..- attr(*, "names")= chr [1:10844] "116463" "313352" "500685" "246253" ... ..@ keytype : chr "UNKNOWN" ..@ permScores : num[0 , 0 ] ..@ params :List of 6 .. ..$ pvalueCutoff : num 0.05 .. ..$ nPerm : num 1000 .. ..$ pAdjustMethod: chr "BH" .. ..$ exponent : num 1 .. ..$ minGSSize : num 100 .. ..$ maxGSSize : num 500 ..@ gene2Symbol: chr(0) ..@ readable : logi FALSE

ADD COMMENTlink written 8 months ago by pshubhamoy20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1150 users visited in the last hour