Question: Gene set enrichment tool for RNA-Seq in Python?
4
5.0 years ago by
user790
United States
user790 wrote:

Is there a Python library for gene set enrichment analysis (GSEA, http://www.broadinstitute.org/gsea/index.jsp)? I am looking for a library or tool that takes a foreground set of genes, a background set, and a gene set database like the one available from Broad, and does the analysis without relying on microarray-specific items like probe IDs. All the tools I found are made for microarrays, but I want to do this for RNA-seq. If there is no Python library, is there a command-line tool that can do this, again without relying on probe IDs?

modified 3.0 years ago by Dataman260 • written 5.0 years ago by user790
3
5.0 years ago by
Adrian680
Cambridge, MA
Adrian680 wrote:

There's nothing about gene set analysis that depends on array probe IDs; once you have gene-level expression measurements (i.e. indexed by gene symbol or Entrez gene ID) you can use any of the existing tools (e.g. GSEA). Some of them do have features to convert from probe sets to gene symbols, but you don't need to use that feature.

It's also pretty straightforward to roll your own simple enrichment analysis in Python. A Python library would be nice though; I have some code for this that I've been meaning to tidy up.

modified 5.0 years ago • written 5.0 years ago by Adrian680

Which existing tools would you use, apart from GSEA?

written 5.0 years ago by user790

Just set some cutoffs and run hypergeometric tests: compute overlaps using Python sets, calculate significance with scipy.stats.hypergeom, and correct for multiple testing with statsmodels.sandbox.stats.multicomp.
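
A minimal sketch of that approach, for illustration only. To keep it dependency-free it swaps in `math.comb` for the hypergeometric tail and a hand-written Benjamini-Hochberg step for the multiple-testing correction; with SciPy installed, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` should give the same tail probability. The function names (`hypergeom_tail`, `benjamini_hochberg`, `enrich`) and the toy gene and set names are made up for this example.

```python
from math import comb


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n belong to the gene set."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(foreground, background, gene_sets):
    """Over-representation test of each gene set in the foreground.

    Returns (set name, overlap size, p-value, adjusted p-value) tuples,
    sorted by p-value. Genes outside the background are ignored.
    """
    background = set(background)
    foreground = set(foreground) & background
    names, overlaps, pvals = [], [], []
    for name, genes in gene_sets.items():
        members = set(genes) & background
        hits = foreground & members
        names.append(name)
        overlaps.append(len(hits))
        pvals.append(
            hypergeom_tail(len(hits), len(background), len(members), len(foreground))
        )
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(names, overlaps, pvals, qvals), key=lambda row: row[2])


# Toy data: hypothetical gene identifiers, not real annotations.
background = [f"g{i}" for i in range(20)]
foreground = ["g0", "g1", "g2", "g3", "g4"]  # e.g. genes passing your cutoffs
gene_sets = {
    "set_A": ["g0", "g1", "g2", "g3", "g10"],
    "set_B": ["g10", "g11", "g12", "g13", "g14"],
}

for name, overlap, p, q in enrich(foreground, background, gene_sets):
    print(f"{name}\toverlap={overlap}\tp={p:.4g}\tq={q:.4g}")
```

The background matters here: it should be the genes that could have ended up in the foreground (for RNA-seq, typically the genes that were actually tested), not the whole genome.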

written 5.0 years ago by Adrian680
1
5.0 years ago by
Burlington, VT
johnstantongeddes410 wrote:

Maybe contrary to the spirit of your question, but there are many great tools for this available for R from Bioconductor. I can see the advantages of a native Python library, but it might be faster to call R from Python.

written 5.0 years ago by johnstantongeddes410
1
5.0 years ago by
Sudeep1.6k
.
Sudeep1.6k wrote:

The top two results for googling "GSEA python" were these libraries: pygsa and geseabase. As Adrian already said, the analysis is independent of probe IDs. The data formats for GSEA and for the pygsa library are described here: GSEA data format guide
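
For reference, the gene set files distributed for GSEA use the tab-delimited GMT format: one gene set per line, with the set name in the first column, a description in the second, and the member genes in the remaining columns. A minimal reader might look like the sketch below; `read_gmt` is a hypothetical helper name and the example sets are invented, so treat it as an illustration rather than any library's API.

```python
import io


def read_gmt(handle):
    """Parse GMT-formatted lines into a dict mapping set name -> set of genes."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            # Skip blank or malformed lines (name, description, >= 1 gene expected).
            continue
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Invented example content; a real file would be opened with open(path).
example = io.StringIO(
    "SET_ONE\tfirst toy set\tGENE1\tGENE2\tGENE3\n"
    "SET_TWO\tsecond toy set\tGENE3\tGENE4\n"
)
sets = read_gmt(example)
print(sorted(sets["SET_ONE"]))
```

Because the genes in such a file are plain identifiers (usually gene symbols or Entrez IDs), the resulting dict of sets can be intersected directly with a gene list from RNA-seq.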

written 5.0 years ago by Sudeep1.6k
0
3.0 years ago by
Dataman260
Finland
Dataman260 wrote:

What about 'GSEAPY: Gene Set Enrichment Analysis in Python'?

written 3.0 years ago by Dataman260