Gene set enrichment tool for RNA-Seq in Python?
8.5 years ago
user ▴ 900

Is there a Python library for gene set enrichment analysis (GSEA, http://www.broadinstitute.org/gsea/index.jsp)? I'm looking for a library or tool that takes a foreground set of genes, a background set, and a gene set database such as the one available from the Broad, and runs the analysis without relying on microarray-specific items like probe IDs. All the tools I have found are built for microarrays, but I want to do this for RNA-Seq. If there is no Python library, is there a command-line tool that can do this, again without relying on probe IDs?

gsea gene-ontology go RNA-Seq python • 9.0k views
8.5 years ago
Adrian ▴ 700

Nothing about gene set analysis depends on array probe IDs. Once you have gene-level expression measurements (i.e. indexed by gene symbol or Entrez Gene ID), you can use any of the existing tools (e.g. GSEA). Some of them have features to convert probesets to gene symbols, but you don't need to use them.

It's also pretty straightforward to roll your own simple enrichment analysis in Python. A Python library would be nice, though; I have some code for this that I've been meaning to tidy up.


Which existing tools would you use, apart from GSEA?


Just set some cutoffs and run hypergeometric tests: compute the overlaps using Python sets, then calculate significance with scipy.stats.hypergeom and correct for multiple testing with statsmodels.sandbox.stats.multicomp.
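The hypergeometric approach described in this reply can be sketched roughly as follows. This is a minimal illustration, not anyone's production code: the function names `hypergeom_sf` and `enrich` are made up for the example, and to keep the block dependency-free it computes the hypergeometric tail with `math.comb` and applies a hand-written Benjamini-Hochberg correction instead of calling scipy or statsmodels. With scipy installed, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` should give the same tail probability.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n belong to the gene set.

    Equivalent in intent to scipy.stats.hypergeom.sf(k - 1, M, n, N).
    """
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def enrich(foreground, background, gene_sets):
    """Over-representation test of each gene set in the foreground.

    Returns (name, overlap size, p-value, BH-adjusted p-value) tuples,
    sorted by p-value. gene_sets maps a set name to an iterable of gene IDs.
    """
    background = set(background)
    # Genes outside the background cannot be drawn, so drop them.
    foreground = set(foreground) & background
    results = []
    for name, genes in gene_sets.items():
        genes = set(genes) & background
        overlap = len(foreground & genes)
        p = hypergeom_sf(overlap, len(background), len(genes), len(foreground))
        results.append((name, overlap, p))

    # Benjamini-Hochberg: walk from the largest p-value down, keeping a
    # running minimum so adjusted values stay monotone.
    results.sort(key=lambda r: r[2])
    m = len(results)
    adjusted = []
    running_min = 1.0
    for rank in range(m, 0, -1):
        name, overlap, p = results[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted.append((name, overlap, p, running_min))
    return adjusted[::-1]


if __name__ == "__main__":
    # Toy data: 20 background genes, 5 of them in the foreground.
    background = [f"g{i}" for i in range(20)]
    foreground = ["g0", "g1", "g2", "g3", "g4"]
    gene_sets = {
        "SET_A": ["g0", "g1", "g2", "g3", "g10"],
        "SET_B": ["g15", "g16", "g17", "g18", "g19"],
    }
    for row in enrich(foreground, background, gene_sets):
        print(row)
```

The foreground here would be whatever genes pass your cutoffs (for example, differentially expressed genes), and the background should be the genes that could have passed them, such as all genes detected in the RNA-Seq experiment.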

8.5 years ago

This may be contrary to the spirit of your question, but there are many great tools for this available for R from Bioconductor. I can see the advantages of a Python-native library, but it might be faster to call R from Python.

8.5 years ago
Sudeep ★ 1.7k

The top two results when googling "GSEA python" were these libraries: pygsa and geseabase. As Adrian already told you, the analysis is independent of probe IDs. You will find the data format guides for GSEA and for the pygsa library here: GSEA data format guide
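As a small illustration of the GSEA data formats mentioned here: gene set collections are commonly distributed as GMT files, where each line is tab-separated and holds a set name, a description, and then the member genes. A minimal reader might look like the sketch below; `parse_gmt` is a hypothetical helper name, and the example lines are invented for illustration.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {set name: set of gene IDs}.

    Each GMT line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields (e.g. blank lines) are skipped.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


if __name__ == "__main__":
    # Invented example content; a real file would be passed as an open handle.
    example = [
        "SET_A\tsome description\tTP53\tBRCA1\n",
        "SET_B\tna\tMYC\n",
    ]
    print(parse_gmt(example))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, and the resulting dictionary is keyed by gene identifiers rather than probe IDs.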

6.6 years ago
Dataman ▴ 350

What about 'GSEAPY: Gene Set Enrichment Analysis in Python'?
