Question: Software to determine whether genes are related to cancer
1
gravatar for lapis44
5.0 years ago by
lapis4420
United States
lapis4420 wrote:

I have a list of genes that showed up as differentially expressed between patients and controls. In papers, I have seen people run analyses to determine what fraction of the genes in a list are related to cancer, using "Ingenuity Pathway Analysis" and similar tools.

I have been unable to use this software in that manner, though, and wanted to ask whether anyone knows another way to accomplish this.

rna-seq gene • 1.7k views
modified 5.0 years ago by Prakki Rama2.2k • written 5.0 years ago by lapis4420
5.0 years ago by
mikhail.shugay3.3k
Czech Republic, Brno, CEITEC
mikhail.shugay3.3k wrote:

Unfortunately, IPA is not free software, but plenty of great free software is available. You can try functional enrichment analysis with the GSEA (http://www.broadinstitute.org/gsea/index.jsp) and DAVID (http://david.abcc.ncifcrf.gov/) platforms.

PS: I've managed to get GSEA working for RNA-Seq data by supplying an expression table of log2-transformed FPKM values produced by the TopHat-Cuffquant pipeline (http://cufflinks.cbcb.umd.edu/manual.html#cuffquant).
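A minimal sketch of that preparation step, in case it helps. It assumes FPKM values are already loaded per gene, adds a pseudocount of 1 before taking log2 (the post does not say which pseudocount, if any, was used), and writes the tab-delimited GCT layout that GSEA accepts for expression data. The sample names and FPKM values are made up for illustration.

```python
import math

def log2_fpkm_table(fpkm_by_gene, pseudocount=1.0):
    """Return {gene: [log2(fpkm + pseudocount), ...]}, one value per sample."""
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in fpkm_by_gene.items()
    }

def to_gct(samples, table):
    """Serialise an expression table in the tab-delimited GCT layout."""
    # Line 1: format version; line 2: number of genes and number of samples.
    lines = ["#1.2", f"{len(table)}\t{len(samples)}"]
    lines.append("\t".join(["NAME", "Description"] + list(samples)))
    for gene, values in table.items():
        row = [gene, "na"] + [f"{v:.4f}" for v in values]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"

# Hypothetical FPKM values for two controls and two patients.
samples = ["ctrl_1", "ctrl_2", "patient_1", "patient_2"]
fpkm = {
    "TP53": [3.0, 7.0, 15.0, 31.0],
    "MYC": [0.0, 1.0, 63.0, 127.0],
}
gct_text = to_gct(samples, log2_fpkm_table(fpkm))
print(gct_text)
```

GSEA also needs a phenotype labels file (.cls) that assigns each sample column to a class, which is not shown here.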

modified 5.0 years ago • written 5.0 years ago by mikhail.shugay3.3k

Thank you! I have used DAVID before, but could I input a set of human genes and determine the percentage that may be related to cancer in humans? I don't know of a way to do that in DAVID.

written 5.0 years ago by lapis4420

You can play with Functional Annotation there. Add your gene list as OFFICIAL_GENE_SYMBOL, then, for example, go to Pathways -> KEGG_PATHWAY in the "Annotation Summary Results" section and click Chart to see whether any onco-pathways are enriched.

You can also do this in an unsupervised manner: click on "Functional Annotation Clustering" in the "Annotation Summary Results" section and the top-enriched annotation clusters will appear. These are built from similarity across various annotation terms, e.g. the GO category cell-cycle is somewhat associated with KEGG pathways in cancer, etc. So, hopefully, among the top enriched clusters you'll find annotation categories related to oncogenesis.

written 5.0 years ago by mikhail.shugay3.3k

This is pretty incredible. Thank you! I input 263 genes into DAVID and ran Pathways -> KEGG_PATHWAY. I got 15 genes listed: 8 in "pathways in cancer", 4 in "prostate cancer", and 3 in "notch signaling pathway". I am not sure how to tell whether this is significant or just false discoveries, though: couldn't 12 cancer-related genes out of 263 easily be due to chance? Indeed, the Benjamini values listed are large (0.99), but I don't know how they are calculated.
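On how a Benjamini value is obtained: the sketch below shows the standard Benjamini-Hochberg adjustment, which rescales each raw p-value by the number of terms tested divided by its rank. It is a generic illustration with made-up p-values, and DAVID's own implementation may differ in detail (for example, in which terms it counts as tested).

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is
    # capped by the adjusted values of the larger p-values (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four pathway terms.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

An adjusted value close to 1 therefore suggests that the raw p-values were not small relative to the number of terms tested.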

written 5.0 years ago by lapis4420

Yes, this looks more like false positives. You can try exploring other annotations, or make your differential expression criteria more stringent. If the total number of genes decreases and those 12 genes remain, this could indicate that they're true positives :)
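One way to put a number on "could 12 of 263 be chance?" is a one-sided hypergeometric (over-representation) test. The sketch below is a generic version of that test, not DAVID's exact statistic; as far as I know, DAVID reports a related but more conservative score, so values will not match. The background size and the number of annotated background genes are made-up placeholders, and the result depends strongly on them.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# The thread's counts: 12 cancer-pathway genes among 263 submitted genes.
# The two background values are hypothetical placeholders, not DAVID's numbers.
background_size = 20000          # hypothetical number of background genes
annotated_in_background = 600    # hypothetical number annotated to cancer terms
p_value = hypergeom_upper_tail(12, 263, annotated_in_background, background_size)
print(f"P(X >= 12) = {p_value:.3g}")
```

Re-running the same test after tightening the differential expression cutoff, as suggested above, shows whether the cancer-annotated genes make up a growing share of a shrinking list.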

written 5.0 years ago by mikhail.shugay3.3k

Thanks for all your insight!!

written 5.0 years ago by lapis4420

Using the `camera` and `(m)roast` functions in the edgeR and limma packages (for limma, after applying `voom`) is another (principled) way to do GSEA-like analyses on RNA-seq data.
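`camera` and `roast` are R functions, so the Python sketch below is not either of them. It only illustrates the simpler idea behind a competitive gene set test: rank all genes by a per-gene statistic and ask, with a Wilcoxon rank-sum test, whether genes in the set rank higher or lower than the rest. Unlike `camera`, it ignores inter-gene correlation, and the variance used has no tie correction. The gene names and statistics are hypothetical.

```python
import math

def rank_sum_gene_set_test(stat_by_gene, gene_set):
    """Return (z, two-sided p) comparing set genes against all other genes."""
    genes = sorted(stat_by_gene, key=stat_by_gene.get)
    # Assign 1-based ranks, averaging ranks across tied statistics.
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        while j + 1 < len(genes) and stat_by_gene[genes[j + 1]] == stat_by_gene[genes[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for gene in genes[i:j + 1]:
            ranks[gene] = average_rank
        i = j + 1
    in_set = [g for g in genes if g in gene_set]
    n1, n2 = len(in_set), len(genes) - len(in_set)
    if n1 == 0 or n2 == 0:
        raise ValueError("need genes both inside and outside the set")
    # Mann-Whitney U statistic with a normal approximation.
    u = sum(ranks[g] for g in in_set) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical per-gene statistics (e.g. moderated t-statistics).
stats = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0, "g6": 6.0}
z_high, p_high = rank_sum_gene_set_test(stats, {"g5", "g6"})
print(f"z = {z_high:.3f}, p = {p_high:.3f}")
```

A positive z means the set's genes tend to have larger statistics than the other genes; the per-gene statistics themselves still have to come from a fitted model.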

written 5.0 years ago by Steve Lianoglou5.0k

Thanks for the pointer! Could those be used when there are no biological replicates (only conditions), so that a DE gene set can't be computed?

written 5.0 years ago by mikhail.shugay3.3k

You don't need to perform a differential expression analysis before running these GSEA-like analyses. However, you do need to fit a linear model to your data and design, and you will (almost certainly) need some replication somewhere.

written 5.0 years ago by Steve Lianoglou5.0k

I am actually struggling with the input to GSEA for analyzing RNA-seq data. Could you please explain in detail how to use log2-transformed FPKM values (obtained from Cufflinks) as input to GSEA, as mentioned above?

modified 4.5 years ago • written 4.5 years ago by Anushka20
5.0 years ago by
Steve Lianoglou5.0k
US
Steve Lianoglou5.0k wrote:

The Cancer Gene Census will likely be useful here.

Their own description of the project is below:

 

The cancer Gene Census is an ongoing effort to catalogue those genes for which mutations have been causally implicated in cancer. The original census and analysis was published in Nature Reviews Cancer and supplemental analysis information related to the paper is also available.

The census is not static but rather is updated regularly/as needed. In particular we are grateful to Felix Mitelman and his colleagues in providing information on more genes involved in uncommon translocations in leukaemias and lymphomas. Currently, more than 1% of all human genes are implicated via mutation in cancer. Of these, approximately 90% have somatic mutations in cancer, 20% bear germline mutations that predispose to cancer and 10% show both somatic and germline mutations.
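With a census gene list in hand, the fraction asked about in the question is a set intersection. The sketch below assumes the census symbols have already been loaded into a Python set. The three symbols used are only an illustrative subset, and the differential expression list is made up.

```python
def cancer_fraction(gene_list, census_genes):
    """Return (hits, fraction) for the unique genes found in the census set."""
    # Normalise case and whitespace on both sides so symbols compare cleanly.
    genes = {g.strip().upper() for g in gene_list if g.strip()}
    census = {g.strip().upper() for g in census_genes}
    hits = sorted(genes & census)
    fraction = len(hits) / len(genes) if genes else 0.0
    return hits, fraction

# Illustrative subset only; the full census has to be obtained separately.
census = {"TP53", "KRAS", "BRCA1"}
# Hypothetical differentially expressed genes.
de_genes = ["TP53", "GAPDH", "kras", "ACTB"]
hits, fraction = cancer_fraction(de_genes, census)
print(hits, f"{fraction:.0%}")
```

The raw fraction says nothing about significance on its own; it still needs to be compared with the fraction expected from a background gene set.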

written 5.0 years ago by Steve Lianoglou5.0k

Wow, thanks for the information. I will look into it!

written 5.0 years ago by lapis4420
5.0 years ago by
London
andrew.j.skelton735.6k wrote:

Have you considered canSAR (https://cansar.icr.ac.uk/)? It's a great online tool.

written 5.0 years ago by andrew.j.skelton735.6k
5.0 years ago by
Prakki Rama2.2k
Singapore
Prakki Rama2.2k wrote:

Lynx should also be useful. Check the pathway tab after submitting your gene set.

 

modified 5.0 years ago • written 5.0 years ago by Prakki Rama2.2k