Question

Software to determine whether genes are related to cancer

1

10.0 years ago

lapis44 ▴ 20

I have a list of genes that showed up as differentially expressed between patients and controls. I have seen papers in which people analyze what fraction of the genes in a list are related to cancer, using tools such as "Ingenuity Pathway Analysis".

I have been unable to use this software in that manner, though, and wanted to ask whether anyone is familiar with another way to accomplish this.

RNA-Seq gene • 3.7k views

ADD COMMENT • link updated 2.6 years ago by Ram 43k • written 10.0 years ago by lapis44 ▴ 20

1

10.0 years ago

Steve Lianoglou 5.2k

The Cancer Gene Census will likely be useful here.

Their executive description of the project is below:

The Cancer Gene Census is an ongoing effort to catalogue those genes for which mutations have been causally implicated in cancer. The original census and analysis was published in Nature Reviews Cancer and supplemental analysis information related to the paper is also available.

The census is not static but rather is updated regularly/as needed. In particular we are grateful to Felix Mitelman and his colleagues in providing information on more genes involved in uncommon translocations in leukaemias and lymphomas. Currently, more than 1% of all human genes are implicated via mutation in cancer. Of these, approximately 90% have somatic mutations in cancer, 20% bear germline mutations that predispose to cancer and 10% show both somatic and germline mutations.
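To answer the original question with the census, one simple approach is to intersect the gene list with the census gene symbols and report the overlap. Below is a minimal sketch, assuming both lists use official gene symbols. The `toy_census` set and the `de_genes` list are hypothetical stand-ins; in practice the symbols would come from the downloaded census table and from your own differential expression results.

```python
def cancer_gene_fraction(gene_list, cancer_genes):
    """Return (hits, fraction) of gene_list found in cancer_genes."""
    # Normalise case and drop duplicates so each gene is counted once.
    query = {g.strip().upper() for g in gene_list if g.strip()}
    reference = {g.strip().upper() for g in cancer_genes if g.strip()}
    hits = sorted(query & reference)
    fraction = len(hits) / len(query) if query else 0.0
    return hits, fraction


# Hypothetical stand-in for the census: in practice, load the gene symbol
# column from the downloaded Cancer Gene Census table instead.
toy_census = {"TP53", "KRAS", "BRCA1", "NOTCH1"}

# Hypothetical differentially expressed gene list.
de_genes = ["TP53", "GAPDH", "notch1", "ACTB"]

hits, fraction = cancer_gene_fraction(de_genes, toy_census)
print(hits, f"{fraction:.0%}")
```

The raw fraction alone does not say whether the overlap is larger than expected by chance; that needs an enrichment test against a background gene set.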

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Steve Lianoglou 5.2k

0

Wow, thanks for the information. I will look into it!

ADD REPLY • link 10.0 years ago by lapis44 ▴ 20

1

10.0 years ago

andrew.j.skelton73 6.5k

Have you considered canSAR (https://cansar.icr.ac.uk/)? It's a great online tool.

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by andrew.j.skelton73 6.5k

0

10.0 years ago

Prakki Rama ★ 2.7k

Lynx should also be useful. Check the pathway tab after submitting your gene set.

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Prakki Rama ★ 2.7k

Ram · Accepted Answer · 2014-05-08

3

10.0 years ago

mikhail.shugay 3.5k

Well, unfortunately IPA is not free software, but there is a lot of great free software available. You can try functional enrichment analysis with the GSEA (http://www.broadinstitute.org/gsea/index.jsp) and DAVID (http://david.abcc.ncifcrf.gov/) platforms.

PS. I've managed to get GSEA working for RNA-Seq data by using an expression table of log2-transformed FPKM values produced by TopHat-Cuffquant (http://cufflinks.cbcb.umd.edu/manual.html#cuffquant) as input.
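As a rough illustration of the log2-transformed FPKM input, here is a minimal sketch that writes an expression table in the GCT layout GSEA reads (a `#1.2` version line, a dimensions line, then a header and one row per gene). The pseudocount of 1, the sample names, and the FPKM values are all assumptions made for the example, not something taken from the post above. GSEA also needs a separate phenotype (.cls) file, which is not shown here.

```python
import io
import math


def write_gct(fpkm, samples, handle, pseudocount=1.0):
    """Write log2(FPKM + pseudocount) values in GCT layout.

    fpkm maps gene symbol -> list of FPKM values, one per sample.
    """
    handle.write("#1.2\n")
    # Second line: number of genes, then number of samples.
    handle.write(f"{len(fpkm)}\t{len(samples)}\n")
    handle.write("NAME\tDescription\t" + "\t".join(samples) + "\n")
    for gene, values in fpkm.items():
        # The pseudocount avoids log2(0) for unexpressed genes.
        logged = [math.log2(v + pseudocount) for v in values]
        row = [gene, "na"] + [f"{x:.4f}" for x in logged]
        handle.write("\t".join(row) + "\n")


# Hypothetical FPKM values for two genes across four samples.
samples = ["patient_1", "patient_2", "control_1", "control_2"]
fpkm = {
    "TP53": [7.0, 15.0, 3.0, 1.0],
    "GAPDH": [1023.0, 511.0, 1023.0, 255.0],
}

buffer = io.StringIO()
write_gct(fpkm, samples, buffer)
gct_text = buffer.getvalue()
print(gct_text)
```

Writing to a file instead only requires passing an open file handle in place of the `io.StringIO` buffer.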

ADD COMMENT • link 10.0 years ago by mikhail.shugay 3.5k

0

Thank you! I have used DAVID before, but could I input a set of human genes and determine the percentage that may be related to cancer in humans? I don't know of a way to do that in DAVID.

ADD REPLY • link 10.0 years ago by lapis44 ▴ 20

0

You can play with Functional Annotation there. Add your gene list as OFFICIAL_GENE_SYMBOL, then, for example, go to Pathways -> KEGG_PATHWAY in the "Annotation Summary Results" section and click "Chart" to see if any onco-pathways are enriched.

You can also do this in an unsupervised manner: click on "Functional Annotation Clustering" in the "Annotation Summary Results" section and the top-enriched annotation clusters will appear. Those are created using similarity in various annotation terms, e.g. the GO category cell-cycle is somewhat associated with the KEGG "pathways in cancer" term, etc. So, hopefully, among the top enriched clusters you'll get annotation categories related to oncogenesis.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by mikhail.shugay 3.5k

0

This is pretty incredible. Thank you! I input 263 genes into DAVID and did Pathways --> KEGG_PATHWAY. I got 15 genes listed: 8 were in "pathways in cancer", 4 in "prostate cancer", and 3 in "notch signaling pathway". I am not sure how to determine whether this is significant or just false discovery, though. Couldn't 12 cancer-related genes out of 263 easily be due to false positives? Indeed, the Benjamini values listed are large (0.99), but I don't know how they are calculated.

ADD REPLY • link 10.0 years ago by lapis44 ▴ 20

0

Yep, this looks more like false positives. You can try exploring other annotations. You can also try making your differential expression criteria more stringent. If the total number of genes decreases and those 12 genes remain, this could indicate that they're true positives :)
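On the question of how significance and the Benjamini values come about: the sketch below computes a plain hypergeometric enrichment p-value and then applies the Benjamini-Hochberg adjustment across several tested pathways. It is only an illustration of the idea. DAVID itself reports a modified Fisher exact p-value (the EASE score), so its numbers will differ. The background size of 20000 genes and the pathway sizes are hypothetical values chosen for the example; only the 263-gene list and the hit counts 8, 4 and 3 come from the thread.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    upper = min(n, K)
    total = comb(N, n)
    # Sum exact integer counts first, then divide once.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hits per pathway from the thread; pathway sizes and background are hypothetical.
list_size, background = 263, 20000
pathways = {
    "pathways in cancer": (8, 330),
    "prostate cancer": (4, 90),
    "notch signaling pathway": (3, 50),
}

raw = [hypergeom_tail(k, list_size, K, background) for k, K in pathways.values()]
adj = benjamini_hochberg(raw)
for name, p, q in zip(pathways, raw, adj):
    print(f"{name}: p={p:.3g}, BH-adjusted={q:.3g}")
```

In a real analysis the adjustment runs over every pathway tested, not just the three that were reported, which is one reason adjusted values can end up close to 1.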

ADD REPLY • link 10.0 years ago by mikhail.shugay 3.5k

0

Thanks for all your insight!!

ADD REPLY • link 10.0 years ago by lapis44 ▴ 20

0

Using the camera and (m)roast functions from the edgeR and limma packages (the latter after applying voom) is another (principled) way to do GSEA-like analyses on RNA-seq data.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Steve Lianoglou 5.2k

0

Thanks for the pointer! Could those be used when there are no biological replicates (only conditions) and a DE gene set couldn't be computed?

ADD REPLY • link 10.0 years ago by mikhail.shugay 3.5k

0

You don't need to perform a differential expression analysis prior to running these GSEA-like analyses; however, you do need to fit a linear model to your data and design, and you will (almost certainly) need some replication somewhere.

ADD REPLY • link 10.0 years ago by Steve Lianoglou 5.2k

0

I am actually struggling with the input to GSEA for analyzing RNA-seq data. Could you please explain in detail how to use log2-transformed FPKM values (obtained from Cufflinks) as input to GSEA, as mentioned above?

ADD REPLY • link 9.5 years ago by Anushka ▴ 20