Question

Latent Variable Models For Gene Expression: Any Good?

15

15.0 years ago

Mike Dewar ★ 1.6k

I'm performing some meta-analysis on gene-expression data from microarrays, and am looking through some of the techniques used to do this. One thing that crops up often is the use of latent variable models.

They are used either on a per-gene basis to calculate the probability of differential expression, as in Choi et al., or in a dimension-reduction scheme to highlight groups of "signature" genes, as in Martoglio et al.

Both of these latent-variable based approaches are appealing to me, probably because of my Machine Learning background, as the model that the authors are using to define differential expression in both cases makes more sense to me than the more traditional statistical methods (*x*-tests, ranking).

However, I'm trying to embark on a pragmatism-not-idealism approach to work (and actually get something done), and I know that latent variable models can be a lot of effort sometimes. My questions are, therefore:

1. Does anyone have any "good" experiences analysing gene expression using latent-variable modelling approaches for differential analysis in microarrays? For example, a case where a latent-variable model out-performed a more standard approach like SAM, or did better at meta-analysis than RankProd.
2. Does anyone have a feel for how easy latent variable models are to explain to biologist collaborators? Is the richer model worth the effort of explaining it?
3. Is there a 'standard' R package that is used more than others for this kind of analysis? Typically, when meta-analysis comes up, people mention RankProd. Is there an equivalent package that the community recommends for latent-variable based approaches?
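For readers unfamiliar with the rank-based baseline mentioned above, here is a minimal sketch of the rank product idea in plain Python: rank genes by fold change within each study, then take the geometric mean of each gene's ranks across studies. This is not the RankProd package itself (which also estimates significance by permutation); the function name `rank_product`, the gene names, and the fold-change values are all made up for illustration.

```python
import math

def rank_product(studies):
    """Geometric mean of each gene's rank across studies.

    studies: list of dicts mapping gene -> log fold change (one dict per study).
    Rank 1 is the most up-regulated gene within a study.
    Only genes present in every study are scored.
    """
    genes = set(studies[0])
    for study in studies[1:]:
        genes &= set(study)

    log_rank_sum = {g: 0.0 for g in genes}
    for study in studies:
        # Sort by decreasing fold change; break ties by gene name for determinism.
        ordered = sorted(genes, key=lambda g: (-study[g], g))
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)

    k = len(studies)
    return {g: math.exp(log_rank_sum[g] / k) for g in genes}

# Hypothetical toy data: three studies, four genes.
toy_studies = [
    {"geneA": 2.1, "geneB": 0.3, "geneC": -0.5, "geneD": 1.2},
    {"geneA": 1.8, "geneB": -0.1, "geneC": 0.2, "geneD": 1.5},
    {"geneA": 2.5, "geneB": 0.4, "geneC": -1.0, "geneD": 0.9},
]

rp = rank_product(toy_studies)
top = sorted(rp, key=rp.get)  # smallest rank product first
print(top)
```

A small rank product means the gene is consistently near the top of every study's list. Any latent-variable alternative would need to beat this kind of simple, easily explained baseline.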

gene meta • 5.6k views

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 15.0 years ago by Mike Dewar ★ 1.6k

0

Could you maybe ask this on the Bioconductor mailing list? Just in case you don't get an answer here.

ADD REPLY • link 15.0 years ago by Michael 56k

0

I haven't tried any of these approaches, but the next time I have some gene-expression data to play with I will have a go. Personally, I think that a better/richer model is always worth a try. Some biologists I know just care that the results are good and don't care about the model, although most biologists with experience looking at microarrays should know about PCA, and it's not that big a leap to introduce ICA and those kinds of approaches. Sorry I can't help or suggest much else, but I would be really interested in hearing how you get on (if you choose this path).
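To make the PCA point concrete, here is a rough sketch in plain Python of how the leading principal component can be read as a single hidden factor: each sample gets a score and each gene gets a loading. It uses power iteration on the sample covariance matrix rather than a proper linear-algebra library, and the function name `first_principal_component` and the toy matrix are invented for illustration. ICA would replace "uncorrelated components" with "statistically independent components", which this sketch does not attempt.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Leading principal component via power iteration.

    rows: list of samples, each a list of expression values (one per gene).
    Returns (loadings, scores): one loading per gene, one score per sample.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix between genes (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores

# Hypothetical toy data: genes A and B follow one hidden factor z,
# genes C and D only wobble slightly around a constant.
z = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
gene_c = [4.1, 3.9, 4.0, 4.0, 3.9, 4.1]
gene_d = [6.0, 6.1, 5.9, 5.9, 6.1, 6.0]
samples = [[5 + 2 * zi, 3 + 1.5 * zi, c, d]
           for zi, c, d in zip(z, gene_c, gene_d)]

loadings, scores = first_principal_component(samples)
print([round(x, 3) for x in loadings])
print([round(s, 3) for s in scores])
```

In this toy case the loadings concentrate on genes A and B (the "signature" genes) and the scores track the hidden factor, which is the kind of picture that tends to be easy to show to collaborators who already know PCA.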

ADD REPLY • link 15.0 years ago by Nathan Harmston ★ 1.1k

0

It's an interesting question and I'd also like to hear about how things turn out. Not something with which I have experience, but lavaan and OpenMx both look interesting.

ADD REPLY • link updated 6.8 years ago by Ram 45k • written 15.0 years ago by Neilfws 49k

0

Perhaps it's obvious, but remember that no one is going to "pay attention" to any explanation of latent variables if the results are not good. Don't think of computational biology as machine learning. From my point of view, the idea is not to offer complicated methods that might improve performance by 1%, but to solve problems with relatively good results and simple models (in many cases a model originally used for a different task). The simpler your model is (assuming it comes with good results), the better its chances of being accepted by the community.

ADD REPLY • link 14.0 years ago by Alf ▴ 490

score 1 · Answer 1 · 2011-09-06

First question: I once developed a model for expression at the university using latent variables. It outperformed every non-static model. The question, however, is ill-posed from my point of view.

Every good model has its pros and cons. It depends on the data and on the hardware you want to run your analysis on. Genes that are heavily differentially expressed will always be discovered by normal methods. If you have some prior knowledge and are interested in genes that lie on the edge, use latent variables. But there is no correct answer to "which model is better?". I always stayed with KISS.

Second question: I think there is no way of explaining higher statistical models to biologists. Partly because they lack the basics. Partly because they are just not interested.

Third question: I don't know of any. I think it's hard to develop a general framework for dealing with latent variables, so I doubt any such package exists.

Ram · Answer 2 · 2015-03-05

We have a package that can do what you are looking for. It's called iPathwayGuide and it's free to try as much as you want. iPathwayGuide not only does the classical enrichment, but also models gene-expression on pathways and calculates perturbation and identifies putative mechanisms in each pathway.

Depending on the datasets, you can then perform a meta-analysis (built-in) to identify common genes, predicted miRNAs, GO terms, and Pathways across your analyses.

Give it a try. Here's a sample screenshot and a short video.
