Question: Latent Variable Models For Gene Expression: Any Good?
Mike Dewar (1.6k), Columbia University, NYC, USA, wrote 10.6 years ago:

I'm performing some meta-analysis on gene-expression data from microarrays, and am looking through some of the techniques used to do this. One thing that crops up often is the use of latent variable models.

They are used either on a per-gene basis to calculate the probability of differential expression, as in Choi et al., or in a dimension-reduction scheme to highlight groups of "signature" genes, as in Martoglio et al.
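To make the per-gene idea concrete, here is a minimal sketch of a generic two-group mixture fitted by EM. It is not the model from Choi et al. or Martoglio et al.; it only illustrates what "latent variable" means in this setting. The assumptions are mine: each gene has a single z-like score, null genes follow N(0, 1), differentially expressed genes follow a wider zero-mean normal, and the function names and toy scores are hypothetical.

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def posterior_de(scores, prior_de, alt_var):
    """E-step: posterior probability that each gene is differentially expressed."""
    posteriors = []
    for x in scores:
        alt = prior_de * normal_pdf(x, alt_var)        # latent flag = 1
        null = (1.0 - prior_de) * normal_pdf(x, 1.0)   # latent flag = 0
        posteriors.append(alt / (alt + null))
    return posteriors


def fit_two_group_mixture(scores, n_iter=200):
    """Fit the mixture by EM and return (prior_de, alt_var, posteriors).

    The latent variable is each gene's unobserved "differentially
    expressed" flag; the null component is fixed at N(0, 1).
    """
    prior_de, alt_var = 0.1, 4.0  # arbitrary starting values (assumption)
    for _ in range(n_iter):
        posteriors = posterior_de(scores, prior_de, alt_var)
        # M-step: re-estimate the mixing weight and the alternative variance.
        total = sum(posteriors)
        prior_de = total / len(scores)
        alt_var = sum(r * x * x for r, x in zip(posteriors, scores)) / total
        alt_var = max(alt_var, 1.0 + 1e-6)  # keep the alternative wider than the null
    return prior_de, alt_var, posterior_de(scores, prior_de, alt_var)


# Toy per-gene scores (made up for illustration): eight near zero, two extreme.
toy_scores = [0.1, -0.3, 0.5, -0.8, 1.1, -0.2, 0.4, 6.0, -5.5, 0.0]
prior, variance, probs = fit_two_group_mixture(toy_scores)
for score, prob in zip(toy_scores, probs):
    print(f"score={score:5.1f}  P(differentially expressed)={prob:.3f}")
```

The output is a probability per gene rather than a p-value or a rank, which is what I find appealing about these models. A real analysis would need a proper per-gene statistic and a more careful null.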

Both of these latent-variable based approaches are appealing to me, probably because of my Machine Learning background, as the model that the authors are using to define differential expression in both cases makes more sense to me than the more traditional statistical methods (*x*-tests, ranking).

However, I'm trying to embark on a pragmatism-not-idealism approach to work (and actually get something done), and I know that latent variable models can be a lot of effort sometimes. My questions are, therefore:

  1. Does anyone have any "good" experiences analysing gene expression using latent-variable modelling approaches for differential analysis in microarrays? For example, a case where a latent-variable model outperformed a more standard approach like SAM, or did better at meta-analysis than RankProd.
  2. Does anyone have a feel for how easy latent variable models are to explain to biologist collaborators? Is the richer model worth the effort of explaining it?
  3. Is there a 'standard' R package that is used more than others for this kind of analysis? Typically, when meta-analysis comes up, people mention RankProd. Is there an equivalent package that the community recommends for latent-variable based approaches?
meta gene • 3.6k views
modified 2.5 years ago by Ram (32k) • written 10.6 years ago by Mike Dewar (1.6k)

Could you maybe ask this on the bioconductor mailing list? Just in case you don't get an answer here.

written 10.6 years ago by Michael Dondrup (48k)

I haven't tried any of these approaches, but the next time I have some gene-expression data to play with I will have a go. Personally, I think that a better/richer model is always worth a try. Some biologists I know just care that the results are good and don't care about the model, although most biologists with experience looking at microarrays should know about PCA, and it's not that big a leap to introduce ICA and similar approaches. Sorry I can't help or suggest much else, but I would be really interested in hearing how you get on (if you choose this path).

written 10.6 years ago by Nathan Harmston (1.1k)

It's an interesting question and I'd also like to hear about how things turn out. Not something with which I have experience, but lavaan - and OpenMx - both look interesting.

modified 2.5 years ago by Ram (32k) • written 10.6 years ago by Neilfws (49k)

Perhaps it's obvious, but remember that no one is going to "pay attention" to any explanation of latent variables if the results are not good. Don't think of computational biology as machine learning. From my point of view, the idea is not to offer complicated methods that might improve performance by 1%, but to solve problems with relatively good results and simple models (in many cases a model originally used for a different task). The simpler your model is (assuming it comes with good results), the better its chances of being accepted by the community.

written 9.6 years ago by Alf (490)

Fabian Bull (1.3k) wrote 9.5 years ago:

First question: I once developed a model for expression at university using latent variables. It outperformed every non-static model. From my point of view, however, the question is ill-posed.

Every good model has its pros and cons. It depends on the data and on the hardware you want to run your analysis on. Genes that are heavily differentially expressed will always be discovered by normal methods. If you have some prior knowledge and are interested in genes that lie on the edge, use latent variables. But there is no correct answer to "which model is better?". I have always stuck with KISS.

Second question: I think there is no way of explaining advanced statistical models to biologists, partly because they lack the basics and partly because they are just not interested.

Third question: I don't know of one. I think it's hard to develop a general framework for dealing with latent variables, so I doubt any such package exists.

modified 9.5 years ago • written 9.5 years ago by Fabian Bull (1.3k)

andrew (510), United States, wrote 6.0 years ago:

We have a package that can do what you are looking for. It's called iPathwayGuide and it's free to try as much as you want. iPathwayGuide not only does the classical enrichment, but also models gene expression on pathways, calculates perturbation, and identifies putative mechanisms in each pathway.

Depending on the datasets, you can then perform a meta-analysis (built-in) to identify common genes, predicted miRNAs, GO terms, and Pathways across your analyses.

Give it a try. Here's a sample screenshot and a short video.


written 6.0 years ago by andrew (510)