Question: Prior Distribution On Microarray Gene Expression
9.4 years ago by Mike Dewar (Columbia University, NYC, USA):

Is there a commonly accepted prior distribution on gene expression from microarray experiments?

I'm interested in any priors used in microarray analysis that are biologically meaningful. For example, is a Gaussian prior most appropriate for log2 transformed normalised oligo data? If so, is there a good reason for this?

I'm asking as Wang et al seem to generate a prior using data from one ('Lymphochip') microarray and then update this "prior" using data from another (Affy) microarray. I'm not convinced this is particularly "Bayesian", and would be more comfortable given a prior derived from some understanding of how the data should be distributed, which is then updated using both the Affy and Lymphochip data.
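As a point of reference for the kind of update being discussed, here is a minimal sketch of a Gaussian prior on one gene's log2 expression being updated with measurements from two platforms. It assumes a conjugate normal-normal model with known, platform-specific noise variances; the class `Gaussian`, the function `update`, and all numbers are hypothetical illustrations, not the method used by Wang et al.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float
    var: float


def update(prior: Gaussian, observations: list[float], noise_var: float) -> Gaussian:
    """Conjugate normal-normal update, assuming a known observation noise variance."""
    n = len(observations)
    if n == 0:
        return prior
    # Precisions (inverse variances) add; the mean is a precision-weighted average.
    precision = 1.0 / prior.var + n / noise_var
    mean = (prior.mean / prior.var + sum(observations) / noise_var) / precision
    return Gaussian(mean, 1.0 / precision)


# Hypothetical prior on log2 expression (assumption, not derived from data)
prior = Gaussian(mean=8.0, var=4.0)

# Hypothetical log2 measurements; each platform gets its own assumed noise variance
affy = [9.1, 8.7, 9.4]
lymphochip = [8.2, 8.9]

posterior = update(update(prior, affy, noise_var=0.25), lymphochip, noise_var=0.5)
print(posterior)
```

Under these assumptions (independent measurements, known variances) the order in which the two data sets are applied does not change the posterior; what matters is where the initial prior comes from and whether each platform's noise model is reasonable.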

I'd be curious to know how others felt about this approach, too!

microarray • 2.0k views

This is a pretty complex issue. Since the paper seems to be six years old, I would search the citing literature and see what kind of validation or critique the method has received over the years.

written 9.4 years ago by Istvan Albert

Starting from these earlier papers, the models get progressively more complicated as people build hierarchical models to represent gene expression. I was hoping that this paper might serve as an example of a prior distribution, and allow an answer that focuses on priors on gene expression (or fold change or whatever) rather than getting caught up in wider modelling issues. I think you're right, though - I need to follow the literature along and see how people combine multiple data sources...

written 9.4 years ago by Mike Dewar

Are you trying to combine different microarray datasets? What are you trying to achieve by doing this?

written 9.4 years ago by Nathan Harmston

Right now: yes, I'm trying to combine different microarray data sets, though I tried to keep the question pretty general because I'd like to start getting some basic understanding of gene expression from a data-centric point of view. Over the last 6 months I've kind of jumped into microarray analysis head first without really covering the basics.

written 9.4 years ago by Mike Dewar

Have you considered something like RankProd (see here ...)? All you need is lists of differentially expressed genes in order to do this, and you don't combine the underlying expression values.

I don't know if you've seen that before or not. Hope it helps.
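To make the rank-based idea concrete, here is a minimal sketch of the rank product statistic: the geometric mean of a gene's rank across studies. It is an illustration only and is not the RankProd package itself, which also estimates significance (omitted here). The function name `rank_product`, the gene names, and the requirement that every gene appears in every ranked list are assumptions.

```python
import math


def rank_product(ranked_lists: list[list[str]]) -> dict[str, float]:
    """Geometric mean of each gene's position across ranked lists.

    Each list holds the same genes, ordered from most to least
    differentially expressed in one study (rank 1 = first entry).
    """
    n_lists = len(ranked_lists)
    log_rank_sum: dict[str, float] = {}
    for genes in ranked_lists:
        for rank, gene in enumerate(genes, start=1):
            log_rank_sum[gene] = log_rank_sum.get(gene, 0.0) + math.log(rank)
    # exp(mean of log ranks) is the geometric mean of the ranks
    return {g: math.exp(s / n_lists) for g, s in log_rank_sum.items()}


# Hypothetical ranked gene lists from two studies on different platforms
study_a = ["g1", "g2", "g3", "g4"]
study_b = ["g1", "g3", "g2", "g4"]

rp = rank_product([study_a, study_b])
for gene in sorted(rp, key=rp.get):
    print(gene, round(rp[gene], 3))
```

Because only ranks are used, the expression values from the two platforms never need to be on a common scale.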

modified 7 weeks ago by RamRS • written 9.4 years ago by Nathan Harmston

Thanks for the RankProd pointer. One of the reasons I was starting to look at more complex models of expression was to assess the potential of combining RT-PCR data with array data. I'm pretty sure the numbers emerging from these analyses will be in completely different spaces, and hence a model of expression would become pretty important. And coming from a discipline that suggests "model, don't normalize", one of the first questions to think about is my prior distribution. I'm starting to think, though, that this is not a common approach...

written 9.4 years ago by Mike Dewar
9.4 years ago by Nathan Harmston (London):

To me, the idea of generating a prior using one platform and using it as a prior for another is extremely bad. I would assume that the value you obtain from your probe is a combination of the underlying real gene expression value and some error model ([additive|multiplicative] [Poisson|Gaussian|log-normal] noise). The error model for each platform will be different (ignoring experimental noise).
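A minimal simulation sketch of that point, assuming one particular choice from the options above: multiplicative log-normal noise plus additive Gaussian noise. The function `measure`, the true abundance, and both platforms' noise settings are hypothetical and not estimated from any real array data.

```python
import math
import random
import statistics


def measure(true_expr: float, mult_sd: float, add_sd: float, rng: random.Random) -> float:
    """One probe reading: true value with multiplicative log-normal and additive Gaussian noise."""
    return true_expr * math.exp(rng.gauss(0.0, mult_sd)) + rng.gauss(0.0, add_sd)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_expr = 500.0       # hypothetical true abundance, arbitrary units

# Hypothetical platforms: A has larger additive noise, B has larger multiplicative noise
platform_a = [measure(true_expr, mult_sd=0.1, add_sd=20.0, rng=rng) for _ in range(2000)]
platform_b = [measure(true_expr, mult_sd=0.3, add_sd=5.0, rng=rng) for _ in range(2000)]

sd_a = statistics.stdev(platform_a)
sd_b = statistics.stdev(platform_b)
print(f"platform A: mean={statistics.fmean(platform_a):.1f} sd={sd_a:.1f}")
print(f"platform B: mean={statistics.fmean(platform_b):.1f} sd={sd_b:.1f}")
```

Even with the same underlying expression value, the two simulated platforms produce readings with different spread, which is why a distribution fitted on one platform need not transfer to the other.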

Whilst I don't like the "Gaussian because it's easy" argument, sometimes it does work well and it's pretty OK. That said, a lot of people seem to suggest that gene expression data is Poisson or even fat-tailed and scale-free (Lévy distribution), and the classical Central Limit Theorem does not hold for such fat-tailed distributions (as their variance is not finite).

HTH


Thanks for this - it's nice to have my worries confirmed! I guess I need to jump into the literature a bit more. I came across something that used a mixture model for expression - a uniform distribution over differentially expressed genes and a normal for those genes not differentially expressed. I guess this is a nice way of handling fat-tailed distributions when we're specifically looking at differential expression. Will chase this up...
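A minimal sketch of how such a two-component mixture could be used, assuming the quantity being modelled is a log2 fold change: a zero-centred normal for genes that are not differentially expressed and a uniform over a fixed range for those that are. The function names and the parameters `p_de`, `null_sd`, and `half_width` are hypothetical; the source that used this model is not identified above, so this is not its implementation.

```python
import math


def normal_pdf(x: float, sd: float) -> float:
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def uniform_pdf(x: float, half_width: float) -> float:
    """Density of a uniform distribution on [-half_width, half_width]."""
    return 1.0 / (2.0 * half_width) if abs(x) <= half_width else 0.0


def prob_de(log_fc: float, p_de: float = 0.1, null_sd: float = 0.5,
            half_width: float = 6.0) -> float:
    """Posterior probability that a gene is differentially expressed (Bayes' rule)."""
    de = p_de * uniform_pdf(log_fc, half_width)
    null = (1.0 - p_de) * normal_pdf(log_fc, null_sd)
    total = de + null
    # Both components can be zero far outside the uniform's support
    return de / total if total > 0.0 else float("nan")


for log_fc in (0.0, 0.5, 1.5, 3.0):
    print(log_fc, round(prob_de(log_fc), 4))
```

The flat component gives large fold changes non-negligible probability without needing a heavy-tailed single distribution.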

written 9.4 years ago by Mike Dewar
9.4 years ago by Istvan Albert (University Park, USA):

"For example, is a Gaussian prior most appropriate for log2 transformed normalised oligo data? If so, is there a good reason for this?"

I think the Gaussian prior may be as good as it gets without getting into complex modeling. The sole (yet pretty compelling) reason to go with the Gaussian distribution is the Central Limit Theorem, although some of its conditions may not be satisfied.
