Question

Prior Distribution On Microarray Gene Expression

4

13.8 years ago

Mike Dewar ★ 1.6k

Is there a commonly accepted prior distribution on gene expression from microarray experiments?

I'm interested in any priors used in microarray analysis that are biologically meaningful. For example, is a Gaussian prior most appropriate for log2 transformed normalised oligo data? If so, is there a good reason for this?

I'm asking because Wang et al. seem to generate a prior using data from one ('Lymphochip') microarray and then update this "prior" using data from another (Affy) microarray. I'm not convinced this is particularly "Bayesian", and would be more comfortable with a prior derived from some understanding of how the data should be distributed, which is then updated using both the Affy and Lymphochip data.

I'd be curious to know how others felt about this approach, too!
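To make the two update orders concrete, here is a minimal sketch. It assumes a Gaussian prior on one gene's log2 expression and Gaussian measurement noise with a known, platform-specific variance, and it assumes both platforms measure the same quantity on the same scale, which is a strong assumption. The function name, the prior parameters and all the numbers are made up for illustration; this is not the method of Wang et al.

```python
def update_normal(prior_mean, prior_var, observations, noise_var):
    """Posterior (mean, variance) of a Normal mean, given Normal data
    with known noise variance (standard conjugate update)."""
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical log2 intensities for one gene on two platforms (made-up numbers).
lymphochip = [7.9, 8.4, 8.1]
affy = [8.6, 8.8]

# Assumed noise variances; in practice these would come from each platform's error model.
LYMPHO_NOISE_VAR = 0.30
AFFY_NOISE_VAR = 0.15

# Assumed weakly informative prior on log2 expression.
PRIOR_MEAN, PRIOR_VAR = 8.0, 4.0

# Order 1: Lymphochip first, then Affy.
m1, v1 = update_normal(PRIOR_MEAN, PRIOR_VAR, lymphochip, LYMPHO_NOISE_VAR)
seq_mean, seq_var = update_normal(m1, v1, affy, AFFY_NOISE_VAR)

# Order 2: Affy first, then Lymphochip.
m2, v2 = update_normal(PRIOR_MEAN, PRIOR_VAR, affy, AFFY_NOISE_VAR)
rev_mean, rev_var = update_normal(m2, v2, lymphochip, LYMPHO_NOISE_VAR)

print(f"Lymphochip then Affy: mean={seq_mean:.4f}, var={seq_var:.4f}")
print(f"Affy then Lymphochip: mean={rev_mean:.4f}, var={rev_var:.4f}")
```

Under these assumptions the order of updating does not change the posterior. What matters is whether the starting prior and the per-platform noise models are sensible.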

microarray • 3.0k views

ADD COMMENT • link updated 13.8 years ago by Nathan Harmston ★ 1.1k • written 13.8 years ago by Mike Dewar ★ 1.6k

1

This is a pretty complex issue. Since the paper seems to be six years old, I would search the citing literature and see what kind of validation or critique the method has gained over the years.

ADD REPLY • link 13.8 years ago by Istvan Albert 100k

1

Starting from these earlier papers, the models get progressively more complicated as people build hierarchical models to represent gene expression. I was hoping that this paper might serve as an example of a prior distribution, and allow an answer that focuses on priors on gene expression (or fold change or whatever) rather than getting caught up in wider modelling issues. I think you're right, though - I need to follow the literature along and see how people combine multiple data sources...

ADD REPLY • link 13.8 years ago by Mike Dewar ★ 1.6k

0

Are you trying to combine different microarray datasets? What are you trying to achieve by doing this?

ADD REPLY • link 13.8 years ago by Nathan Harmston ★ 1.1k

0

Right now, yes, I'm trying to combine different microarray data sets, though I tried to keep the question pretty general because I'd like to start getting some basic understanding of gene expression from a data-centric point of view. Over the last 6 months I've kind of jumped into microarray analysis head first without really covering the basics.

ADD REPLY • link 13.8 years ago by Mike Dewar ★ 1.6k

0

Have you considered something like RankProd (see here ...)? All you need is lists of differentially expressed genes in order to do this, and you don't combine the underlying expression values.

I don't know if you've seen that before or not. Hope it helps.
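The rank-based idea can be sketched as follows. This is only an illustration of the rank product statistic (the geometric mean of a gene's rank across studies), not the API of the RankProd package. The function names and fold-change values are hypothetical, ties are broken by input order, and the permutation-based significance step is left out.

```python
import math


def rank_descending(values):
    """Rank items so that the largest value gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_product(fold_changes_by_study):
    """Geometric mean of each gene's rank across studies.

    Each inner list holds one study's log fold changes, in the same gene order.
    Small values flag genes that are consistently near the top of every list.
    """
    k = len(fold_changes_by_study)
    n_genes = len(fold_changes_by_study[0])
    per_study_ranks = [rank_descending(study) for study in fold_changes_by_study]
    return [
        math.exp(sum(math.log(ranks[g]) for ranks in per_study_ranks) / k)
        for g in range(n_genes)
    ]


# Made-up log fold changes for four genes in two studies on different platforms.
genes = ["geneA", "geneB", "geneC", "geneD"]
study1 = [2.1, 0.3, 1.2, -0.5]
study2 = [1.8, 0.1, 1.5, -0.2]

rp = rank_product([study1, study2])
for gene, value in sorted(zip(genes, rp), key=lambda pair: pair[1]):
    print(f"{gene}: rank product = {value:.3f}")
```

Only the within-study ordering is used, so the two studies never need to share an intensity scale.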

ADD REPLY • link updated 4.6 years ago by Ram 43k • written 13.8 years ago by Nathan Harmston ★ 1.1k

0

Thanks for the RankProd pointer. One of the reasons I was starting to look at more complex models of expression was to assess the potential of combining RT-PCR data with array data. I'm pretty sure the numbers emerging from these analyses will be in completely different spaces, and hence a model of expression would become pretty important. And coming from a discipline that suggests "model, don't normalize", one of the first questions to think about is my prior distribution. I'm starting to think, though, that this is not a common approach...

ADD REPLY • link 13.8 years ago by Mike Dewar ★ 1.6k

score 3 · Answer 1 · 2010-07-02

3

13.8 years ago

Nathan Harmston ★ 1.1k

To me, the idea of generating a prior using one platform and then using it as a prior for another is extremely bad. I would assume that the value you obtain from your probe is a combination of the underlying real gene expression value and some error model ([additive|multiplicative] [Poisson|Gaussian|log-normal] noise). The error model will be different for each platform (ignoring experimental noise).
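As a rough illustration of why the choice of error model matters, the sketch below simulates one probe under two of the options listed: additive Gaussian noise and multiplicative log-normal noise. The noise levels and true signal values are invented, and neither model is claimed to describe any particular platform.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def observe_additive(true_signal, sd):
    """Observed intensity = true signal + Gaussian noise."""
    return true_signal + random.gauss(0.0, sd)


def observe_multiplicative(true_signal, log_sd):
    """Observed intensity = true signal * log-normal noise factor."""
    return true_signal * math.exp(random.gauss(0.0, log_sd))


def spread(true_signal, n=5000):
    """Standard deviations under each model, plus the log2-scale spread
    of the multiplicative model."""
    additive = [observe_additive(true_signal, 20.0) for _ in range(n)]
    multiplicative = [observe_multiplicative(true_signal, 0.2) for _ in range(n)]
    log2_multiplicative = [math.log2(x) for x in multiplicative]
    return (
        statistics.stdev(additive),
        statistics.stdev(multiplicative),
        statistics.stdev(log2_multiplicative),
    )


low = spread(200.0)    # weakly expressed gene (assumed true signal)
high = spread(5000.0)  # strongly expressed gene (assumed true signal)

print("additive sd:             low=%.1f  high=%.1f" % (low[0], high[0]))
print("multiplicative sd:       low=%.1f  high=%.1f" % (low[1], high[1]))
print("multiplicative sd, log2: low=%.3f  high=%.3f" % (low[2], high[2]))
```

Under additive noise the spread is the same at both expression levels. Under multiplicative noise it scales with the signal on the raw scale but is roughly constant after the log2 transform, so two platforms with different error models will not yield interchangeable priors.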

Whilst I don't like the "use a Gaussian because it's easy" argument, sometimes it does work well and it's pretty OK. That said, a lot of people seem to suggest that gene expression data is Poisson, or even fat-tailed and scale-free (Lévy distribution), and the classical Central Limit Theorem does not hold in the fat-tailed case (as the variance is not finite).
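The fat-tailed point can be seen in a small simulation. The sketch below compares sample means of Gaussian draws with sample means of Lévy draws, using the fact that 1/Z² follows a standard Lévy distribution when Z is standard normal. The sample sizes, the seed and the Gaussian parameters are arbitrary choices for illustration, and nothing here is fitted to real expression data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def gaussian_draw():
    """Well-behaved reference: Normal with mean 8 and sd 1 (arbitrary values)."""
    return random.gauss(8.0, 1.0)


def levy_draw():
    """Standard Levy variate, generated as 1 / Z^2 with Z standard normal."""
    z = random.gauss(0.0, 1.0)
    while z == 0.0:  # guard against an (extremely unlikely) division by zero
        z = random.gauss(0.0, 1.0)
    return 1.0 / (z * z)


def median_of_sample_means(draw, n, repeats=200):
    """Median, over many repeats, of the mean of n independent draws."""
    means = [sum(draw() for _ in range(n)) / n for _ in range(repeats)]
    return statistics.median(means)


gauss_small = median_of_sample_means(gaussian_draw, 10)
gauss_large = median_of_sample_means(gaussian_draw, 1000)
levy_small = median_of_sample_means(levy_draw, 10)
levy_large = median_of_sample_means(levy_draw, 1000)

print(f"Gaussian: n=10 -> {gauss_small:.2f}, n=1000 -> {gauss_large:.2f}")
print(f"Levy:     n=10 -> {levy_small:.1f}, n=1000 -> {levy_large:.1f}")
```

The Gaussian sample means settle around the true mean as n grows, while the Lévy sample means keep growing with n, so averaging does not produce the usual Gaussian behaviour.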

HTH

ADD COMMENT • link 13.8 years ago by Nathan Harmston ★ 1.1k

0

Thanks for this - it's nice to have my worries confirmed! I guess I need to jump into the literature a bit more. I came across something that used a mixture model for expression - a uniform distribution over differentially expressed genes and a normal for those genes not differentially expressed. I guess this is a nice way of handling fat-tailed distributions when we're specifically looking at differential expression. Will chase this up...
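A minimal sketch of such a two-component mixture, assuming a Normal centred at zero for genes that are not differentially expressed and a Uniform over a bounded range of log fold changes for those that are. The mixing proportion, the null standard deviation and the range are hypothetical values chosen for illustration, not taken from any paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, lo, hi):
    """Density of a Uniform(lo, hi) distribution at x."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def prob_differential(log_fc, p_de=0.05, null_sd=0.3, fc_range=6.0):
    """Posterior probability that a gene is differentially expressed.

    Assumed model: log_fc ~ Uniform(-fc_range, fc_range) with prior weight p_de,
    and log_fc ~ Normal(0, null_sd^2) with prior weight 1 - p_de.
    """
    if abs(log_fc) > fc_range:
        raise ValueError("log_fc lies outside the assumed uniform range")
    de = p_de * uniform_pdf(log_fc, -fc_range, fc_range)
    null = (1.0 - p_de) * normal_pdf(log_fc, 0.0, null_sd)
    return de / (de + null)


for log_fc in (0.0, 0.5, 1.0, 2.0):
    print(f"log2 fold change {log_fc:+.1f}: P(DE | data) = {prob_differential(log_fc):.3f}")
```

The flat component gives large fold changes far more weight than the Normal tails would, which is how this kind of mixture accommodates fat tails.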

ADD REPLY • link 13.8 years ago by Mike Dewar ★ 1.6k

Ram · Answer 2 · 2010-07-01

For example, is a Gaussian prior most appropriate for log2 transformed normalised oligo data? If so, is there a good reason for this?

I think the Gaussian prior may be as good as it gets without getting into complex modeling. The sole (yet pretty compelling) reason to go with the Gaussian distribution is the Central Limit Theorem, although some of its conditions may not be satisfied.