Question: How To Determine If A Gene Is Active From Expression Data
2
4.1 years ago by
Allpowerde1.1k
Allpowerde1.1k wrote:

I have RMA (Robust Multi-array Average) scores for the different genes (and their isoforms) on the Affymetrix chip. I want to know which of these genes are "active" (in other words, likely to produce enough protein product to have an effect). I'm not interested in whether they are differentially expressed or X-fold over- or under-expressed. All I want is to classify each of them as likely "on" or "off".

So far I have log-transformed (base 10) the RMA scores and centered them (subtracted the median). I called all genes with a transformed score <0 inactive and those with a score >0 active.
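For reference, the median-centering rule described above can be sketched as follows. This is only an illustration of the rule as stated, assuming the scores are on a linear scale; the function name `classify_by_median` and the example values are hypothetical.

```python
import math
import statistics

def classify_by_median(scores):
    """Label a gene 'on' if its log10 score lies above the median, else 'off'."""
    logged = {gene: math.log10(value) for gene, value in scores.items()}
    centre = statistics.median(logged.values())
    # Centering = subtracting the median; the sign of the result is the call.
    return {gene: ("on" if value - centre > 0 else "off")
            for gene, value in logged.items()}

# Hypothetical scores, for illustration only
example_scores = {"geneA": 10.0, "geneB": 100.0, "geneC": 1000.0, "geneD": 10000.0}
median_calls = classify_by_median(example_scores)
```

By construction this rule always labels about half of the genes "off", whatever the data look like.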

Does anyone have a better methodology?

modified 3.1 years ago by Will4.0k • written 4.1 years ago by Allpowerde1.1k

I think it would help to elaborate on what "produce enough protein to have an effect" means.

written 4.1 years ago by Istvan Albert ♦♦ 39k

Sorry, I was too vague here. I am looking at the effects of a certain set of transcription factors in a certain tissue. There seem to be some interesting patterns of co-operation between them. Whether these TFs are able to interact in the first place depends on whether all of them are actually expressed in this tissue. That's what I want to find out with this exercise. -- Thanks for your help!

written 4.1 years ago by Allpowerde1.1k
7
4.1 years ago by
Nicojo1.0k
Kyoto, Japan
Nicojo1.0k wrote:

I would suggest the following question instead of the one you're asking:

Can you actually determine if a gene is "active" (i.e. translated into protein) from [gene] expression data?

And I'll point you towards people who have published papers about it:

These are just a few papers that seem critical of such a correlation. That is not to say that there is no good correlation for any gene. But I would be very surprised if you could make a general rule about it without checking every cell type, every tissue type and every gene to see whether such a correlation is acceptable.

Now, if you do a PubMed search for the terms "correlation mRNA protein", you will find many papers that check for such correlations, but mostly for specific genes in specific tissues (often for cancer diagnostics purposes).

If you do find a paper that states such correlations genome-wide using microarray data, I'd be highly suspicious of it.

So, obviously, you cannot set "a" cut-off for determining this. My personal experience tells me that you can have gene transcription with no protein expression following it... Unfortunately, I have not published it yet :(

written 4.1 years ago by Nicojo1.0k

Thanks for this detailed reply! Those are really great references you pointed me to!

written 4.1 years ago by Allpowerde1.1k

Thanks for this detailed reply! Those are really great references you pointed me to! However, determining how much protein is actually produced from the transcribed mRNA is going into too much detail for this project.

written 4.1 years ago by Allpowerde1.1k

The point is that no amount of mRNA will tell you whether the protein is present and in what amount... And even less whether the proteins produced have a biological impact.

written 4.1 years ago by Nicojo1.0k

I agree, this statement in my question was quite confusing. What I'm after is just a rough classification of the proteins into "probably there" or not. (I'm looking forward to reading the publication you hinted at)

written 4.1 years ago by Allpowerde1.1k

Ahh, I'm struggling with quite a few more urgent things. I'm afraid it might stay on the shelf for a while (I hope not forever though)... Wet lab can be EXTREMELY frustrating :(

written 4.1 years ago by Nicojo1.0k
4
4.1 years ago by
Chris Miller12k
St. Louis, MO
Chris Miller12k wrote:

You're right in thinking that your methodology isn't a very good representation of the system. mRNAs (and their protein products) have a huge dynamic range. Some are going to be expressed constantly at extremely low levels, and at the other extreme, you'll have genes that are highly expressed, but only for a short period of time. Taking the median level as the dividing line between on and off is going to give you huge numbers of false negatives (genes that are actually being transcribed and translated, but that you'll classify as "off").

I'd look at what the background noise level is, then run some stats to determine which probes give you signal significantly above that level. Any gene meeting that criterion should probably be considered "on". I suspect that may not divide the set as nicely as you'd hope, though.
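One way to make this concrete is sketched below. It assumes you have intensities (on the same scale as your scores) for a set of probes that should carry no real signal, such as negative-control probes, and it models the noise as a normal distribution; both are assumptions for illustration. The function name `call_expressed`, the `alpha` cut-off and the example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def call_expressed(signal, background, alpha=0.01):
    """Call a probe 'on' (True) if its signal is significantly above background.

    signal:     dict of probe id -> intensity
    background: list of intensities from probes assumed to measure only noise
    alpha:      one-sided significance cut-off (hypothetical default)
    """
    # Model the noise with the mean and standard deviation of the background probes.
    noise = NormalDist(mean(background), stdev(background))
    calls = {}
    for probe, value in signal.items():
        # One-sided p-value: chance that pure noise is at least this bright.
        p_value = 1.0 - noise.cdf(value)
        calls[probe] = p_value < alpha
    return calls

# Hypothetical intensities, for illustration only
background_probes = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
probe_signal = {"probe_1": 5.1, "probe_2": 7.5, "probe_3": 5.3}
expressed = call_expressed(probe_signal, background_probes)
```

With thousands of probes on a chip, the p-values would normally need a multiple-testing correction before they are thresholded.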

Maybe if you tell us more about what exactly you're trying to do, we can offer more constructive advice.

written 4.1 years ago by Chris Miller12k

I agree: treating the background level as noise, and anything at that level as not expressed at all, sounds like a good approach.

written 4.1 years ago by Istvan Albert ♦♦ 39k

That sounds like the approach I'm after. How would I determine the noise level though?

written 4.1 years ago by Allpowerde1.1k
0
4.1 years ago by
Will4.0k
Will4.0k wrote:

Sounds like you're trying to find genes which actually switch from on to off (or vice versa) based on cell type, condition, etc. Not all genes have this type of behavior ... some are graded (like a dimmer switch). There are numerous papers that discuss techniques for finding genes which have "bi-modal" expression patterns. Since these are a mixture of two expression patterns, it is likely that they have an "on" and an "off" pattern.

This article explains the technique and includes Matlab code that should do the whole thing for you.

Human and mouse switch-like genes share common transcriptional regulatory mechanisms for bimodality.
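To give a flavour of what a bimodality check can look like, here is a rough sketch that fits a two-component Gaussian mixture to one gene's expression values across samples using a few EM iterations. It is not the Matlab code from the article above, and the function names, the starting values and the example data are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(values, iterations=100, min_sigma=1e-3):
    """Fit a low ('off') and a high ('on') Gaussian to one gene's values with EM.

    Needs at least two values. Returns (weights, means, sigmas), low component first.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Start from the lower and upper halves of the sorted data.
    mu = [mean(ordered[:half]), mean(ordered[half:])]
    sigma = [max(pstdev(ordered[:half]), min_sigma),
             max(pstdev(ordered[half:]), min_sigma)]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; keep its old parameters
            weight[k] = nk / len(values)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)
    return weight, mu, sigma

def high_state_probability(x, fit):
    """Posterior probability that value x belongs to the high ('on') component."""
    weight, mu, sigma = fit
    dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
    total = sum(dens)
    return dens[1] / total if total > 0 else 0.5

# Hypothetical log-scale values for one gene across ten samples
gene_values = [2.0, 2.2, 1.9, 2.1, 2.0, 8.0, 8.3, 7.9, 8.1, 8.2]
mixture = fit_two_gaussians(gene_values)
```

A gene whose two fitted means sit far apart relative to the fitted spreads is a candidate "switch-like" gene, and `high_state_probability` then gives a per-sample "on" call for it. A gene with graded expression will not separate cleanly this way.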

written 4.1 years ago by Will4.0k