We usually use the following procedure to select genes that are expressed highly enough to analyse. It is arbitrary in the sense that WE choose the method and the threshold (not in the sense that the method itself is random), but it is easy to explain and logical, so it has never caused us problems during review. It requires blank spots or negative controls on your array.
- We calculate the mean and standard deviation of the expression values of all the blanks and negative controls on one array (for 2-colour arrays we do this separately for each colour).
- We define a threshold for each array (or colour) equal to the mean expression of the blanks plus two standard deviations (threshold = mean + 2*stdev). Assuming the blank values are roughly normally distributed, mean ± 2*stdev covers ~95% of them, so a spot above the threshold is brighter than roughly 97.7% of the blanks.
- We keep a gene if it is above the threshold in at least 80% of the samples of at least one of the biological groups we are testing.
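The per-array threshold from the first two steps can be sketched with NumPy as below. This is not our actual code: it assumes the intensities are in a spots × arrays matrix (each colour of a 2-colour array as its own column) and that a boolean mask marks the blank/negative-control spots. The function name `above_blank_threshold` and the parameter `k` are hypothetical, and using the sample standard deviation (`ddof=1`) is a choice, not something the procedure prescribes.

```python
import numpy as np


def above_blank_threshold(intensities, is_control, k=2.0):
    """Hypothetical helper: per-array threshold = mean(blanks) + k * sd(blanks).

    intensities: 2-D array, rows = spots, columns = arrays (or single colours).
    is_control:  1-D boolean mask, True for blank / negative-control spots.
    Returns (thresholds, above): one threshold per column, and a boolean
    spots x arrays matrix that is True where a spot exceeds its array's threshold.
    """
    intensities = np.asarray(intensities, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)

    # Only the control spots are used to estimate the background distribution
    controls = intensities[is_control, :]

    # Column-wise (per array) mean and sample standard deviation of the controls
    thresholds = controls.mean(axis=0) + k * controls.std(axis=0, ddof=1)

    # Broadcasting compares every spot with its own array's threshold
    above = intensities > thresholds
    return thresholds, above


# Two arrays; the first two spots are blanks
thr, above = above_blank_threshold(
    [[1, 10], [3, 10], [10, 20], [4, 5]],
    [True, True, False, False],
)
print(thr)    # first array: 2 + 2*sqrt(2); second array: 10.0
print(above)
```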
So, say you have 4 groups: if, in at least one group, a spot is above mean + 2*stdev in 80% or more of the samples, we retain it in the analysis.
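The group-wise retention rule can be sketched as follows. This assumes a boolean spots × samples matrix saying whether each spot passed its array's threshold, plus one group label per sample. The function name `keep_expressed` and the parameter `min_fraction` are hypothetical, used only for illustration.

```python
import numpy as np


def keep_expressed(above, groups, min_fraction=0.8):
    """Hypothetical helper: keep a spot if, in at least one group, it is above
    threshold in at least `min_fraction` of that group's samples.

    above:  boolean spots x samples matrix (True = above the blank threshold).
    groups: one group label per sample (column of `above`).
    """
    above = np.asarray(above, dtype=bool)
    groups = np.asarray(groups)
    keep = np.zeros(above.shape[0], dtype=bool)
    for g in np.unique(groups):
        # Fraction of this group's samples in which each spot passes
        frac = above[:, groups == g].mean(axis=1)
        # A single qualifying group is enough to retain the spot
        keep |= frac >= min_fraction
    return keep


groups = ["A"] * 5 + ["B"] * 5
above = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 4/5 in A -> kept
    [1, 1, 1, 0, 0, 1, 1, 1, 0, 0],  # 3/5 in both groups -> dropped
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],  # absent in A, present in all of B -> kept
]
print(keep_expressed(above, groups))
```

The third spot shows why the rule is applied per group rather than across all samples: it is above threshold in only half of the samples overall, yet it is retained because one group expresses it consistently.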
Well, maybe not THAT simple to explain, but easy to understand and fairly logical. Plus, it retains spots that may not be expressed in one group but are expressed in another. These genes are especially interesting because they could well be differentially expressed, and other filtering methods may miss them.