Question: How to identify non-expressed genes in microarray?
3.6 years ago by
bounlu170 wrote:

In the output of a microarray experiment, you get a certain level of signal for each probe, even after background correction and processing, so every gene is interpreted as expressed as long as it has a value.

I know that, ideally, you would perform RNA-Seq to assess non-expressed genes, and that microarrays are not designed for this, but is it possible anyway? Has anybody encountered this challenge? Which thresholds are used to distinguish non-expressed genes from lowly expressed genes?

microarray gene expression • 2.0k views
3.6 years ago by
svlachavas560 wrote:

Dear Bounlu,

The question you raise above is crucial but also very general. In the microarray field there are numerous kinds of filtering, such as initial non-specific filtering on intensity, on variance, or on a detection p-value threshold, which some platforms (e.g. Illumina) provide directly. Generally, the basic idea is to filter out probesets/genes that are "characterized" as absent or not expressed, based on the metric you chose, in most of your samples or conditions (assuming "naively" that in most cases the majority of genes are not expressed in the analyzed tissue). The choice also depends heavily on the specific platform/technology used (Affymetrix, Illumina, ...).
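To make the intensity-based idea concrete, here is a minimal sketch in Python with NumPy. The function name `filter_expressed`, the threshold of 5.0 and the toy numbers are my own assumptions for illustration, not values recommended by any platform or package; a sensible cutoff has to come from your own data.

```python
import numpy as np

def filter_expressed(expr, threshold, min_samples):
    """Return a boolean mask of probes called "expressed".

    expr: 2-D array, rows = probes, columns = samples (log2 intensities).
    A probe is kept if its intensity exceeds `threshold`
    in at least `min_samples` samples.
    """
    expr = np.asarray(expr, dtype=float)
    above = expr > threshold              # per-probe, per-sample call
    return above.sum(axis=1) >= min_samples

# Toy log2 intensities: 4 probes x 3 samples (made-up numbers)
expr = np.array([
    [3.1, 2.9, 3.3],    # low in every sample
    [9.5, 10.1, 9.8],   # high in every sample
    [3.0, 8.7, 9.0],    # high in 2 of 3 samples
    [4.9, 3.2, 3.1],    # low in every sample
])

# Hypothetical cutoff: log2 intensity > 5 in at least 2 samples
keep = filter_expressed(expr, threshold=5.0, min_samples=2)
filtered = expr[keep]
```

With detection p-values the logic is the same, except that the per-sample call becomes "p-value below a chosen cutoff" instead of "intensity above a threshold".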

RNA-seq, on the other hand, is a whole different field, with its own experimental design and theory, although it shares some general methodology with microarrays. As a very simple example, you can see in the limma User's Guide (page 119) that a kind of filtering can be performed on the number of total counts.
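A filter on total counts could look roughly like the sketch below. This is plain NumPy rather than the limma code itself, and the helper name `filter_by_total_counts` and the cutoff of 10 are assumptions chosen only for illustration.

```python
import numpy as np

def filter_by_total_counts(counts, min_total):
    """Return a boolean mask of genes whose summed read count
    across all samples is at least `min_total`.

    counts: 2-D array, rows = genes, columns = samples (raw counts).
    """
    counts = np.asarray(counts)
    return counts.sum(axis=1) >= min_total

# Toy raw counts: 4 genes x 4 samples (made-up numbers)
counts = np.array([
    [0, 1, 0, 2],          # total 3
    [120, 98, 143, 110],   # total 471
    [5, 0, 7, 3],          # total 15
    [0, 0, 0, 0],          # total 0
])

# Hypothetical cutoff: at least 10 reads in total
keep = filter_by_total_counts(counts, min_total=10)
```

A total-count cutoff ignores library size, so in practice a per-sample criterion (for example on counts per million) may be more appropriate.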

Finally, one last important point I would like to emphasize: both in the literature and in many bioinformatics groups, variance filtering is not recommended for either RNA-seq or microarrays when you intend to use some of the most reliable DE methodologies (limma, edgeR, etc.), or when your data show a decreasing mean-variance relationship.

You could check also this very useful article (

Hope it helps




Thanks for the explanation. More specifically, how can you assess non-expressed genes in the Level 3 microarray expression data from TCGA? Since the data there are already processed, I think the filtering steps you mention are already skipped. As all genes have a certain expression value, should I assume that all of them are expressed?

written 3.6 years ago by bounlu170

I have not worked with TCGA data so far (although I have some projects that I have not started yet), but as far as I have searched and read, Level 3 refers to post-normalization data. I am not sure whether any kind of non-specific filtering has been performed at this level. Do you have any information about this? In any case, you should definitely make some density or histogram plots and inspect the range of the probeset expression values across your samples. If you see a strongly bimodal distribution (for instance, a high peak at low intensity values), then possibly no filtering has been performed. Also, which specific microarray platform are you using?
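Alongside the plots, a crude numeric check for a second peak could be sketched as follows. The function `histogram_modes`, the bin edges and the 5% minimum bin size are my own assumptions, and the toy values are made up; this is not a formal test for bimodality, only a quick way to see where the histogram peaks are.

```python
import numpy as np

def histogram_modes(values, bins, min_fraction=0.05):
    """Return the centers of histogram bins that look like peaks.

    A bin is reported if it is strictly higher than its left neighbor,
    at least as high as its right neighbor, and holds at least
    `min_fraction` of all values (to ignore tiny bumps).
    """
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    padded = np.concatenate(([0], counts, [0]))   # lets edge bins be peaks
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    is_big = counts >= min_fraction * counts.sum()
    return centers[is_peak & is_big]

# Toy log2 expression values (made-up): a large group of low values
# and a smaller group of high values
values = np.repeat(
    [2.5, 3.5, 4.5, 6.5, 8.5, 9.5, 10.5],
    [30, 60, 20, 5, 15, 25, 10],
)

modes = histogram_modes(values, bins=np.arange(2, 12))
```

Two reported modes, one of them at low intensity, would be consistent with the bimodal picture described above, i.e. non-expressed probes probably still being in the data. The result depends on the bin width, so it is worth trying a few.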

modified 3.6 years ago • written 3.6 years ago by svlachavas560