The question you raise above is crucial but also very general. For instance, in the field of microarrays there are numerous kinds of filtering, such as initial non-specific filtering on intensity, on variance, or on a detection p-value threshold that can be provided directly by some platforms (e.g. Illumina). Generally, the basic idea is to filter out probesets/genes that are "characterized" as absent or not expressed, according to the metric you chose, in most of your samples or conditions (assuming "naively" that in most cases the majority of genes are not expressed in the analyzed tissue, etc.). The choice also depends heavily on the specific platform/technology used (Affymetrix, Illumina, ...).
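To make the idea of a non-specific "presence" filter concrete, here is a minimal sketch in plain Python (not the R/Bioconductor code you would normally use for this). It keeps a probe only if its detection p-value is below a cutoff in at least a minimum number of samples. The function name, the cutoff of 0.05, the minimum of 2 samples and the toy values are all assumptions chosen for illustration, not recommendations from any platform.

```python
from typing import Dict, List


def filter_by_detection(
    detection_p: Dict[str, List[float]],
    p_cutoff: float = 0.05,   # assumed cutoff, adjust to your platform
    min_samples: int = 2,     # assumed minimum number of "present" samples
) -> List[str]:
    """Return probe IDs called 'present' (p < p_cutoff) in at least min_samples samples."""
    kept = []
    for probe_id, p_values in detection_p.items():
        # Count the samples in which this probe passes the detection cutoff.
        n_present = sum(1 for p in p_values if p < p_cutoff)
        if n_present >= min_samples:
            kept.append(probe_id)
    return kept


# Toy detection p-values: one list per probe, one value per sample (made up).
detection_p = {
    "probe_A": [0.001, 0.002, 0.010, 0.300],
    "probe_B": [0.400, 0.700, 0.900, 0.650],
    "probe_C": [0.030, 0.200, 0.040, 0.500],
}

kept_probes = filter_by_detection(detection_p, p_cutoff=0.05, min_samples=2)
print(kept_probes)  # probe_B is dropped: it is never below the cutoff
```

In practice the minimum number of samples is often tied to the size of the smallest experimental group, so that a gene expressed in only one condition is not thrown away.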
On the other hand, RNA-seq is a whole different field, with its own experimental design and theory, although it shares some similarities with microarrays in its general methodology. As a simple starting point, you can check the limma User's Guide (https://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf), page 119, where a kind of filtering is performed on the "total counts" of each gene.
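As a rough illustration of what filtering on total counts means, the sketch below (plain Python, not the limma code from the guide) sums the raw counts of each gene across all samples and keeps only the genes whose total reaches a threshold. The function name, the threshold of 50 and the toy count table are assumptions for illustration only.

```python
from typing import Dict, List


def filter_by_total_count(
    counts: Dict[str, List[int]],
    min_total: int = 50,  # assumed threshold, not a recommended value
) -> Dict[str, List[int]]:
    """Keep genes whose raw counts, summed over all samples, are at least min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


# Toy count table: one list per gene, one raw count per sample (made up).
counts = {
    "gene_1": [0, 1, 0, 2],      # total 3, dropped
    "gene_2": [30, 25, 40, 10],  # total 105, kept
    "gene_3": [12, 15, 10, 20],  # total 57, kept
}

filtered = filter_by_total_count(counts, min_total=50)
print(sorted(filtered))
```

A filter like this ignores library size, so when sequencing depth differs a lot between samples a threshold on counts per million is a common alternative.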
Finally, one last important aspect I would like to emphasize: both in the literature and in many bioinformatics groups, variance filtering is not recommended for either RNA-seq or microarrays when you intend to use some of the most reliable DE methodologies (limma, edgeR, etc.) or when your data show a decreasing mean-variance relationship.
You could also check this very useful article (http://www.ncbi.nlm.nih.gov/pubmed/20460310).
Hope it helps
modified 3.6 years ago
3.6 years ago by
svlachavas • 560