Any fold-change threshold would be completely arbitrary. What I would normally do in a situation like this is first rank all genes by their absolute log fold-change. I would then look at the top-ranking genes and verify that they indeed make sense.
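To make the ranking step concrete, here is a minimal sketch. The gene names and expression values are made up for illustration, and I assume a simple two-condition design with one mean value per condition; a real array study would start from normalized data.

```python
import math

# Hypothetical mean expression per gene as (control, treated).
# These names and numbers are invented for illustration only.
expression = {
    "geneA": (100.0, 400.0),
    "geneB": (250.0, 240.0),
    "geneC": (80.0, 10.0),
    "geneD": (60.0, 90.0),
}

def abs_log2_fold_change(control, treated):
    """Absolute log2 ratio, so up- and down-regulation are treated alike."""
    return abs(math.log2(treated / control))

# Sort genes from the largest to the smallest absolute log fold-change.
ranked = sorted(
    expression,
    key=lambda gene: abs_log2_fold_change(*expression[gene]),
    reverse=True,
)

for rank, gene in enumerate(ranked, start=1):
    score = abs_log2_fold_change(*expression[gene])
    print(rank, gene, round(score, 3))
```

Taking the absolute value of the log ratio puts an 8-fold decrease (geneC) ahead of a 4-fold increase (geneA), which is the point of ranking on the log scale rather than on the raw ratio.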
Next, I would try to get hold of some sort of gene set that I can use to assess more systematically whether the ranking makes sense. For example, if the array study has to do with a certain process, I would try to get hold of a set of genes that are known or believed to be involved in that process. Depending on the experiment, this list could come from a pathway database like KEGG, a disease database like OMIM, or from text mining of PubMed. Once I have such a reference set, I can plot sensitivity (i.e. the fraction of genes from the set that are found) as a function of the rank of the gene.
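The sensitivity-versus-rank curve can be computed in a few lines. This is only a sketch: the ranked list and the reference set below are hypothetical, and I assume every reference gene is actually present on the array (otherwise the curve never reaches 1).

```python
def sensitivity_curve(ranked_genes, reference_set):
    """Return, for each rank k, the fraction of reference genes in the top k."""
    total = len(reference_set)
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in reference_set:
            found += 1
        curve.append(found / total)
    return curve

# Hypothetical ranking (best first) and hypothetical reference gene set.
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
reference_set = {"g1", "g2", "g4", "g7"}

curve = sensitivity_curve(ranked_genes, reference_set)
for rank, sensitivity in enumerate(curve, start=1):
    print(rank, sensitivity)
```

The resulting list can be passed straight to any plotting tool, with rank on the x-axis and sensitivity on the y-axis.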
Why would I make such a plot? Firstly, it allows me to assess whether one way of ranking the genes is better than another; maybe ranking the expression profiles by something other than the maximal fold-change works better. Secondly, and directly related to your question, it allows me to select a cutoff that is not completely arbitrary: by looking at the curve, you can tell how long a list of "differentially expressed" genes it makes sense to propose based on the experiment.
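As a rough illustration of the first point, two rankings can be compared by looking at their sensitivity at the same list lengths. Everything below is hypothetical: the two rankings, the reference set, and the list lengths are invented, and the helper `sensitivity_at` is just a name I chose for this sketch.

```python
def sensitivity_at(ranked_genes, reference_set, k):
    """Fraction of the reference set found among the top k ranked genes."""
    top_k = set(ranked_genes[:k])
    return len(top_k & reference_set) / len(reference_set)

# Two hypothetical rankings of the same genes, e.g. by maximal fold-change
# and by some alternative score, plus a hypothetical reference set.
ranking_a = ["g1", "g2", "g5", "g4", "g3", "g6", "g7", "g8"]
ranking_b = ["g5", "g1", "g6", "g2", "g8", "g4", "g3", "g7"]
reference_set = {"g1", "g2", "g4", "g7"}

for k in (2, 4, 6):
    sens_a = sensitivity_at(ranking_a, reference_set, k)
    sens_b = sensitivity_at(ranking_b, reference_set, k)
    print(k, sens_a, sens_b)
```

In this toy case ranking A recovers reference genes earlier than ranking B. For the cutoff, I would still look at the full curve rather than a few fixed values of k, since the point where it flattens is what suggests a sensible list length.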