I am performing microarray data analysis and have a set of DEGs at an adjusted p-value < 0.05, 152 in total. I am now trying different log fold change cutoffs. With a log FC cutoff of 1, I get only 18 DEGs. But if I include all DEGs (without any fold change cutoff), functional analysis gives many more interesting biological process and pathway terms that correlate strongly with the disease. Should I use a fold change cutoff at all? I wouldn't be able to say which genes are up/down regulated in that case.
There is no strict rule. The default is a cutoff of zero, which means the null hypothesis being tested is a log fold change of zero (no change at all). The question is whether small fold changes, even if statistically significant, are meaningful in a biological context, i.e. does a fold change of e.g. 1.1 (on the linear scale) have any biological effect?
Often people apply filters to the results table, such as |FC| > 1.5 (a 1.5-fold change in either direction, i.e. |log2FC| > 0.585), to focus on what they believe are the biologically meaningful changes. The problem is that significance is partly a function of the number of replicates. Given comparable variability between replicates, ever smaller fold changes become significant as the sample size grows. At large n even FCs of around 5% can reach significance, but their biological impact is questionable.
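To make the sample size point concrete, here is a minimal sketch using a plain two-sample z-test. The standard deviation of 0.3 on the log2 scale and the group sizes are made-up assumptions for illustration, and real array analyses use moderated statistics rather than this simple test.

```python
import math

def two_sided_p(log2_fc, sd, n_per_group):
    """Two-sided z-test p-value for a difference between two group means.

    Assumes a known, equal standard deviation in both groups (illustrative only).
    """
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = abs(log2_fc) / se
    return math.erfc(z / math.sqrt(2.0))     # equals 2 * P(Z > z)

log2_fc = math.log2(1.05)   # a 5% change on the linear scale
sd = 0.3                    # assumed per-group SD on the log2 scale

for n in (3, 10, 100, 1000):
    print(f"n per group = {n:4d}  p = {two_sided_p(log2_fc, sd, n):.3g}")
```

The effect size is identical in every row; only n changes, and with it the p-value.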
A data-driven alternative is to test specifically against a certain fold change (a user-defined null hypothesis), which is what e.g. glmTreat from edgeR does. The DESeq2 analog is, I think, the lfcThreshold parameter of the results function; see the manual. For arrays, limma provides the same kind of test through its treat function. From what I understand, this is particularly useful if one has plenty of significant genes (thousands) and wants a data-driven way to reduce that number to the (probably) most meaningful candidates. As far as I understand, this approach requires greater statistical power and might not be suited to small sample sizes with modest effects.
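The difference between filtering on fold change after the fact and testing against a threshold can be sketched roughly as follows. This is only a normal-approximation illustration of an interval null hypothesis (|true log2FC| <= lfc); the actual packages use their own test statistics and variance estimates, and the gene names, estimates and standard errors below are invented.

```python
import math

def norm_sf(x):
    """Upper-tail probability P(Z > x) of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_vs_zero(log2_fc, se):
    """Usual two-sided p-value for H0: true log2FC == 0."""
    return 2.0 * norm_sf(abs(log2_fc) / se)

def p_vs_threshold(log2_fc, se, lfc):
    """p-value for H0: |true log2FC| <= lfc (normal approximation).

    Evaluated at the boundary of the null, |true log2FC| == lfc.
    With lfc = 0 this reduces to p_vs_zero.
    """
    a = abs(log2_fc)
    return norm_sf((a - lfc) / se) + norm_sf((a + lfc) / se)

# Hypothetical genes: (estimated log2FC, standard error)
genes = {"geneA": (1.8, 0.25), "geneB": (1.1, 0.25), "geneC": (0.4, 0.05)}
lfc = 1.0  # threshold on the log2 scale, i.e. 2-fold

for name, (b, se) in genes.items():
    print(f"{name}: log2FC={b:+.2f}  p(H0: 0)={p_vs_zero(b, se):.2g}  "
          f"p(H0: |lfc|<={lfc})={p_vs_threshold(b, se, lfc):.2g}")
```

In this toy example geneB would pass a post hoc |log2FC| > 1 filter, yet there is little evidence that its true change exceeds 2-fold, which is why the threshold test is the more conservative of the two.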
In your case, given you have only a few candidates, I would probably take all 152 genes and proceed with the analysis. You can still tell which genes are up- or downregulated from the sign of the log fold change; dropping the cutoff does not change that. Any conclusions you draw from any high-throughput experiment should (imho) be confirmed by an independent approach anyway, be it further experiments of your own or similar results from published, reasonably related data that you download from NCBI and reanalyze.