Hi, I am wondering if there is a way to filter out lowly expressed genes (from small and RNA sequencing) using DESeq2's median-of-ratios normalized counts. Any other normalization method would work as well. More specifically, I would like to find a threshold for filtering out lowly expressed genes from my dataset. I would really appreciate any insights on this. Thanks, Regards, Bhumi
Thank you for your response. The goals of the analysis are: (i) to perform clustering based on the filtered counts (that is primarily why I wanted to filter out samples with low read counts), and (ii) to perform differential expression analysis.
For (i), you should select highly variable genes or, even better, DEGs, i.e. the genes that actually have the power to separate samples. For (ii), it is not necessary to filter; please check the DESeq2 manual. In fact, nothing you are doing requires filtering counts: just feed the data into DESeq2, get DEGs, and use them for clustering. Alternatively, apply `vst()` to the `dds` object and then use `rowVars` to select variable genes. I would go for the DEG approach.
Yes, I am aware that studies perform DE and then clustering. However, if the phenotypes are not as clear (meaning there is no clear case vs. control scenario), I was thinking of trying the filtering approach, because then I can filter out the lowly expressed genes and see how they cluster. Thank you though; you are right, I could select the variable genes instead of filtering.
I'd really use the variable genes after applying the variance-stabilizing transformation (`vst`) to the data, maybe the top 500 or so. Using non-variable genes is meaningless, as they do not carry any information to separate groups.
Yeah, that is something I have been going back and forth on (whether to choose variable genes or filter out low expressers). I am going to go with the variable gene approach. Thank you for sharing your insights. I really appreciate it.
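For anyone landing here later, the variable-gene approach discussed in this thread could look roughly like the sketch below. It is only an illustration under a few assumptions: the `dds` object is simulated with `makeExampleDESeqDataSet()` instead of real counts, the cutoff of 500 genes is the rough figure suggested above and not a recommended default, and the final hierarchical clustering step is just one possible way to cluster the samples.

```r
library(DESeq2)

# Assumption: simulated counts stand in for a real DESeqDataSet.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 3000, m = 8)

# Variance-stabilizing transformation. blind = TRUE ignores the design,
# which suits exploratory clustering when the phenotypes are unclear.
vsd <- vst(dds, blind = TRUE)
mat <- assay(vsd)

# Per-gene variance across samples on the transformed scale.
rv <- matrixStats::rowVars(mat)

# Keep the most variable genes (500 here, or all genes if fewer exist).
n_top <- min(500, nrow(mat))
top_idx <- order(rv, decreasing = TRUE)[seq_len(n_top)]
mat_top <- mat[top_idx, ]

# Cluster samples using Euclidean distances on the selected genes.
hc <- hclust(dist(t(mat_top)))
```

With a real dataset you would build `dds` from your own count matrix and sample table, and `plot(hc)` would then draw the sample dendrogram. No count threshold is needed for this step, since lowly expressed genes rarely rank among the most variable ones after `vst()`.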