Filtering out lowly expressed genes
3.4 years ago
Bhumi • 0

Hi, is there a way to filter out lowly expressed genes (from small RNA and RNA sequencing data) using DESeq2's median-of-ratios normalized counts? Any other normalization method would work as well. More specifically, I would like to find a threshold for filtering lowly expressed genes out of my dataset. Any insights would be much appreciated. Thanks, Regards, Bhumi

RNA-Seq R rna-seq next-gen • 2.9k views
3.4 years ago
ATpoint 82k

You could read the DESeq2 manual addressing pre-filtering: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pre-filtering
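The pre-filtering described in that section boils down to keeping genes that reach a minimum raw count in a minimum number of samples. A minimal sketch of that logic in plain Python, not the DESeq2 API: `prefilter` is a hypothetical helper, the counts are made up, and the thresholds (10 reads, 3 samples) are illustrative assumptions rather than recommended values.

```python
def prefilter(counts, min_count=10, min_samples=3):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps gene name -> list of raw counts, one value per sample.
    """
    kept = {}
    for gene, row in counts.items():
        # Number of samples in which this gene reaches the count threshold.
        n_expressed = sum(1 for c in row if c >= min_count)
        if n_expressed >= min_samples:
            kept[gene] = row
    return kept


# Toy raw counts for 4 genes across 6 samples (made-up numbers).
toy_counts = {
    "geneA": [120, 98, 143, 110, 87, 131],
    "geneB": [0, 1, 0, 2, 0, 0],
    "geneC": [12, 3, 15, 0, 11, 2],
    "geneD": [9, 9, 9, 9, 9, 9],
}

filtered = prefilter(toy_counts)
print(sorted(filtered))
```

In R the same filter is a one-liner on the count matrix, along the lines of `keep <- rowSums(counts(dds) >= 10) >= 3` followed by `dds <- dds[keep, ]`.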

Then there is the edgeR function filterByExpr(), which takes the experimental design into account and keeps only those genes that have sufficient counts in enough samples for a meaningful analysis. https://rdrr.io/bioc/edgeR/man/filterByExpr.html
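The idea behind a design-aware filter can be sketched as follows: convert counts to counts per million (CPM) so that library depth is accounted for, then require a gene to pass the CPM cutoff in at least as many samples as the smallest group contains. This is a simplified plain-Python illustration, not edgeR's actual implementation (which applies further criteria, such as a minimum total count); `filter_by_cpm`, the toy counts, the group labels and the minimum count of 10 are all assumptions made for the example.

```python
from statistics import median


def filter_by_cpm(counts, groups, min_count=10):
    """Return genes whose CPM passes the cutoff in >= (smallest group size) samples.

    `counts` maps gene name -> list of raw counts, one value per sample.
    `groups` holds one group label per sample, in the same order.
    """
    # Library size = total count per sample (column sums).
    lib_sizes = [sum(col) for col in zip(*counts.values())]

    # Size of the smallest experimental group.
    group_sizes = {}
    for label in groups:
        group_sizes[label] = group_sizes.get(label, 0) + 1
    min_group = min(group_sizes.values())

    # CPM value that corresponds to `min_count` reads in a median-sized library.
    cpm_cutoff = min_count / median(lib_sizes) * 1e6

    kept = []
    for gene, row in counts.items():
        cpm = [c / n * 1e6 for c, n in zip(row, lib_sizes)]
        if sum(1 for v in cpm if v >= cpm_cutoff) >= min_group:
            kept.append(gene)
    return kept


# Toy data: 2 control and 2 treated samples; the treated libraries are twice as deep.
toy_counts = {
    "geneA": [500, 500, 1000, 1000],
    "geneB": [0, 0, 40, 60],      # expressed in the treated group only
    "geneC": [8, 2, 12, 1],       # low everywhere once depth is considered
    "geneD": [492, 498, 948, 939],
}
toy_groups = ["ctrl", "ctrl", "trt", "trt"]

kept_genes = filter_by_cpm(toy_counts, toy_groups)
print(kept_genes)
```

Note that geneB is kept although it is absent from the controls: a gene expressed in only one group is exactly what a differential expression analysis is looking for, which is why the sample requirement is tied to the smallest group rather than to all samples.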

What is the analysis goal?

Thank you for your response. The goals of the analysis are: (i) to perform clustering based on the filtered counts (that is primarily why I wanted to filter out genes with low read counts), and (ii) to perform differential expression analysis.

For (i), you should select highly variable genes or, even better, DEGs, i.e. those genes that actually have the power to separate samples. For (ii), it is not necessary to filter; please check the DESeq2 manual. In fact, nothing you plan to do requires filtering the counts: just feed the data into DESeq2, get DEGs and use them for clustering. Alternatively, apply vst() to the dds object and then use rowVars to select the most variable genes. I would go for the DEG approach.

Yes, I am aware that studies perform DE and then clustering. However, if the phenotypes are not that clear (meaning there is no clear case vs. control scenario), I was thinking of trying the filtering approach instead, because then I can filter out the lowly expressed genes and see how the samples cluster. Thank you though, you are right, I could select the variable genes instead of filtering.

I'd really use the most variable genes after applying the variance-stabilizing transformation (vst) to the data, maybe the top 500 or so. Using non-variable genes is meaningless, as they do not carry any information that separates groups.
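Selecting the top variable genes is just a matter of ranking genes by their per-gene variance across samples. A minimal plain-Python sketch of that ranking, assuming the values have already been variance-stabilized; `top_variable_genes` is a hypothetical helper and the expression values are made up.

```python
from statistics import variance


def top_variable_genes(expr, n_top=500):
    """Return the names of the `n_top` genes with the highest sample variance.

    `expr` maps gene name -> list of transformed expression values,
    one value per sample.
    """
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n_top]


# Toy transformed expression values for 4 genes across 4 samples.
toy_expr = {
    "geneA": [8.0, 8.1, 7.9, 8.0],  # nearly constant
    "geneB": [5.0, 9.0, 5.2, 9.1],  # clearly differs between samples
    "geneC": [6.0, 6.0, 6.0, 6.0],  # constant, carries no information
    "geneD": [7.0, 8.0, 7.0, 8.0],  # moderately variable
}

selected = top_variable_genes(toy_expr, n_top=2)
print(selected)
```

With a DESeq2 object, the equivalent in R is roughly `vsd <- vst(dds)`, then `rv <- rowVars(assay(vsd))` and `head(order(rv, decreasing = TRUE), 500)` to get the row indices of the top 500 genes.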

Yeah, that is something I have been going back and forth on (whether to select variable genes or to filter out low expressers). I am going to go with the variable-gene approach. Thank you for sharing your insights. I really appreciate it.
