Question

Filtering out lowly expressed genes

0

3.4 years ago

Bhumi • 0

Hi, I am wondering if there is a way to filter out lowly expressed genes (from small RNA and RNA sequencing) using DESeq2's median-of-ratios normalized counts, or any other normalization method. More specifically, I would like to find a threshold for filtering lowly expressed genes out of my dataset. It would be really great if you could provide some insights on this. Thanks, Regards, Bhumi

RNA-Seq R rna-seq next-gen • 2.9k views

updated 3.4 years ago by ATpoint 82k • written 3.4 years ago by Bhumi • 0

score 0 · Answer 1 · 2020-12-07

0

3.4 years ago

ATpoint 82k

You could read the DESeq2 manual addressing pre-filtering: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pre-filtering

Then there is the edgeR function filterByExpr(), which takes the experimental design into account and keeps only those genes that have sufficient counts for a meaningful analysis. https://rdrr.io/bioc/edgeR/man/filterByExpr.html
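As a rough illustration of the pre-filtering idea above, here is a minimal sketch in base R on simulated counts. The count matrix, the group labels, and the threshold of 10 reads are assumptions made for illustration, not values from this thread. The rule (at least 10 reads in at least as many samples as the smallest group) resembles the one shown in recent versions of the DESeq2 vignette.

```r
# Simulated data: 1000 hypothetical genes x 6 hypothetical samples.
set.seed(1)
counts <- matrix(
  rnbinom(6000, mu = 20, size = 0.5),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6))
)
group <- factor(rep(c("A", "B"), each = 3))  # assumed two-group design

# Keep genes with at least `min_count` reads in at least as many samples
# as there are in the smallest group.
min_count <- 10                        # assumed threshold, adjust to your data
smallest_group <- min(table(group))
keep <- rowSums(counts >= min_count) >= smallest_group
filtered <- counts[keep, , drop = FALSE]

# With a real DESeqDataSet `dds`, the same rule would read:
#   dds <- dds[rowSums(counts(dds) >= min_count) >= smallest_group, ]
# With edgeR, a comparable logical vector comes from:
#   keep <- edgeR::filterByExpr(counts, group = group)
```

The result is only a smaller count matrix; raw counts (not normalized values) are still what goes into DESeq2 afterwards.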

What is the analysis goal?

0

Thank you for your response. The goals of the analysis are: (i) to perform clustering based on the filtered counts (that is primarily why I wanted to filter out genes with low read counts), and (ii) to perform differential expression analysis.

Reply • 3.4 years ago by Bhumi • 0

0

For (i), you should select highly variable genes or, even better, DEGs, i.e. genes that actually have the power to separate samples. For (ii), filtering is not necessary; please check the DESeq2 manual. In fact, nothing you plan to do requires filtering counts: just feed the data into DESeq2, get the DEGs, and use them for clustering. Alternatively, apply vst() to the dds object and then use rowVars() to select the most variable genes. I would go for the DEG approach.

Reply • 3.4 years ago by ATpoint 82k

0

Yes, I am aware that studies perform DE and then clustering. However, if the phenotypes are not that clear (meaning there is no clear case vs. control scenario), I was thinking of trying the filtering approach, because then I can filter out the lowly expressed genes and see how the samples cluster. Thank you though; you are right, I could select the variable genes instead of filtering.

Reply • 3.4 years ago by Bhumi • 0

1

I'd really use the most variable genes after applying the variance-stabilizing transformation (vst) to the data, maybe the top 500 or so. Using non-variable genes is meaningless, as they do not carry any information to separate groups.
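A minimal sketch of that selection step, on simulated counts, might look like the following. The data, the two hidden groups, and the cut-off of 500 genes are illustrative assumptions. If DESeq2 is installed the sketch uses vst(); otherwise it falls back to log2(counts + 1), which is only a crude stand-in for the variance-stabilizing transformation so that the example runs without Bioconductor.

```r
# Simulated data: 2000 hypothetical genes x 8 hypothetical samples,
# with 50 genes shifted upwards in samples 5 to 8.
set.seed(42)
n_genes <- 2000
n_samples <- 8
mu_gene <- 2^runif(n_genes, min = 3, max = 10)
mu <- matrix(rep(mu_gene, times = n_samples), nrow = n_genes)
mu[1:50, 5:8] <- mu[1:50, 5:8] * 8
counts <- matrix(
  rnbinom(length(mu), mu = as.vector(mu), size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Variance-stabilizing transformation, blind to any design.
  coldata <- data.frame(batch = rep("none", n_samples),
                        row.names = colnames(counts))
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ 1)
  mat <- SummarizedExperiment::assay(DESeq2::vst(dds, blind = TRUE))
} else {
  # Stand-in only: NOT equivalent to vst().
  mat <- log2(counts + 1)
}

# Per-gene variance; matrixStats::rowVars(mat) is a faster equivalent.
gene_var <- apply(mat, 1, var)
top <- head(order(gene_var, decreasing = TRUE), 500)
mat_top <- mat[top, ]

# Hierarchical clustering of samples on the selected genes.
hc <- hclust(dist(t(mat_top)))
clusters <- cutree(hc, k = 2)
```

The number of genes to keep is a judgment call; it is worth checking that the clustering is stable across a few different cut-offs.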

Reply • 3.4 years ago by ATpoint 82k

0

Yeah, that is something I have been going back and forth on (whether to choose variable genes or to filter out low expressers). I am going to go with the variable gene approach. Thank you for sharing your insights. I really appreciate it.

Reply • 3.4 years ago by Bhumi • 0