Question

Filtering low variance genes for WGCNA

3.3 years ago

pennakiza ▴ 60

Hello everyone,

I have a question about filtering for low variance prior to WGCNA. I have RNA-seq data, pre-filtered for low counts and transformed with the DESeq2 vst function. Could you help me choose which of the two methods below is more appropriate for my data?

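library(genefilter)
# keep genes whose IQR across samples exceeds 0.25 (df: genes in rows)
iqr_filter <- function(x) IQR(x, na.rm = TRUE) > 0.25
filtered_genes <- genefilter(df, iqr_filter)
df_filtered <- df[filtered_genes, ]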

or

# per-gene variance across samples (data: genes in rows, numeric columns only)
gene_var <- apply(data, 1, var)
# keep genes at or above the 25th percentile of variance
data <- data[gene_var >= quantile(gene_var, 0.25), ]

Thank you very much for your help!

Penny

WGCNA genefilter IQR variance filtering • 3.5k views

updated 3.3 years ago by Kevin Blighe 87k • written 3.3 years ago by pennakiza ▴ 60

score 4 · Answer 1 · 2020-12-28

Hi Penny,

In my opinion, if you have already produced variance-stabilised expression levels via the standard DESeq2 workflow, then no additional filtering for variance should be performed. DESeq2 specifically tackles the issue of low and high variability and its relationship with mean expression, so you can assume that the problem has already been handled by the DESeq2 normalisation and transformation (VST).

According to the WGCNA developers:

Should I filter probesets or genes? Probesets or genes may be filtered by mean expression or variance (or their robust analogs such as median and median absolute deviation, MAD) since low-expressed or non-varying genes usually represent noise. Whether it is better to filter by mean expression or variance is a matter of debate; both have advantages and disadvantages, but more importantly, they tend to filter out similar sets of genes since mean and variance are usually related.

[source: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
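If you do want to see how the filters from the question compare with the robust analogue (MAD) mentioned in the FAQ, here is a minimal sketch in base R. It runs on simulated data, so the matrix `expr`, the per-gene spreads in `gene_sd`, and all object names are assumptions for illustration only. The 25% quantile and the 0.25 IQR cutoff are taken from the question. It is not a procedure recommended by the WGCNA developers.

```r
# Simulated stand-in for a VST-transformed matrix (genes in rows, samples in
# columns). All values here are made up for illustration.
set.seed(1)
n_genes   <- 1000
n_samples <- 12
gene_sd   <- runif(n_genes, min = 0.05, max = 1.5)  # hypothetical per-gene spread
expr <- matrix(
  rnorm(n_genes * n_samples, mean = 8, sd = rep(gene_sd, n_samples)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Per-gene spread measured three ways
gene_var <- apply(expr, 1, var)
gene_mad <- apply(expr, 1, mad)
gene_iqr <- apply(expr, 1, IQR)

# Relative filters: drop the lowest 25% of genes by each measure
keep_var <- gene_var >= quantile(gene_var, 0.25)
keep_mad <- gene_mad >= quantile(gene_mad, 0.25)

# Absolute filter: fixed IQR cutoff, as in the first snippet of the question
keep_iqr <- gene_iqr > 0.25

# How similar are the retained gene sets? (Jaccard index)
overlap_var_mad <- sum(keep_var & keep_mad) / sum(keep_var | keep_mad)
print(table(variance = keep_var, MAD = keep_mad))
print(c(kept_by_variance = sum(keep_var), kept_by_IQR = sum(keep_iqr)))

# Apply one of the filters
expr_filtered <- expr[keep_var, , drop = FALSE]
```

Note that the two approaches in the question are not equivalent: the quantile-based filter always removes the same fraction of genes, whereas the fixed IQR cutoff removes however many genes fall below 0.25 on whatever scale the data happen to be on.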

Kevin