Question

Variance Stabilizing Transformation on RNAseqs coming from different studies

3.4 years ago

Andrés Ribone ▴ 60

Hi!

I have a large collection of public RNA-seq datasets from different experiments and genotypes that share similar conditions/tissues. I want to build gene co-expression networks from them.

I have already processed every sample down to raw counts. I removed low-depth samples (those with fewer than 3 million total reads) and lowly expressed genes (keeping only genes with more than 2 CPM in at least 80% of samples). I ended up with a reasonable matrix of >1200 samples x 15k genes, which shows a nice bell-shaped distribution after a log(x + 1) transform.
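As an illustration only, the two filters described above can be sketched in plain Python. This is not the poster's actual pipeline: the function name `filter_counts`, its parameters, and the toy counts are hypothetical, and the sketch assumes CPM is computed from each sample's total raw count before any gene is removed. For a matrix of the size mentioned here, an array library would be the practical choice; the standard library is used only to keep the example self-contained.

```python
import math


def filter_counts(counts, min_library_size=3_000_000, min_cpm=2.0, min_fraction=0.8):
    """Filter samples by depth, then genes by CPM (hypothetical helper).

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    Returns (kept_samples, kept_gene_indices).
    """
    # Step 1: drop samples with fewer than min_library_size total reads.
    library_sizes = {sample: sum(values) for sample, values in counts.items()}
    kept_samples = [s for s, size in library_sizes.items() if size >= min_library_size]
    if not kept_samples:
        return [], []

    # Step 2: keep genes with CPM > min_cpm in at least min_fraction of kept samples.
    # Assumption: CPM uses the full library size of each sample.
    n_genes = len(counts[kept_samples[0]])
    required = min_fraction * len(kept_samples)
    kept_genes = []
    for g in range(n_genes):
        n_pass = sum(
            1
            for s in kept_samples
            if counts[s][g] * 1e6 / library_sizes[s] > min_cpm
        )
        if n_pass >= required:
            kept_genes.append(g)
    return kept_samples, kept_genes


# Toy data: three samples, four genes (values are made up).
toy_counts = {
    "s1": [2_000_000, 1_999_990, 10, 0],  # 4,000,000 reads in total
    "s2": [3_000_000, 1_999_995, 5, 0],   # 5,000,000 reads in total
    "s3": [1_000_000, 500_000, 100, 0],   # 1,500,100 reads: below the depth cutoff
}

kept_samples, kept_genes = filter_counts(toy_counts)
print(kept_samples, kept_genes)  # → ['s1', 's2'] [0, 1]

# log(x + 1) of the filtered counts, for a quick look at the distribution.
logged = {s: [math.log1p(toy_counts[s][g]) for g in kept_genes] for s in kept_samples}
```

In the toy data, the third gene has 2.5 CPM in s1 but only 1.0 CPM in s2, so it passes in 50% of the retained samples and is dropped by the 80% rule.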

The next steps are to normalize the counts and then deal with batch effects. I have read that a variance-stabilizing transformation (VST) is a better normalization than log(), and this is where my doubts begin:

Should the VST be applied to the raw counts, or to the CPMs?
Should the VST be applied to all the retained (not filtered-out) samples together, or per batch, or at least per tissue?
Either way, if I later reject some samples as outliers, will I have to repeat the VST without them?

Thanks in advance!

RNA-Seq WGCNA VST • 1.3k views

updated 3.4 years ago by Kevin Blighe 87k • written 3.4 years ago by Andrés Ribone ▴ 60

score 0 · Answer 1 · 2020-11-24

3.4 years ago

Kevin Blighe 87k

Answered here: C: The RNA-Seq data input for WGCNA in terms of gene co-expression network construc

Kevin
