Variance-stabilizing transformation on RNA-seq data from different studies
3.4 years ago

Hi!

I have a large collection of public RNA-seq datasets from different experiments and genotypes that share similar conditions/tissues. I want to build gene co-expression networks from them.

I processed every sample to obtain raw counts. I removed the low-depth samples (those with fewer than 3 million reads in total) and the lowly expressed genes (keeping only genes with more than 2 CPM in at least 80% of samples). I ended up with a reasonable matrix of >1200 samples x 15k genes, which shows a nice bell curve after a plus-one log transformation.
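For concreteness, the filtering described above can be sketched roughly as follows. This is a toy, plain-Python illustration and not the actual pipeline used: the function names (`filter_counts`, `log1p_matrix`) are hypothetical, the matrix is assumed to be a samples x genes list of lists of raw counts, and the natural log is assumed for the plus-one log transform.

```python
import math


def filter_counts(counts, min_library_size=3_000_000, min_cpm=2.0, min_fraction=0.8):
    """Filter a samples x genes matrix of raw counts (list of lists).

    Returns (kept_samples, kept_genes, filtered); the first two are index
    lists into the original matrix.
    """
    # 1. Drop samples whose total read count is below the threshold.
    lib_sizes = [sum(row) for row in counts]
    kept_samples = [i for i, size in enumerate(lib_sizes) if size >= min_library_size]
    if not kept_samples:
        return [], [], []

    # 2. Compute counts per million (CPM) on the remaining samples only.
    cpm = [[c * 1e6 / lib_sizes[i] for c in counts[i]] for i in kept_samples]

    # 3. Keep genes with CPM > min_cpm in at least min_fraction of those samples.
    n_kept = len(kept_samples)
    kept_genes = [
        g for g in range(len(counts[0]))
        if sum(row[g] > min_cpm for row in cpm) / n_kept >= min_fraction
    ]

    # The result still holds RAW counts, only with fewer rows and columns.
    filtered = [[counts[i][g] for g in kept_genes] for i in kept_samples]
    return kept_samples, kept_genes, filtered


def log1p_matrix(matrix):
    """Plus-one log transform, here only for a quick look at the distribution."""
    return [[math.log1p(x) for x in row] for row in matrix]
```

Note that the sketch returns the filtered raw counts rather than the CPMs, so whatever normalization comes next still has the original counts to work with.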

The next steps are to normalize the counts and then deal with batch effects. I read that a variance-stabilizing transformation (VST) is a better normalization than log(), and this is where my doubts start:

  • Should the VST be applied to the raw counts, or to the CPMs?

  • Should the VST be applied to all the samples that passed filtering together, or per batch, or at least per tissue?

  • Either way, if I later reject samples as outliers, I will have to repeat the VST without the rejected samples, right?
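To make the questions above concrete, here is a minimal toy sketch of one kind of VST. It is not the implementation of any particular package. It assumes a negative-binomial-style variance, var = mu + alpha * mu^2, with a single fixed dispersion `alpha` (a hypothetical value; real tools estimate dispersion from the data), for which the closed-form stabilizing transform is (2 / sqrt(alpha)) * asinh(sqrt(alpha * mu)). It also assumes median-of-ratios size factors computed across all samples passed in. The function names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a samples x genes raw-count matrix."""
    n_samples = len(counts)
    # Only genes with non-zero counts in every sample have a usable geometric mean.
    usable = [g for g in range(len(counts[0])) if all(row[g] > 0 for row in counts)]
    # Per-gene log geometric mean, taken across ALL samples in the matrix.
    log_geo_mean = {
        g: sum(math.log(row[g]) for row in counts) / n_samples for g in usable
    }
    return [
        math.exp(median(math.log(row[g]) - log_geo_mean[g] for g in usable))
        for row in counts
    ]


def toy_vst(counts, alpha=0.1):
    """Toy VST for variance = mu + alpha * mu**2 with a fixed, assumed alpha.

    Input is RAW counts; they are divided by size factors before transforming.
    """
    factors = size_factors(counts)
    scale = 2.0 / math.sqrt(alpha)
    return [
        [scale * math.asinh(math.sqrt(alpha * c / s)) for c in row]
        for row, s in zip(counts, factors)
    ]
```

In this sketch the input is raw counts rather than CPMs, and the size factors depend on the whole set of samples passed in, so adding or removing a sample changes the transformed values of all the others. That dependence is what the second and third questions hinge on.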

Thanks in advance!

RNA-Seq WGCNA VST • 1.3k views