Question: RNA co-expression: shall I use differential co-expression or not?


I am new to bioinformatics and have a background in neuroimaging, where we often use a baseline to build our models. That is, all the relationships between variables (correlation or other measures) are inferred from the difference in activity between our condition of interest and the baseline. Although these types of differential models are the gold standard in my field, I've heard that differential co-expression analysis in RNA-seq is controversial. Can anyone explain why (difficulty of choosing a baseline, etc.) and/or point me to publications that discuss the topic?

Thank you very much!


rna-seq co-expression • 1.4k views
modified 3.3 years ago by Kevin Blighe • written 3.5 years ago by sandrine.muller.research

Hi! Today I was reading about co-expression networks. It really depends on your experimental design and the tools you have at your disposal. For example, WGCNA seems to work really well, but it seems you need quite a large number of samples to run a meaningful analysis. On using differentially expressed genes, here is what is written in their FAQ:

WGCNA is designed to be an unsupervised analysis method that clusters genes based on their expression profiles. Filtering genes by differential expression will lead to a set of correlated genes that will essentially form a single (or a few highly correlated) modules. It also completely invalidates the scale-free topology assumption, so choosing soft thresholding power by scale-free topology fit will fail.

I do not know if this is the case for all the tools, but it is definitely something to keep in mind. Cheers :)

written 3.5 years ago by biofalconch

Thank you @biofalconch for your answer! Indeed, I can understand their point. However, don't you think you may get a lot of correlations that happen "by chance" if you are not controlling for random noise (using a baseline)? I guess when you correlate the modules with a disease, for instance, a lot of the genes in a module could be false positives... or is my reasoning flawed?

written 3.5 years ago by sandrine.muller.research

Yes! It may be a bad idea to keep the whole dataset, and the first part of the same FAQ question addresses this (I probably shouldn't have left it out). Here it is:

Probesets or genes may be filtered by mean expression or variance (or their robust analogs such as median and median absolute deviation, MAD) since low-expressed or non-varying genes usually represent noise. Whether it is better to filter by mean expression or variance is a matter of debate; both have advantages and disadvantages, but more importantly, they tend to filter out similar sets of genes since mean and variance are usually related.

So what I got from this is "filter at your own risk"
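For illustration, here is a minimal Python sketch of the kind of variance-based filtering the FAQ describes, ranking genes by median absolute deviation (MAD) and keeping the most variable ones. The function names, the `keep_fraction` parameter, and the toy data are all made up for this example; this is not WGCNA's own code.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def filter_by_mad(expression, keep_fraction=0.5):
    """Return the names of the most variable genes, ranked by MAD.

    expression: dict mapping gene name -> list of values (one per sample).
    keep_fraction: hypothetical tuning knob for this sketch only.
    """
    ranked = sorted(expression, key=lambda g: mad(expression[g]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy data (invented): geneC barely varies, geneB varies only a little.
toy = {
    "geneA": [1.0, 5.0, 9.0, 2.0, 8.0],
    "geneB": [3.0, 3.5, 4.0, 3.2, 3.8],
    "geneC": [7.0, 7.0, 7.1, 7.0, 7.0],
    "geneD": [0.0, 10.0, 0.0, 10.0, 5.0],
}
kept = filter_by_mad(toy, keep_fraction=0.5)
print(kept)
```

Note that this filter never looks at the sample groups, so it stays unsupervised, unlike filtering by differential expression.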

written 3.5 years ago by biofalconch

WGCNA is indeed fundamentally based on correlation - that's how it initially identifies modules. Once identified, it then transforms each module by singular value decomposition (i.e. PCA) in order to derive the loadings for each gene to each module. WGCNA is really great in certain situations.
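To make the decomposition step concrete, here is a rough Python sketch that summarizes one module by its first principal component across samples and then correlates each gene with that summary. It uses plain power iteration instead of a full SVD, picks the sign so the component follows the mean profile, and all names and toy values are assumptions for this example; it is not WGCNA's implementation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(profile):
    """Scale one gene's profile to mean 0 and unit standard deviation."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd for v in profile]


def module_eigengene(module, iterations=200, seed=0):
    """First principal component (one value per sample) of a module.

    module: list of gene profiles, each a list with one value per sample.
    """
    z = [standardize(g) for g in module]  # genes x samples
    n_samples = len(z[0])
    rng = random.Random(seed)
    vec = [rng.gauss(0, 1) for _ in range(n_samples)]  # random start vector
    for _ in range(iterations):
        # Multiply by Z, then by Z transposed, then renormalize.
        scores = [sum(a * b for a, b in zip(g, vec)) for g in z]
        vec = [sum(g[s] * w for g, w in zip(z, scores)) for s in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # The sign of a principal component is arbitrary; align it with the mean profile.
    mean_profile = [sum(g[s] for g in z) / len(z) for s in range(n_samples)]
    if sum(a * b for a, b in zip(vec, mean_profile)) < 0:
        vec = [-v for v in vec]
    return vec


# Toy module (invented values): three genes that rise together over five samples.
toy_module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.5, 1.9, 3.2, 3.8, 5.1],
]
eigengene = module_eigengene(toy_module)
membership = [pearson(g, eigengene) for g in toy_module]
print(eigengene)
print(membership)
```

The per-gene correlation with the module summary plays the role of the "loading" here: genes that follow the module closely get values near 1.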

modified 2.2 years ago • written 3.3 years ago by Kevin Blighe
Kevin Blighe wrote:

Just to throw another couple of ideas out there.

With simple correlation analyses, like generating a huge correlation matrix for all of your variables, you can also derive a P value from the correlation function (in R at least) in order to back up whatever values you obtain. Through this, you can also plot the values and identify 'structure' in your dataset, as you can see from my first figure below.
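As a small illustration of attaching a P value to a correlation, here is a Python sketch using only the standard library. R's `cor.test` bases its Pearson P value on the t distribution; this sketch swaps in the Fisher z-transform with a normal approximation, so the values will differ slightly, especially for small sample sizes. The function names and toy data are invented for the example.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(x, y):
    """Return (r, approximate two-sided P value) via the Fisher z-transform."""
    n = len(x)
    if n < 4:
        raise ValueError("the Fisher z approximation needs at least 4 samples")
    r = max(-1.0, min(1.0, pearson(x, y)))  # guard against rounding past +/-1
    if abs(r) == 1.0:
        return r, 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return r, p


# Toy data (invented): one strongly related pair and one unrelated pair.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
related = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
unrelated = [5.0, 3.0, 6.0, 4.0, 5.0, 3.0, 6.0, 4.0]

r_strong, p_strong = correlation_p_value(x, related)
r_noise, p_noise = correlation_p_value(x, unrelated)
print(r_strong, p_strong)
print(r_noise, p_noise)
```

With a full gene-by-gene matrix the number of tests grows quadratically, so the P values would also need a multiple-testing correction before being taken at face value.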

Another thing that I've been researching this past year is graph theory, minimal spanning trees, and the identification of 'communities' in these. There are functions in R for this in the packages igraph and plotrix. The data is the same as for the correlation matrix. In the correlation plot below, for example, you can see 'blocks' of highly positively and negatively (inversely) correlated samples - these are akin to modules and communities in a network analysis.
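To sketch the spanning-tree idea in code, the Python example below builds a minimum spanning tree over samples using the distance 1 - r (Kruskal's algorithm), then cuts the long edges and treats the remaining connected groups as crude 'communities'. This is a simple stand-in and not one of igraph's community detection algorithms; the function names, the `max_distance` cutoff, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


class UnionFind:
    """Tiny disjoint-set structure for tracking connected groups."""

    def __init__(self, names):
        self.parent = {name: name for name in names}

    def find(self, node):
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm; returns a list of (distance, a, b) edges."""
    edges = sorted(
        (1.0 - pearson(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    )
    groups = UnionFind(profiles)
    tree = []
    for dist, a, b in edges:
        if groups.find(a) != groups.find(b):  # edge joins two separate groups
            groups.union(a, b)
            tree.append((dist, a, b))
    return tree


def communities(profiles, tree, max_distance=0.5):
    """Drop tree edges longer than max_distance and return the groups left."""
    groups = UnionFind(profiles)
    for dist, a, b in tree:
        if dist <= max_distance:
            groups.union(a, b)
    members = {}
    for name in profiles:
        members.setdefault(groups.find(name), []).append(name)
    return sorted(sorted(group) for group in members.values())


# Toy data (invented): s1/s2 rise together, s3/s4 fall together.
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [1.1, 2.3, 2.9, 4.2, 5.0],
    "s3": [5.0, 4.0, 3.0, 2.0, 1.0],
    "s4": [4.8, 4.1, 3.2, 1.9, 1.2],
}
tree = minimum_spanning_tree(samples)
found = communities(samples, tree)
print(tree)
print(found)
```

Using 1 - r as the distance keeps inversely correlated samples far apart, so the two 'blocks' end up in separate groups; a distance of 1 - |r| would instead merge them.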


modified 16 months ago • written 3.3 years ago by Kevin Blighe