Question: How to calculate correlation of gene expression for a complex experimental design
4.3 years ago, o.tutar wrote:


I have two plant species, and for one of them there are two different sampling areas. Let's call them S1-1, S1-2 and S2. Each has control (C) and treatment (T) samples with four replicates, i.e. C-1, C-2, C-3, C-4 and T-1, T-2, T-3, T-4 for each of S1-1, S1-2 and S2. I have gene expression measurements for 14 genes at three time points (beginning of treatment, end of treatment and after a recovery period).
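To make the design concrete, here is a minimal sketch of a long-format table (one row per gene, sample and time point) that the calculations discussed in this thread could start from. The column names, gene names and time-point labels are hypothetical placeholders, not taken from the actual data.

```r
# Hypothetical long-format layout for the design described above:
# 14 genes x 4 replicates x 2 conditions x 3 sites x 3 time points.
# Gene names, time-point labels and the dCt column are placeholders.
design <- expand.grid(
  gene      = sprintf("gene%02d", 1:14),
  replicate = 1:4,
  condition = c("C", "T"),
  site      = c("S1-1", "S1-2", "S2"),
  time      = c("start", "end", "recovery"),
  stringsAsFactors = FALSE
)
design$dCt <- NA_real_  # to be filled with the measured dCt values

nrow(design)  # 1008 rows: one per gene per sample per time point

# Number of distinct biological samples (site x condition x replicate)
length(unique(interaction(design$site, design$condition, design$replicate)))  # 24
```

Keeping the data in this shape makes it easy to subset (e.g. treated samples of one site) before computing any correlation.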

I want to find out whether some genes are correlated with each other during treatment. For example, some groups of genes have a role in detoxification and others repair misfolded proteins, so under stress I expect some of them to be regulated in a similar way, and I want to test this. When you already have an idea of which genes should be correlated with which, how do you test it? I could not decide how to proceed: should I consider only the treatment group and use dCt values (difference from the reference gene), or should I use all the data with ddCt values?
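One common way to test a specific hypothesis such as "these two detoxification genes move together under stress" is a correlation test on the per-sample dCt values of the treated samples, which reports an estimate, a p-value and a confidence interval rather than a bare coefficient. The sketch below uses simulated data and hypothetical gene names (detoxA, detoxB, chapC); it is an illustration, not an analysis of the real measurements. Since dCt is inversely related to expression (higher dCt means lower expression) and both genes are on the same scale, the sign of their dCt correlation matches the sign of their expression correlation.

```r
set.seed(1)
n <- 12  # e.g. 4 treated replicates x 3 time points within one site (simulated)
stress <- rnorm(n)  # shared latent "stress response" driving two genes

# Simulated dCt values for three hypothetical genes
dct <- data.frame(
  detoxA = 5 + stress + rnorm(n, sd = 0.3),  # co-regulated with detoxB
  detoxB = 7 + stress + rnorm(n, sd = 0.3),  # co-regulated with detoxA
  chapC  = 6 + rnorm(n)                      # independent of the other two
)

# Test one hypothesised pair: estimate, two-sided p-value, 95% CI
res <- cor.test(dct$detoxA, dct$detoxB, method = "pearson")
res$estimate
res$p.value
res$conf.int

# Spearman is a rank-based alternative that is less sensitive to outliers
cor.test(dct$detoxA, dct$chapC, method = "spearman")$estimate

# Gene-by-gene correlation matrix (columns are genes in this data frame)
round(cor(dct), 2)
```

With 14 genes there are 91 possible pairs, so if many pairs are tested the p-values should be adjusted, for example with `p.adjust(p, method = "BH")`.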

Note: I know I can inspect this visually with PCA and a heatmap, but I also want to learn how to calculate correlation for this type of experiment. Here, dCt is the difference between the Ct value of a target gene and that of the reference gene, and ddCt is the difference between the treatment and control groups, calculated from the replicate-averaged dCt values.
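The two definitions translate into a few lines of arithmetic. This is a minimal sketch for one target gene, one reference gene and one species/time point; the Ct numbers are made up. The final 2^-ddCt step is the usual Livak convention and assumes close to 100% amplification efficiency; it is not stated in the question.

```r
# Made-up Ct values for one target and one reference gene,
# 4 control (C) and 4 treated (T) replicates
ct <- data.frame(
  condition = rep(c("C", "T"), each = 4),
  target    = c(24.1, 24.3, 23.9, 24.2, 22.0, 22.4, 21.8, 22.1),
  reference = c(18.0, 18.2, 17.9, 18.1, 18.0, 18.3, 17.8, 18.0)
)

# dCt per sample: target Ct minus reference Ct (higher dCt = lower expression)
ct$dCt <- ct$target - ct$reference

# ddCt: mean treated dCt minus mean control dCt
mean_dCt <- tapply(ct$dCt, ct$condition, mean)
ddCt <- unname(mean_dCt["T"] - mean_dCt["C"])  # -2.025 for these numbers

# Relative expression, assuming ~100% efficiency for both genes
fold_change <- 2^(-ddCt)  # about 4.07, i.e. roughly 4-fold up in T
```

For the correlation question the important difference is that dCt keeps one value per replicate, while ddCt collapses the replicates into a single number per gene and group.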


Tags: correlation, gene expression

It depends on what you want to compare (e.g. S1 vs S2, or C vs T) and on how you want to treat the sampling area. Do you consider that the sampling area affects the response to treatment? If not, the different sampling areas can be considered replicates. Otherwise, the sampling area is a factor in its own right, and in effect you have three treatments: T-1, T-2 and T.
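To illustrate why this choice matters for a correlation: if sites differ in baseline expression and are pooled as if they were replicates, the between-site differences alone can produce a strong correlation between genes that are unrelated within each site. The sketch below uses simulated treated dCt values for two hypothetical genes. Subtracting each site's mean (equivalent to taking residuals of `lm(gene ~ site)`) is one simple way to treat site as a factor; it is an illustration, not a method proposed in the comment above.

```r
set.seed(2)
# Simulated treated dCt values: 3 sites x 4 replicates for two genes.
# Sites differ in baseline, but within a site the genes are independent.
site  <- rep(c("S1-1", "S1-2", "S2"), each = 4)
shift <- c("S1-1" = 0, "S1-2" = 2, "S2" = 4)[site]
geneA <- 5 + shift + rnorm(12, sd = 0.3)
geneB <- 7 + shift + rnorm(12, sd = 0.3)

# Sites pooled as replicates: the shared site shift dominates
r_pooled <- cor(geneA, geneB)

# Site treated as a factor: remove each site's mean before correlating
centerA <- geneA - ave(geneA, site)
centerB <- geneB - ave(geneB, site)
r_within <- cor(centerA, centerB)

c(pooled = r_pooled, within_site = r_within)
```

The pooled coefficient is close to 1 purely because of the site shifts, while the within-site coefficient reflects only the (here absent) co-regulation inside each site.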

Comment by Jean-Karim Heriche, 4.3 years ago

Hello, yes, the sampling area affects the results; in fact that is part of my question, because the ddCt value already gives me the difference from control. So I have two options: I can use dCt values, which include both control and treatment samples, or I can use ddCt values (fold change), which give me only one value per gene per treatment. If I use dCt, there seems to be no reason to include the control group: first, I am interested in how the treatment affected gene expression, and second, since correlation is computed around a mean (maybe I am wrong, please correct me), including the controls would not answer my question. On the other hand, if I use ddCt values, should I pool all species, or do it for each species separately? In the latter case I only have three treatments, so how reliable will the result be?
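One way to gauge "how reliable" is the smallest correlation that can reach significance for a given number of paired values. Pearson's r is tested with t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, and solving for r gives the threshold below. The sample sizes are examples chosen for this design: 3 (one ddCt per site), 12 (4 treated replicates x 3 time points in one site) and 36 (treated replicates pooled over sites and time points). On the "mean" point: Pearson correlation centres each gene on its own mean over the samples included, so when controls are included part of the correlation can come from the shared C vs T shift rather than from co-regulation within the treatment.

```r
# Smallest |r| reaching two-sided significance at level alpha
# for a Pearson correlation computed from n paired values (df = n - 2)
r_critical <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

# One ddCt per site versus per-replicate dCt values
sapply(c(n3_ddCt_per_site = 3, n12 = 12, n36 = 36), r_critical)
# roughly 0.997, 0.576 and 0.329
```

With only three ddCt values a correlation would need to exceed about 0.997 to be significant at the 5% level, which is why per-replicate dCt values are usually preferred for this kind of question.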

Reply by o.tutar, 4.3 years ago
4.3 years ago, Benn wrote:

Assuming you are using R, where x holds the expression values in species 1 and y the values in species 2:

cor(x, y, method="pearson")

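And if your data are in two matrices, this also works! Note that cor() correlates columns, so with genes in rows you need to transpose first, cor(t(x), t(y)), to get gene-by-gene correlations.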
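A quick way to check how cor() treats a matrix with genes in rows is a tiny toy example (the values are made up). Called on the matrix directly, it returns sample-by-sample correlations; transposing gives the gene-by-gene matrix, and with two matrices x and y (genes in rows) cor(t(x), t(y)) gives genes of x against genes of y.

```r
# Toy matrix: 3 genes in rows, 5 samples in columns (made-up values)
expr <- matrix(c(1, 2, 3, 4, 5,
                 2, 4, 6, 8, 10,
                 5, 3, 4, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:5)))

# cor() works on COLUMNS, so this is a 5 x 5 sample-by-sample matrix
dim(cor(expr))

# Transpose to get the 3 x 3 gene-by-gene matrix
gene_cor <- cor(t(expr), method = "pearson")
gene_cor["geneA", "geneB"]  # 1, since geneB is exactly 2 * geneA
gene_cor["geneA", "geneC"]  # -0.8
```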
