Question

How to calculate correlation between expressed genes in a complex experiment

0

7.9 years ago

o.tutar ▴ 20

Hello,

I have two plant species, and one of them was sampled in two different areas. Let's call them S1-1, S1-2 and S2. Each has control (C) and treatment (T) samples with four replicates, which means I have C1, C2, C3, C4, T1, T2, T3, T4 for each of S1-1, S1-2 and S2. I have gene expression measurements for three time points (beginning of treatment, end of treatment and after the recovery period) for 14 genes.

I want to understand whether some genes are correlated with each other during treatment. For example, some groups of genes play a role in detoxification and others repair misfolded proteins, so under stress I expect some of them to be regulated in a similar way, and I want to test this. When you already have an idea of which genes should be correlated, how do you test it? I could not decide how to proceed. Should I consider only the treatment group, using dCt values (difference from the reference gene), or should I use all the data with ddCt values?

Note: I know I can see this visually using PCA and a heatmap, but I also want to learn how to calculate correlation for this type of experiment. dCt is the difference between the target gene's expression value and the reference gene's. ddCt is the difference between the treatment and control groups, computed from the dCt values averaged over replicates.
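For illustration, the dCt and ddCt definitions could be computed in R roughly as follows. This is only a sketch for one gene, one species and one time point; the Ct numbers and the column names (`group`, `target`, `reference`) are made up and are not the actual data.

```r
# Hypothetical Ct values: 4 control (C) and 4 treatment (T) replicates.
ct <- data.frame(
  group     = rep(c("C", "T"), each = 4),
  target    = c(24.1, 24.3, 23.9, 24.2, 22.0, 22.4, 21.8, 22.1),
  reference = c(18.0, 18.2, 17.9, 18.1, 18.1, 18.0, 17.9, 18.2)
)

# dCt: target gene Ct minus reference gene Ct, one value per replicate.
ct$dCt <- ct$target - ct$reference

# ddCt: mean dCt of the treatment group minus mean dCt of the control group.
mean_dct <- tapply(ct$dCt, ct$group, mean)
ddCt <- unname(mean_dct["T"] - mean_dct["C"])

print(ct)
print(ddCt)
```

Note that dCt keeps one value per replicate, while ddCt collapses the replicates into a single number per gene and condition.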

Regards

correlation gene expression • 2.6k views

updated 6.2 years ago by Biostar 20 • written 7.9 years ago by o.tutar ▴ 20

0

It depends on what you want to compare, e.g. S1 vs S2 or C vs T, and on how you want to treat the sampling area. Do you consider that the sampling area affects the treatment? If not, the different sampling areas can be considered replicates. Otherwise, the sampling area is a parameter, and in effect you have three treatments: T-1, T-2 and T.

7.9 years ago by Jean-Karim Heriche 27k

0

Hello. Yes, the sampling area affects the results; actually, that is the question, because the ddCt value already gives me the difference from control. So I have two options: I can use dCt values, which include both control and treatment values, or I can use ddCt values, i.e. the fold change in expression, which gives me only one value per gene per treatment. If I use dCt, there is no reason to do anything with the control group: first, I am interested in how treatment affected gene expression, and second, since correlation uses a mean value (maybe I am wrong, please correct me), including the controls won't answer my question. On the other hand, if I use ddCt values, should I include all species, or simply do it for each species separately? In that case I have only three treatments, so how reliable will the result be?

7.9 years ago by o.tutar ▴ 20

score 1 · Answer 1 · 2016-05-30

1

7.9 years ago

Benn 8.3k

Assuming you are using R, where x holds the expression values in species 1 and y the values in species 2:

cor(x, y, method="pearson")

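And if your data are in two matrices, this also works! Note that cor() correlates the columns of x with the columns of y, so with genes in rows you would transpose first, e.g. cor(t(x), t(y)), to get gene-by-gene correlations.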
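For the gene-versus-gene question, a minimal sketch could look like the following. It assumes a matrix of dCt values with treatment samples in rows and genes in columns. The data are simulated, and the gene names and the sample count are hypothetical.

```r
# Simulated dCt values, not real measurements.
set.seed(1)
n_samples <- 12                       # e.g. 4 replicates x 3 time points (assumption)
shared <- rnorm(n_samples)            # a common stress-response signal
dct <- cbind(
  geneA = shared + rnorm(n_samples, sd = 0.3),
  geneB = shared + rnorm(n_samples, sd = 0.3),
  geneC = rnorm(n_samples)            # a gene unrelated to the shared signal
)

# cor() on a single matrix correlates its columns with each other,
# so genes must be in columns here (use t() if genes are in rows).
gene_cor <- cor(dct, method = "pearson")

# cor.test() gives a p-value for one pre-specified pair of genes.
ab_test <- cor.test(dct[, "geneA"], dct[, "geneB"], method = "pearson")

print(round(gene_cor, 2))
print(ab_test$p.value)
```

This works on per-sample dCt values because ddCt leaves only one value per gene and condition, which is too few points for a correlation. Replicates from the same species and time point are not fully independent, so the p-value is best read as a rough guide.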

7.9 years ago by Benn 8.3k