Question: Will different proportions of control/patient samples affect genes' Pearson correlations?
hellocita wrote (3.0 years ago):

I have RNA-seq data from samples of different ages (10, 20, 30, 40 and 50 years old): 50 controls and 14 patients. Based on differential analysis, I found some genes that are differentially expressed across age. I want to divide these genes into several clusters by hierarchical clustering on their Pearson correlation r, so that the genes in each cluster share a similar pattern across age. For instance, in controls the genes in a cluster are highest at young ages, while in patients they are highest at old ages.
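The clustering described in the question can be sketched in Python as below. This is a minimal illustration, not the poster's pipeline: it assumes the expression values are already normalized (e.g., log-scale), that every gene has nonzero variance, and that 1 - r is an acceptable distance. The function name `cluster_genes_by_correlation`, the average-linkage choice and the toy data are all hypothetical. Passing a genes x 10 matrix of age-group means (five control means followed by five patient means) would make each gene's profile encode both the control trend and the patient trend.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_genes_by_correlation(expr, n_clusters, method="average"):
    """Cluster genes (rows of expr) using 1 - Pearson r as the distance.

    expr: 2-D array, genes x samples (or genes x age-group means).
    Returns one cluster label per gene (1-based, as produced by fcluster).
    """
    r = np.corrcoef(expr)            # gene-by-gene Pearson correlation matrix
    dist = 1.0 - r                   # highly correlated genes -> small distance
    np.fill_diagonal(dist, 0.0)      # squareform expects an exact zero diagonal
    dist = np.clip(dist, 0.0, None)  # guard against tiny negative rounding errors
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: 3 genes falling with age, 3 genes rising with age.
ages = np.array([10, 20, 30, 40, 50])
rng = np.random.default_rng(0)
down = 100 - ages + rng.normal(0, 1, size=(3, ages.size))
up = ages + rng.normal(0, 1, size=(3, ages.size))
expr = np.vstack([down, up])
labels = cluster_genes_by_correlation(expr, n_clusters=2)
print(labels)
```

Because 1 - r ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated), genes with opposite age trends end up far apart, which is usually what "similar pattern across age" means.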

However, there are only a few samples at young ages, and the patient group is much smaller than the control group. I find that if I first calculate the mean of each age group in both controls and patients and then cluster on the genes' correlations, the Pearson r differs from the one I get when the correlations are computed from all samples. Will the different sizes of the control and patient groups, and the different numbers of samples per age, affect the correctness of the Pearson correlation?
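The discrepancy described above can be reproduced with simulated numbers (hypothetical, not the poster's data). Correlating age-group means uses only five points, gives every age group equal weight regardless of how many samples it contains, and averages away within-group noise; correlating all samples weights each sample equally, so crowded age groups dominate. The helper names and the unbalanced group sizes below are assumptions chosen only for illustration.

```python
import numpy as np


def pearson(x, y):
    """Plain Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def group_means(values, groups):
    """Mean of `values` within each distinct label in `groups` (sorted label order)."""
    labels = np.unique(groups)
    return np.array([values[groups == g].mean() for g in labels])


# Hypothetical unbalanced design: few young samples, many old ones.
ages = np.repeat([10, 20, 30, 40, 50], [2, 3, 10, 15, 20])
rng = np.random.default_rng(1)
# Two genes sharing the same age trend but with independent sample-level noise.
gene_a = 0.05 * ages + rng.normal(0, 1.0, ages.size)
gene_b = 0.05 * ages + rng.normal(0, 1.0, ages.size)

r_all = pearson(gene_a, gene_b)
r_means = pearson(group_means(gene_a, ages), group_means(gene_b, ages))
print(f"r from all {ages.size} samples: {r_all:.2f}")
print(f"r from 5 age-group means: {r_means:.2f}")
```

Neither number is "wrong"; they answer different questions. The mean-based r measures how similar the two genes' age trends are, while the all-sample r also reflects sample-to-sample covariation within each age group.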

Kevin Blighe (Republic of Ireland) wrote (3.0 years ago):

Hello Lucy, I do not completely understand your final paragraph. However, differences in sample numbers will definitely affect the correlation statistic.
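One concrete way that sample numbers affect the correlation statistic is through its uncertainty. Under the usual Fisher z-transform approximation, the standard error of atanh(r) is about 1/sqrt(n - 3), so the same observed r is far less reliable when it comes from few data points. The sketch below is illustrative only; the helper name `pearson_ci` and the value r = 0.8 are hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for Pearson r via the Fisher z-transform.

    The standard error of atanh(r) is roughly 1/sqrt(n - 3), so the interval
    widens quickly as n shrinks. Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Same observed r, different numbers of data points (hypothetical values).
for n in (5, 10, 14, 50, 64):
    lo, hi = pearson_ci(0.8, n)
    print(f"n={n:>2}: r=0.80, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With n = 5 (for example, one mean per age group) the interval for r = 0.8 extends below zero, whereas with all 64 samples it is much narrower.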

If you are aiming to look for 'patterns' in the age groups based on correlation, then tools already exist. These involve constructing a square correlation matrix, which then serves as the foundation for network analysis. In a square correlation matrix, each sample is correlated with every other sample.
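A minimal sketch of that first step, assuming a genes x samples NumPy array. `np.corrcoef` correlates rows, so passing genes x samples correlates genes with each other, and passing the transpose correlates samples with each other. The soft-threshold adjacency |r|^power shown here is the kind used for unsigned networks in WGCNA-style co-expression analysis; the function name `correlation_network`, the power of 6 and the random data are assumptions for illustration, not a reproduction of any package's code.

```python
import numpy as np


def correlation_network(expr, power=6):
    """Build a square correlation matrix and a soft-thresholded adjacency.

    expr: genes x samples array. Returns (r, adj), both genes x genes.
    adj = |r| ** power (unsigned soft threshold); weak correlations shrink
    toward 0, and the diagonal is set to 0 so genes have no self-edges.
    """
    r = np.corrcoef(expr)        # square, symmetric, 1s on the diagonal
    adj = np.abs(r) ** power     # emphasize strong correlations
    np.fill_diagonal(adj, 0.0)
    return r, adj


rng = np.random.default_rng(2)
expr = rng.normal(size=(4, 20))      # 4 hypothetical genes, 20 samples
r, adj = correlation_network(expr)
connectivity = adj.sum(axis=1)       # weighted degree of each gene
print(np.round(connectivity, 3))
```

Modules are then typically found by clustering genes on a dissimilarity derived from this adjacency, much like the 1 - r clustering in the question.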


hellocita replied (3.0 years ago):

Thank you, Kevin! However, I am not sure whether I can use WGCNA, because I think it first calculates gene modules from correlations in the control samples, so it may not reflect what really happens in the disease samples. I think the disease modules should differ from the control modules.

Kevin Blighe replied (3.0 years ago):

Okay, why not generate one network for the controls and another for the disease samples? Network analysis, generally, has major flaws. I believe that it still has to prove its value as a robust method that can help us disentangle disease mechanisms.
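Building one correlation matrix per group can be sketched as below, assuming a genes x samples array and a per-sample group label. To compare a gene pair between groups, this sketch swaps in a standard Fisher z-test for two independent correlations; that test is not something proposed in the thread, and the helper names and random data are hypothetical. Note how the small patient group (14 samples) dominates the denominator, so only large differences in r will stand out.

```python
import math

import numpy as np


def per_group_correlation(expr, groups, label):
    """Gene-by-gene Pearson matrix using only the samples (columns) in one group."""
    return np.corrcoef(expr[:, groups == label])


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson correlations.

    z = (atanh r1 - atanh r2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    Returns (z, two-sided p-value under a normal approximation).
    """
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


rng = np.random.default_rng(3)
groups = np.array(["control"] * 50 + ["patient"] * 14)
expr = rng.normal(size=(3, groups.size))  # hypothetical: 3 genes, 64 samples

r_ctrl = per_group_correlation(expr, groups, "control")
r_pat = per_group_correlation(expr, groups, "patient")
z, p = compare_correlations(r_ctrl[0, 1], 50, r_pat[0, 1], 14)
print(f"gene 0 vs gene 1: r_ctrl={r_ctrl[0, 1]:.2f}, r_pat={r_pat[0, 1]:.2f}, z={z:.2f}, p={p:.3f}")
```

Across thousands of gene pairs, these p-values would need multiple-testing correction before any pair is called differentially correlated.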