Question: Will different proportions of control/patient samples affect a gene's Pearson correlation?
14 months ago, hellocita wrote:

I have RNA-seq data from subjects of different ages (10, 20, 30, 40, and 50 years old): 50 controls and 14 patients. Differential expression analysis identified some genes that change across age. I want to divide these genes into several clusters by hierarchical clustering on their Pearson correlation r, so that the genes within a cluster share a similar pattern across age. For instance, the genes in one cluster might be highest at young ages in controls but highest at old ages in patients.

However, there are only a few samples at the young ages, and the patient group is much smaller than the control group. If I first calculate the mean expression at each age, separately for controls and patients, and then cluster on the gene correlations computed from those means, the Pearson r differs from the one computed from all individual samples. Will the unequal numbers of controls and patients, and the unequal numbers of samples per age, affect the validity of the Pearson correlation?

14 months ago, Kevin Blighe (London, England) wrote:

Hello Lucy, I do not completely understand your final paragraph. However, differences in sample numbers will definitely affect the correlation statistic.
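To make the point about sample numbers concrete, here is a minimal sketch on synthetic data (the values and the helper names `pearson_r` and `pooled_r` are made up for illustration, not taken from the poster's data). It assumes two genes that are positively related in controls and negatively related in patients, then pools both groups and computes Pearson r for 50 controls plus 14 patients versus 14 of each.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def pooled_r(n_control, n_patient):
    """Pearson r of two hypothetical genes after pooling both groups."""
    # Synthetic assumption: in controls gene B rises with gene A,
    # in patients gene B falls as gene A rises.
    a_ctrl = np.linspace(0.0, 1.0, n_control)
    b_ctrl = a_ctrl
    a_pat = np.linspace(0.0, 1.0, n_patient)
    b_pat = 1.0 - a_pat
    gene_a = np.concatenate([a_ctrl, a_pat])
    gene_b = np.concatenate([b_ctrl, b_pat])
    return pearson_r(gene_a, gene_b)


r_unbalanced = pooled_r(50, 14)  # controls outnumber patients
r_balanced = pooled_r(14, 14)    # equal group sizes

print(f"50 controls + 14 patients: r = {r_unbalanced:.3f}")
print(f"14 controls + 14 patients: r = {r_balanced:.3f}")
```

With equal group sizes the two opposite trends cancel and r is essentially zero, whereas with 50 controls the pooled r is clearly positive because the larger group dominates. The same mechanism applies to unequal numbers of samples per age.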

If you are aiming to look for 'patterns' in the age groups based on correlation, then tools for this already exist. They involve constructing a square correlation matrix, which then serves as the foundation for network analysis. In a square correlation matrix, each sample is correlated to every other sample.
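As a rough illustration of a square correlation matrix feeding into hierarchical clustering, the sketch below uses NumPy and SciPy on a tiny made-up expression table. The rows here are genes, because the question is about clustering genes; passing the transposed table would give the sample-by-sample matrix instead. The expression values, the `1 - r` distance, and the choice of average linkage are all assumptions for the example, not a recommendation from the thread.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical mean expression of 4 genes at ages 10, 20, 30, 40, 50.
# Genes 0 and 1 decrease with age; genes 2 and 3 increase with age.
expr = np.array([
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [10.0, 8.0, 7.0, 4.0, 2.0],
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 6.0, 8.0, 9.0],
])

# Square gene-by-gene Pearson correlation matrix (rows are the variables).
corr = np.corrcoef(expr)

# Turn correlation into a distance: r = 1 gives 0, r = -1 gives 2.
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)  # remove floating-point noise on the diagonal

# SciPy's linkage expects the condensed (upper-triangle) form.
condensed = squareform(dist, checks=False)
tree = linkage(condensed, method="average")

# Cut the tree into two clusters.
labels = fcluster(tree, t=2, criterion="maxclust")
print(np.round(corr, 2))
print(labels)
```

In this toy case the two decreasing genes end up in one cluster and the two increasing genes in the other.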


Thank you, Kevin! However, I am not sure whether I can use WGCNA, because it may first calculate gene modules from correlations in the control samples, so I think it may not reflect what really happens in the disease samples. I would expect the disease modules to differ from the control modules.

written 14 months ago by hellocita

Okay, why not generate one network for controls and another for disease? Network analysis, generally, has major flaws. I believe that it still has to prove its value as a robust method that can help us to disentangle disease mechanisms.
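The first step of building one network per group could be sketched as follows: compute a separate gene-by-gene correlation matrix for controls and for patients, then look at which gene pairs change most. This is only an illustration on synthetic data; the function name `group_correlations`, the three made-up genes, and the simple difference of matrices are assumptions, and it is not how WGCNA itself constructs modules.

```python
import numpy as np


def group_correlations(expr, is_control):
    """Return (control, patient) gene-by-gene Pearson correlation matrices.

    expr       : genes x samples array
    is_control : boolean array, one entry per sample
    """
    corr_control = np.corrcoef(expr[:, is_control])
    corr_patient = np.corrcoef(expr[:, ~is_control])
    return corr_control, corr_patient


# Synthetic example with 50 controls and 14 patients (hypothetical values).
t_ctrl = np.linspace(0.0, 1.0, 50)
t_pat = np.linspace(0.0, 1.0, 14)
controls = np.vstack([t_ctrl, 2.0 * t_ctrl + 1.0, t_ctrl ** 2])
patients = np.vstack([t_pat, -2.0 * t_pat + 3.0, t_pat ** 2])

expr = np.hstack([controls, patients])
is_control = np.array([True] * 50 + [False] * 14)

corr_control, corr_patient = group_correlations(expr, is_control)

# Gene pairs whose correlation differs most between the two groups.
diff = corr_patient - corr_control
pair = np.unravel_index(np.argmax(np.abs(diff)), diff.shape)
print("largest change between genes", pair, "diff =", round(float(diff[pair]), 2))
```

Because each matrix is computed within its own group, the 50-versus-14 imbalance no longer mixes the two patterns together, although the patient matrix is still estimated from far fewer samples and is therefore noisier.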

written 14 months ago by Kevin Blighe