Question: Will different proportions of control/patient samples affect genes' Pearson correlations?
hellocita wrote (19 months ago):

I have RNA-seq data from samples of different ages (10, 20, 30, 40 and 50 years old): 50 controls and 14 patients. Based on differential analysis, I found some genes that are differentially expressed across age. I want to divide these genes into several clusters by hierarchical clustering on their Pearson correlation r, so that genes in the same cluster share a similar pattern across age. For instance, one cluster might contain genes that are highest at young ages in controls but highest at old ages in patients.
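The clustering step described above can be sketched as follows. This is a minimal example, assuming normalised expression in a genes × samples NumPy array, SciPy's hierarchical clustering with average linkage, and 1 − r as the distance; the helper name `cluster_genes_by_correlation` and the toy values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_genes_by_correlation(expr, n_clusters, method="average"):
    """Hierarchically cluster genes (rows of expr) using 1 - Pearson r as distance.

    expr: 2-D array, genes x samples (or genes x age-group means).
    Returns one cluster label (1..n_clusters) per gene.
    """
    r = np.corrcoef(expr)                # genes x genes Pearson correlation matrix
    dist = np.clip(1.0 - r, 0.0, 2.0)    # similar profiles -> distance near 0
    np.fill_diagonal(dist, 0.0)          # exact zeros on the diagonal
    condensed = squareform(dist, checks=False)  # upper triangle as a 1-D vector
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: 4 genes measured at ages 10, 20, 30, 40, 50.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # rises with age
    [2.0, 4.0, 6.0, 8.0, 10.5],   # rises with age
    [5.0, 4.0, 3.0, 2.0, 1.0],    # falls with age
    [10.0, 8.0, 6.0, 4.0, 2.5],   # falls with age
])
labels = cluster_genes_by_correlation(expr, n_clusters=2)
print(labels)  # genes 0 and 1 share one label, genes 2 and 3 the other
```

To capture a pattern that differs between groups (highest at young ages in controls, highest at old ages in patients), one option is to cluster on each gene's control age profile concatenated with its patient age profile; that is an assumption for illustration, not something prescribed in this thread.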

However, there are only a few samples at young ages, and the patient group is much smaller than the control group. I find that if I first calculate the mean expression at each age, separately for controls and patients, and then cluster genes based on the correlation of these means, the Pearson r differs from the one computed across all individual samples. Will the different sizes of the control and patient groups, and the different numbers of samples per age, affect the correctness of the Pearson correlation?
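A small worked example with made-up numbers shows why the two approaches can disagree. Averaging within each age removes the variation between individuals of the same age, and it also gives every age group equal weight, whereas correlating all samples weights each sample equally, so ages (or groups) with more samples contribute more. The helper `pearson_all_vs_means` is hypothetical.

```python
import numpy as np


def pearson_all_vs_means(x, y, ages):
    """Pearson r between two genes, computed (a) over all samples and
    (b) over the per-age mean expression values.

    x, y : expression of gene 1 and gene 2, one value per sample
    ages : age group of each sample (same length as x and y)
    """
    x, y, ages = (np.asarray(v) for v in (x, y, ages))
    r_all = np.corrcoef(x, y)[0, 1]
    groups = np.unique(ages)                          # sorted distinct age groups
    mean_x = np.array([x[ages == a].mean() for a in groups])
    mean_y = np.array([y[ages == a].mean() for a in groups])
    r_means = np.corrcoef(mean_x, mean_y)[0, 1]
    return r_all, r_means


# Made-up data: two samples per age; the age trend is shared,
# but the within-age differences go in opposite directions.
ages = [10, 10, 20, 20, 30, 30]
gene_a = [1, 3, 3, 5, 5, 7]
gene_b = [3, 1, 5, 3, 7, 5]
r_all, r_means = pearson_all_vs_means(gene_a, gene_b, ages)
print(round(r_all, 3), round(r_means, 3))  # 0.455 1.0
```

Here r over all samples is 10/22 ≈ 0.45, while r over the three age means is 1. Neither number is wrong: the first measures co-variation across individuals, the second co-variation of the average age trend.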

Kevin Blighe wrote (19 months ago):

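Hello Lucy, I do not completely understand your final paragraph. However, differences in sample numbers will definitely affect the correlation statistic: a Pearson r computed from few data points (for example, 14 patients, or only 5 age means) is far less precise than one computed from 50 controls.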
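One way to make the effect of sample size concrete is the approximate confidence interval for r from the Fisher z-transformation, whose standard error is 1/√(n − 3). The sketch below (hypothetical helper `pearson_ci`, normal approximation assumed) compares the same r = 0.5 at the sample sizes mentioned in the question: 5 age means, 14 patients and 50 controls.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r estimated from
    n samples, via the Fisher z-transformation (needs n > 3)."""
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (5, 14, 50):  # e.g. 5 age means, 14 patients, 50 controls
    low, high = pearson_ci(0.5, n)
    print(f"n = {n:2d}: r = 0.5, approx. 95% CI [{low:.2f}, {high:.2f}]")
```

The interval narrows as n grows; with only five age means it is very wide, so a correlation computed on means is much less stable than one computed on the 50 control samples.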

If you are aiming to look for 'patterns' in the age groups based on correlation, then tools for this already exist. They involve constructing a square correlation matrix, which is then used as the foundation for network analysis. In a square correlation matrix, each gene is correlated with every other gene.
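As an illustration of the square correlation matrix that such network methods start from, here is a minimal NumPy sketch, assuming a genes × samples array. The soft-thresholded adjacency |r|^β mirrors WGCNA's "unsigned" network; β = 6 and the helper name `correlation_adjacency` are illustrative assumptions (WGCNA itself usually chooses β from a scale-free topology fit).

```python
import numpy as np


def correlation_adjacency(expr, beta=6):
    """Return the square gene x gene Pearson correlation matrix and a
    WGCNA-style 'unsigned' soft-thresholded adjacency, |r| ** beta.

    expr : genes x samples array.  beta=6 is only an illustrative value.
    """
    r = np.corrcoef(expr)           # symmetric, 1.0 on the diagonal
    adjacency = np.abs(r) ** beta   # weak correlations are pushed towards 0
    return r, adjacency


# Hypothetical random data: 5 genes x 20 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 20))
r, adjacency = correlation_adjacency(expr)
print(r.shape)  # (5, 5)
```

Raising |r| to a power keeps strong correlations close to their original value while shrinking weak, probably noisy ones, which is the motivation for soft thresholding.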


Thank you, Kevin! However, I am not sure whether I can use WGCNA, because I think it would first define gene modules from correlations in the control samples, which may not reflect what really happens in the disease samples. I expect the disease modules to differ from the control modules.

(Reply by hellocita, 19 months ago.)

Okay, why not generate one network for the controls and another for the patients? That said, network analysis in general has major flaws, and I believe it has yet to prove its value as a robust method for disentangling disease mechanisms.
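Building one correlation matrix per group, as suggested, could look like the sketch below. Because the groups differ in size (50 controls vs. 14 patients), comparing a gene pair's correlation between groups should account for n; the Fisher z-test for two independent correlations does this. The helper names and the boolean `is_patient` mask are hypothetical, and the test assumes approximately normal data.

```python
import math

import numpy as np


def group_correlations(expr, is_patient):
    """Separate gene x gene Pearson correlation matrices for controls and patients.

    expr : genes x samples array; is_patient : boolean mask over samples.
    """
    is_patient = np.asarray(is_patient, dtype=bool)
    r_ctrl = np.corrcoef(expr[:, ~is_patient])
    r_pat = np.corrcoef(expr[:, is_patient])
    return r_ctrl, r_pat


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test for r1 (n1 samples) vs r2 (n2 samples)
    from independent groups.  Returns (z, p)."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# Hypothetical data: 3 genes, 50 controls followed by 14 patients.
rng = np.random.default_rng(1)
expr = rng.normal(size=(3, 64))
is_patient = np.array([False] * 50 + [True] * 14)
r_ctrl, r_pat = group_correlations(expr, is_patient)
z, p = compare_correlations(r_ctrl[0, 1], 50, r_pat[0, 1], 14)
```

With only 14 patients the standard error is dominated by the patient term (1/11 vs. 1/47), so only fairly large differences in r between the two networks will reach significance.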

(Reply by Kevin Blighe, 19 months ago.)