Is cor() enough to tell about relations between genes?
7.9 years ago
Ying ▴ 10

Hi,

I have 150 samples of expression data from two different conditions (or experiments).

Is correlation (e.g. cor() in R) enough to establish relationships between random or pre-selected sets of genes?
Also, what extra work is required, or advisable, to validate the result further (computationally only) or to take it further?

NOTE: let's assume we keep only gene pairs with p-value < 0.005 and cor > 0.70 or < -0.70. If you think a different correlation cutoff is better, please tell me.

Edit:
Data type: expression
Species: Human
Samples per condition: A = 90, B = 60, total = 150
Genes: a set of desired genes
Aim: find correlation between those genes.

I appreciate your comments
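
For concreteness, the filtering rule described in the NOTE could be sketched as below. This is only an illustration under assumptions: the `expr` matrix is simulated stand-in data (genes in rows, samples in columns), the gene names are hypothetical, and the Benjamini-Hochberg adjustment is an extra step not mentioned in the question, added because many gene pairs are tested at once.

```r
# Simulated stand-in for a genes x samples expression matrix (hypothetical data).
set.seed(1)
n_samples <- 90
base <- rnorm(n_samples)
expr <- rbind(
  geneA = base + rnorm(n_samples, sd = 0.3),
  geneB = base + rnorm(n_samples, sd = 0.3),
  geneC = -base + rnorm(n_samples, sd = 0.3),  # anti-correlated with geneA/geneB
  geneD = rnorm(n_samples),                    # unrelated
  geneE = rnorm(n_samples)                     # unrelated
)

# All unordered gene pairs.
pairs <- t(combn(rownames(expr), 2))
res <- data.frame(
  gene1 = pairs[, 1],
  gene2 = pairs[, 2],
  r = NA_real_,
  p = NA_real_,
  stringsAsFactors = FALSE
)

# cor.test() gives both the correlation estimate and its p-value for one pair.
for (i in seq_len(nrow(res))) {
  ct <- cor.test(expr[res$gene1[i], ], expr[res$gene2[i], ], method = "pearson")
  res$r[i] <- unname(ct$estimate)
  res$p[i] <- ct$p.value
}

# Adjust for the number of pairs tested (assumption: Benjamini-Hochberg).
res$p_adj <- p.adjust(res$p, method = "BH")

# Cutoffs taken from the question; they are a choice, not a standard.
hits <- res[abs(res$r) > 0.70 & res$p_adj < 0.005, ]
print(hits)
```

With two conditions, the same loop could be run separately on the columns belonging to A and to B, since pooling all 150 samples can create or hide correlations that are driven by the condition itself.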

RNA-Seq microarray statistics • 2.0k views

Could you specify which type of data? Expression?


Yes, expression data.


How many samples, genes, and conditions are there, and what is the species? More importantly, what is the hypothesis you want to test? You should edit your question to add this information.


I have now added some more info.


OK, thanks. IMO you should try hierarchical clustering on the genes. In R:

If A is your expression matrix (columns = samples; rows = genes):

h <- hclust(dist(A))  # dist() computes Euclidean distances between rows (genes)
plot(h)               # draw the dendrogram

You could also try WGCNA: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/


Yes, WGCNA is a good way to find co-expressed genes; you can then plot them to see how they behave in the two conditions and how strongly they are correlated. However, a correlation plot, or even the dendrogram, will only reveal correlation if the selected genes are a validated set, known either from your lab or from the published literature. Randomly selecting genes for correlation might not work, depending on the criteria used to select them. It is better to use a gene set from published data, or to try WGCNA as mentioned by NicoBxl.


Thanks. I used hclust() and it gave me an unexpected result.
For example, if geneX and geneY have a correlation of 0.60 with p-value < 0.005, hclust() still places geneX and geneY farther from each other than from other, less correlated genes. Any comments?
I am going to use WGCNA too.
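
One possible explanation, offered as an assumption rather than a diagnosis of this dataset: dist() defaults to Euclidean distance, which depends on expression level and scale, so two well-correlated genes expressed at very different levels can end up far apart. A minimal sketch with simulated data (the matrix `A` and the gene names are hypothetical) that clusters on a correlation-based distance instead:

```r
set.seed(2)
n_samples <- 60
base <- rnorm(n_samples)

# geneX and geneY follow the same pattern but at very different levels and scales.
A <- rbind(
  geneX = 2 + base + rnorm(n_samples, sd = 0.3),
  geneY = 12 + 3 * base + rnorm(n_samples, sd = 0.3),
  geneZ = 2 + rnorm(n_samples)  # unrelated to geneX, but at a similar level
)

# Euclidean distance between raw profiles: dominated by level and scale.
d_euc <- as.matrix(dist(A))

# Correlation distance: 0 when r = 1, 2 when r = -1.
# cor() correlates columns, so transpose to correlate genes (rows).
d_cor <- as.dist(1 - cor(t(A)))

print(round(d_euc, 2))
print(round(as.matrix(d_cor), 2))

# Cluster on the correlation distance; average linkage is one common choice.
h <- hclust(d_cor, method = "average")
plot(h)
```

In this toy example geneX is closer to geneZ under Euclidean distance but closer to geneY under correlation distance. If anti-correlated genes should also group together, 1 - abs(cor(t(A))) is an alternative distance.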


There is a difference between correlation and co-expression.


Then your suggestion of WGCNA is for co-expression, which is a different purpose from mine, as it tracks genes that go up and down together. I want a correlation plot for another purpose. Thanks.


You can apply classical unsupervised clustering to your genes of interest, using the normalized expression values across the samples from both conditions. This will help you find the correlation coefficients and show whether these genes also separate your samples into two classes by condition, so try that. Then take a look at this link here to see how WGCNA is used and how it can be applied to your hypothesis. See also here; you can also look at how to select specific gene modules. Take a look at different clustering methods as well.


Thank you. Let's say two clusters emerge: how do I compute the correlation coefficient for them? Is it built into the unsupervised method, or do you mean I should do it separately?
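
As far as base R goes, hclust() only returns the tree, so the correlation would be computed as a separate step. A minimal sketch, assuming simulated data (`expr` and the gene names g1 to g6 are hypothetical) and a cut into two clusters with cutree():

```r
set.seed(3)
n_samples <- 150
m1 <- rnorm(n_samples)  # shared pattern of the first gene group
m2 <- rnorm(n_samples)  # shared pattern of the second gene group

expr <- rbind(
  g1 = m1 + rnorm(n_samples, sd = 0.4),
  g2 = m1 + rnorm(n_samples, sd = 0.4),
  g3 = m1 + rnorm(n_samples, sd = 0.4),
  g4 = m2 + rnorm(n_samples, sd = 0.4),
  g5 = m2 + rnorm(n_samples, sd = 0.4),
  g6 = m2 + rnorm(n_samples, sd = 0.4)
)

# Gene-gene correlation matrix (transpose because cor() works on columns).
r <- cor(t(expr))

# Cluster genes on correlation distance and cut the tree into two clusters.
h <- hclust(as.dist(1 - r), method = "average")
cl <- cutree(h, k = 2)

# Mean pairwise correlation among the genes of each cluster.
mean_within <- sapply(split(names(cl), cl), function(genes) {
  sub <- r[genes, genes, drop = FALSE]
  mean(sub[upper.tri(sub)])
})

print(cl)
print(mean_within)
```

The per-cluster mean is just one possible summary; the full within-cluster submatrix of `r` could be inspected or plotted instead.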

7.9 years ago
LLTommy ★ 1.2k

So the answer to the original question is: No. (Do we agree on that?)
