How to create a correlation matrix in R
13 months ago
sata72 • 0

I have a list of genes with two parameters per gene, meth and exp, each measured in several samples. I want to create a table of Pearson correlations between meth and exp, with a p-value, for every gene.

This is an example of the list; the numbers were chosen arbitrarily.

Input:

list    meth1   meth2   meth3   meth4   meth5   meth6   meth7   meth8   meth9   meth10  exp1    exp2    exp3    exp4    exp5    exp6    exp7    exp8    exp9    exp10
gene 1  1   1   1   1   1   1   1   1   1   1   7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2 7.2
gene 2  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5
gene 3  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 5   5   5   5   5   5   5   5   5   5
gene 4  0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 4   4   4   4   4   4   4   4   4   4
gene 5  0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 3   3   3   3   3   3   3   3   3   3
gene 6  0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 2   2   2   2   2   2   2   2   2   2

....


Expected output:

list      Correlation value  Pvalue
gene 1      0.3              0.05
gene 2      0.8              0.04
gene 3      0.9              0.06
gene 4      0.2              0.01
gene 5     -0.4              0.002
gene 6      0.2              0.05
....

pearson-correlation R • 1.4k views

Your question is not clear. You want the correlation between the meth and exp values for each gene, right? If so, the example you provided does not make sense, because the values are constant within every gene and the correlation of a constant vector is undefined. In R you can simply use the cor() function to compute the correlation; its "method" argument selects the coefficient: "pearson", "spearman" or "kendall".
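
As a rough illustration of the point above, here is a minimal sketch of cor() with its method argument. The meth and expr vectors are made-up values for a single gene, not data from the question, and the last call shows what happens with constant input like the example table.

```r
# Hypothetical values for one gene across 10 samples (not from the thread)
meth <- c(0.10, 0.25, 0.30, 0.45, 0.50, 0.62, 0.70, 0.81, 0.90, 0.95)
expr <- c(9.1, 8.7, 8.9, 7.5, 7.8, 6.4, 6.0, 5.2, 5.5, 4.1)

# Method names are lowercase: "pearson" (the default), "spearman", "kendall"
r_pearson  <- cor(meth, expr, method = "pearson")
r_spearman <- cor(meth, expr, method = "spearman")

# A constant vector has zero variance, so the correlation is undefined:
# cor() returns NA and warns that the standard deviation is zero
r_constant <- suppressWarnings(cor(rep(1, 10), rep(7.2, 10)))
```

Note that cor() only returns the coefficient; a p-value needs cor.test().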


Thank you for the response. Actually, I want to find genes whose exp and meth values are highly correlated and statistically significant. cor() calculates the overall correlation between meth and exp.


For each gene I have 10 samples for meth and 10 samples for exp. I want to find the correlation between these meth samples and exp samples, and then compare all of the genes to find which ones have a high and significant correlation.


So you want to do differential expression analysis.


No. I don't want to find differentially expressed genes; I want to find genes with a high correlation between exp (10 samples) and meth (10 samples).


If I want to get the correlation and p-value for gene 1, I can use this:

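# as.numeric() turns a one-row slice into the plain numeric vector cor.test() expects
x <- as.numeric(list[1, 1:10])   # meth samples for gene 1
y <- as.numeric(list[1, 11:20])  # exp samples for gene 1

corr <- cor.test(x, y, method = "spearman")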


Printing the result gives:

corr

p-value = 0.9337

rho = 0.01932637


So I can compare genes to see which ones are highly correlated (exp, meth). But I want to create an output table with all of the genes.
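
The two numbers printed for one gene can also be pulled out of the cor.test() result programmatically, which is the building block for a per-gene table. This is only a sketch: the x and y values are invented stand-ins for one gene's meth and exp samples, and the column names of one_row are arbitrary.

```r
# Hypothetical meth (x) and exp (y) values for a single gene, 10 samples each
x <- c(0.10, 0.25, 0.30, 0.45, 0.50, 0.62, 0.70, 0.81, 0.90, 0.95)
y <- c(9.1, 8.7, 8.9, 7.5, 7.8, 6.4, 6.0, 5.2, 5.5, 4.1)

corr <- cor.test(x, y, method = "spearman")

# cor.test() returns a list of class "htest"; the coefficient is in
# $estimate (a named number) and the p-value in $p.value
rho  <- unname(corr$estimate)
pval <- corr$p.value

# One row of the desired output table (column names are illustrative)
one_row <- data.frame(gene = "gene 1", correlation = rho, p.value = pval)
```

Repeating this for every row of the input and stacking the rows yields the table asked for.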

13 months ago
seidel 11k

You can use the apply() function to process your data table:

# generate some data
df <- as.data.frame(matrix(runif(200,0,10), nrow=10, ncol=20))
colnames(df) <- c(paste0("meth",1:10), paste0("exp",1:10))
rownames(df) <- paste0("g", 1:10)

# use apply to process each row
x <- apply(df, 1, function(x) {
  # columns 1-10 are the meth samples, columns 11-20 the exp samples
  correl <- cor.test(x[1:10], x[11:20], method = "pearson")
  return(c(correl$estimate, correl$p.value))
})

rownames(x) <- c("pearson", "p.value")

# transpose rows to columns
t(x)


Result:

          pearson   p.value
g1   3.653509e-01 0.2992016
g2  -7.154713e-05 0.9998435
g3  -4.438671e-01 0.1987847
g4  -4.147991e-01 0.2332911
g5  -1.803566e-01 0.6180548
g6   1.624743e-01 0.6538220
g7   3.031253e-01 0.3945555
g8  -2.718013e-01 0.4474460
g9  -4.341393e-01 0.2099798
g10 -1.464653e-01 0.6863923
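
To then pick out the genes that are both highly correlated and significant, as asked in the comments, the result table can be filtered and sorted. The sketch below regenerates random data the same way as the answer; the seed, the name res, and the cutoffs min_abs_r and max_p are assumptions chosen only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
df <- as.data.frame(matrix(runif(200, 0, 10), nrow = 10, ncol = 20))
colnames(df) <- c(paste0("meth", 1:10), paste0("exp", 1:10))
rownames(df) <- paste0("g", 1:10)

# One Pearson test per row, collected as a genes x 2 data frame
res <- t(apply(df, 1, function(v) {
  ct <- cor.test(v[1:10], v[11:20], method = "pearson")
  c(pearson = unname(ct$estimate), p.value = ct$p.value)
}))
res <- as.data.frame(res)

# Hypothetical cutoffs; adjust them to the study at hand
min_abs_r <- 0.5
max_p     <- 0.05

# Keep strongly correlated, significant genes, most significant first
hits <- res[abs(res$pearson) >= min_abs_r & res$p.value <= max_p, ]
hits <- hits[order(hits$p.value), ]
```

With purely random input, hits will often be empty, which is the expected outcome when meth and exp are unrelated.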


great! thanks