How to calculate the Pearson correlation coefficients between a list of genes?
1
4
Entering edit mode
5.6 years ago
Chaimaa ▴ 260

Hello guys,

I have a list of around 300 genes and, I want to calculate gene co-expression value via the Pearson correlation coefficient (PCC) between these genes and return only the ones have larger than or less than 0.5.

Any code available for this, plz share?

Appreciate any help plz!!

PCC genes list • 7.3k views
6
Entering edit mode
5.6 years ago

In R Programming Language:

# 1, create random data of 300 genes (rows) x 20 samples (columns)

mat <- matrix(rexp(6000, rate=0.1), ncol=20)
rownames(mat) <- paste0("gene", 1:nrow(mat))

dim (mat)
[1] 300  20

mat[1:5,1:5]
[,1]      [,2]       [,3]     [,4]       [,5]
gene1  4.055854  5.427153 12.9471235 1.220788  8.0725705
gene2 10.805221  1.679573  2.0239176 3.754633 10.5629855
gene3 10.528569  3.867029 12.3135247 9.175522  4.5774353
gene4  6.893934  5.824185  0.3714950 4.295631  0.8289908
gene5  2.422062 14.197895  0.4122351 9.827562  0.8281283


# 2, create a gene correlation matrix

cormat <- cor(t(mat), method = "pearson")

dim(cormat)
[1] 300 300


# 3, how many correlations are > Pearson 0.5?

table(cormat>0.5)
FALSE  TRUE
87528  2472


# 4, summarise all correlations > 0.5

corlist <- apply(cormat, 2, function(x) names(which(x>0.5)))
corlist <- do.call(rbind, lapply(corlist, function(x) paste(x, collapse = ", ")))

[,1]
gene1 "gene1, gene107, gene144, gene225, gene255, gene272"
gene2 "gene2, gene159, gene280"
gene3 "gene3, gene110, gene113, gene165, gene178, gene244"
gene4 "gene4, gene14, gene35, gene41, gene54, gene61, gene73, gene91, gene107, gene161"
gene5 "gene5, gene84, gene103, gene126, gene153, gene154, gene163, gene247, gene267, gene292"
gene6 "gene6, gene68, gene81, gene123, gene166, gene189, gene297"

# a different way, retaining name-value pairs:
corlist <- apply(cormat, 2, function(x) x[x>0.5])
$gene1 gene1 gene27 gene82 gene86 gene124 gene134 gene238 gene298 1.0000000 0.6933233 0.6195321 0.5372766 0.5718707 0.6206825 0.6603721 0.5414058$gene2
gene2    gene38
1.0000000 0.6048402

$gene3 gene3 gene136 gene145 gene255 1.0000000 0.5972609 0.5366873 0.6380525$gene4
gene4    gene12   gene146   gene239   gene270
1.0000000 0.5542958 0.7369206 0.5382795 0.5211835

$gene5 gene5 gene85 gene97 gene102 gene181 gene182 gene202 gene203 1.0000000 0.8111300 0.6299341 0.6213753 0.5584764 0.5344177 0.5881075 0.5067301 gene229 0.5336602$gene6
gene6    gene34   gene102   gene152   gene177   gene181   gene190   gene202
1.0000000 0.5356967 0.5579895 0.6305093 0.5663306 0.6597329 0.5897418 0.6123978
gene229   gene241
0.5903611 0.5054317


Kevin

0
Entering edit mode

@Kevin Blighe

Hi, Kevin I already have the samples too, list of 300 genes and many samples Can I apply the same code? I want the output like edges list gene1 gene2 PCC value between them

1
Entering edit mode

Every correlation value that you require will be in the cormat object. You can manipulate that in order to get the data in whatever final format you require.

cormat[1:5,1:10]
gene1       gene2      gene3       gene4       gene5      gene6
gene1  1.00000000 -0.21283479 -0.1810563  0.26833246 -0.09807815  0.2732797
gene2 -0.21283479  1.00000000 -0.3612355  0.07721188  0.04814679 -0.2055238
gene3 -0.18105633 -0.36123549  1.0000000 -0.32038329 -0.39066175 -0.2079907
gene4  0.26833246  0.07721188 -0.3203833  1.00000000  0.02838447  0.2693119
gene5 -0.09807815  0.04814679 -0.3906617  0.02838447  1.00000000 -0.2314281
gene7       gene8       gene9      gene10
gene1  0.04491062  0.20639200  0.01804317 -0.09494333
gene2 -0.12443723  0.10425807 -0.02924639  0.33801704
gene3 -0.11999162 -0.29285677  0.05683996 -0.08382801
gene4  0.18194632 -0.02893122 -0.07697578 -0.22989306
gene5  0.01205250  0.33458971 -0.32346308 -0.29038923

0
Entering edit mode

I want to keep the PCC values larger than 0.5 or smaller than -0.5? Which command do i use in R?

5
Entering edit mode

Sorry, I will not be doing everything. One needs to learn how to do these simple tasks on their own. I have already given substantial information in my answer and comments. Take a look at my answer again, particularly, part 4. Good luck. [hint: to also get those < -0.5, use the abs() function]

0
Entering edit mode

you have made a parallel version to calculate large correlation matrix i think along with the clusgap function...can you post the link here? found it link you have removed the correaltion matrix..part

0
Entering edit mode

Yes, I was not 100% content with it. I believe there is another package / function called 'bigcor', which can do the same. A lot of the 'parallel' functionality that I developed was also 'absorbed' into my RegParallel Bioconductor package, while clusGapKB was absorbed into another future pipeline / package, cytofNet (https://github.com/kevinblighe/cytofNet)

0
Entering edit mode

you can subset isn't it based on your desired threshold !!