Question: How to calculate the Pearson correlation coefficients between a list of genes?
1
2.1 years ago by
Chaimaa230
Chaimaa230 wrote:

Hello guys,

I have a list of around 300 genes and, I want to calculate gene co-expression value via the Pearson correlation coefficient (PCC) between these genes and return only the ones have larger than or less than 0.5.

Any code available for this, plz share?

Appreciate any help plz!!

genes list pcc • 2.7k views
modified 2.1 years ago by Kevin Blighe69k • written 2.1 years ago by Chaimaa230
4
2.1 years ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

In R Programming Language:

# 1, create random data of 300 genes (rows) x 20 samples (columns)

``````mat <- matrix(rexp(6000, rate=0.1), ncol=20)
rownames(mat) <- paste0("gene", 1:nrow(mat))

dim (mat)
[1] 300  20

mat[1:5,1:5]
[,1]      [,2]       [,3]     [,4]       [,5]
gene1  4.055854  5.427153 12.9471235 1.220788  8.0725705
gene2 10.805221  1.679573  2.0239176 3.754633 10.5629855
gene3 10.528569  3.867029 12.3135247 9.175522  4.5774353
gene4  6.893934  5.824185  0.3714950 4.295631  0.8289908
gene5  2.422062 14.197895  0.4122351 9.827562  0.8281283
``````

# 2, create a gene correlation matrix

``````cormat <- cor(t(mat), method = "pearson")

dim(cormat)
[1] 300 300
``````

# 3, how many correlations are > Pearson 0.5?

``````table(cormat>0.5)
FALSE  TRUE
87528  2472
``````

# 4, summarise all correlations > 0.5

``````corlist <- apply(cormat, 2, function(x) names(which(x>0.5)))
corlist <- do.call(rbind, lapply(corlist, function(x) paste(x, collapse = ", ")))

[,1]
gene1 "gene1, gene107, gene144, gene225, gene255, gene272"
gene2 "gene2, gene159, gene280"
gene3 "gene3, gene110, gene113, gene165, gene178, gene244"
gene4 "gene4, gene14, gene35, gene41, gene54, gene61, gene73, gene91, gene107, gene161"
gene5 "gene5, gene84, gene103, gene126, gene153, gene154, gene163, gene247, gene267, gene292"
gene6 "gene6, gene68, gene81, gene123, gene166, gene189, gene297"

# a different way, retaining name-value pairs:
corlist <- apply(cormat, 2, function(x) x[x>0.5])
\$gene1
gene1    gene27    gene82    gene86   gene124   gene134   gene238   gene298
1.0000000 0.6933233 0.6195321 0.5372766 0.5718707 0.6206825 0.6603721 0.5414058

\$gene2
gene2    gene38
1.0000000 0.6048402

\$gene3
gene3   gene136   gene145   gene255
1.0000000 0.5972609 0.5366873 0.6380525

\$gene4
gene4    gene12   gene146   gene239   gene270
1.0000000 0.5542958 0.7369206 0.5382795 0.5211835

\$gene5
gene5    gene85    gene97   gene102   gene181   gene182   gene202   gene203
1.0000000 0.8111300 0.6299341 0.6213753 0.5584764 0.5344177 0.5881075 0.5067301
gene229
0.5336602

\$gene6
gene6    gene34   gene102   gene152   gene177   gene181   gene190   gene202
1.0000000 0.5356967 0.5579895 0.6305093 0.5663306 0.6597329 0.5897418 0.6123978
gene229   gene241
0.5903611 0.5054317
``````

Kevin

@Kevin Blighe

Hi, Kevin I already have the samples too, list of 300 genes and many samples Can I apply the same code? I want the output like edges list `gene1 gene2 PCC value between them`

1

Every correlation value that you require will be in the `cormat` object. You can manipulate that in order to get the data in whatever final format you require.

``````cormat[1:5,1:10]
gene1       gene2      gene3       gene4       gene5      gene6
gene1  1.00000000 -0.21283479 -0.1810563  0.26833246 -0.09807815  0.2732797
gene2 -0.21283479  1.00000000 -0.3612355  0.07721188  0.04814679 -0.2055238
gene3 -0.18105633 -0.36123549  1.0000000 -0.32038329 -0.39066175 -0.2079907
gene4  0.26833246  0.07721188 -0.3203833  1.00000000  0.02838447  0.2693119
gene5 -0.09807815  0.04814679 -0.3906617  0.02838447  1.00000000 -0.2314281
gene7       gene8       gene9      gene10
gene1  0.04491062  0.20639200  0.01804317 -0.09494333
gene2 -0.12443723  0.10425807 -0.02924639  0.33801704
gene3 -0.11999162 -0.29285677  0.05683996 -0.08382801
gene4  0.18194632 -0.02893122 -0.07697578 -0.22989306
gene5  0.01205250  0.33458971 -0.32346308 -0.29038923
``````

I want to keep the PCC values larger than 0.5 or smaller than -0.5? Which command do i use in R?

4

Sorry, I will not be doing everything for you. You need to learn how to do these simple tasks on your own. I have already given you substantial information in my answer and comments. If you want to be a good bioinformatician, you need to learn how to re-use and build on existing code. Take a look at my answer again, particularly, part 4. Good luck. [hint: to also get those < -0.5, use the `abs()` function]

you have made a parallel version to calculate large correlation matrix i think along with the clusgap function...can you post the link here? found it link you have removed the correaltion matrix..part

Yes, I was not 100% content with it. I believe there is another package / function called 'bigcor', which can do the same. A lot of the 'parallel' functionality that I developed was also 'absorbed' into my RegParallel Bioconductor package, while clusGapKB was absorbed into another future pipeline / package, cytofNet (https://github.com/kevinblighe/cytofNet)

you can subset isn't it based on your desired threshold !!