Question: How to calculate the Pearson correlation coefficients between a list of genes?
1
gravatar for Chaimaa
17 months ago by
Chaimaa180
Chaimaa180 wrote:

Hello guys,

I have a list of around 300 genes and, I want to calculate gene co-expression value via the Pearson correlation coefficient (PCC) between these genes and return only the ones have larger than or less than 0.5.

Any code available for this, plz share?

Appreciate any help plz!!

genes list pcc • 1.8k views
ADD COMMENTlink modified 17 months ago by Kevin Blighe59k • written 17 months ago by Chaimaa180
4
gravatar for Kevin Blighe
17 months ago by
Kevin Blighe59k
Kevin Blighe59k wrote:

In R Programming Language:

1, create random data of 300 genes (rows) x 20 samples (columns)

mat <- matrix(rexp(6000, rate=0.1), ncol=20)
rownames(mat) <- paste0("gene", 1:nrow(mat))

dim (mat)
[1] 300  20

mat[1:5,1:5]
           [,1]      [,2]       [,3]     [,4]       [,5]
gene1  4.055854  5.427153 12.9471235 1.220788  8.0725705
gene2 10.805221  1.679573  2.0239176 3.754633 10.5629855
gene3 10.528569  3.867029 12.3135247 9.175522  4.5774353
gene4  6.893934  5.824185  0.3714950 4.295631  0.8289908
gene5  2.422062 14.197895  0.4122351 9.827562  0.8281283

2, create a gene correlation matrix

cormat <- cor(t(mat), method = "pearson")

dim(cormat)
[1] 300 300

3, how many correlations are > Pearson 0.5?

table(cormat>0.5)
FALSE  TRUE 
87528  2472

4, summarise all correlations > 0.5

corlist <- apply(cormat, 2, function(x) names(which(x>0.5)))
corlist <- do.call(rbind, lapply(corlist, function(x) paste(x, collapse = ", ")))

head(corlist)
  [,1]                                                                                   
gene1 "gene1, gene107, gene144, gene225, gene255, gene272"                                   
gene2 "gene2, gene159, gene280"                                                              
gene3 "gene3, gene110, gene113, gene165, gene178, gene244"                                   
gene4 "gene4, gene14, gene35, gene41, gene54, gene61, gene73, gene91, gene107, gene161"      
gene5 "gene5, gene84, gene103, gene126, gene153, gene154, gene163, gene247, gene267, gene292"
gene6 "gene6, gene68, gene81, gene123, gene166, gene189, gene297"                            

# a different way, retaining name-value pairs:
corlist <- apply(cormat, 2, function(x) x[x>0.5])
head(corlist)
$gene1
    gene1    gene27    gene82    gene86   gene124   gene134   gene238   gene298 
1.0000000 0.6933233 0.6195321 0.5372766 0.5718707 0.6206825 0.6603721 0.5414058 

$gene2
    gene2    gene38 
1.0000000 0.6048402 

$gene3
    gene3   gene136   gene145   gene255 
1.0000000 0.5972609 0.5366873 0.6380525 

$gene4
    gene4    gene12   gene146   gene239   gene270 
1.0000000 0.5542958 0.7369206 0.5382795 0.5211835 

$gene5
    gene5    gene85    gene97   gene102   gene181   gene182   gene202   gene203 
1.0000000 0.8111300 0.6299341 0.6213753 0.5584764 0.5344177 0.5881075 0.5067301 
  gene229 
0.5336602 

$gene6
    gene6    gene34   gene102   gene152   gene177   gene181   gene190   gene202 
1.0000000 0.5356967 0.5579895 0.6305093 0.5663306 0.6597329 0.5897418 0.6123978 
  gene229   gene241 
0.5903611 0.5054317

Kevin

ADD COMMENTlink modified 11 months ago • written 17 months ago by Kevin Blighe59k

@Kevin Blighe

Hi, Kevin I already have the samples too, list of 300 genes and many samples Can I apply the same code? I want the output like edges list gene1 gene2 PCC value between them

ADD REPLYlink modified 17 months ago • written 17 months ago by Chaimaa180
1

Every correlation value that you require will be in the cormat object. You can manipulate that in order to get the data in whatever final format you require.

cormat[1:5,1:10]
            gene1       gene2      gene3       gene4       gene5      gene6
gene1  1.00000000 -0.21283479 -0.1810563  0.26833246 -0.09807815  0.2732797
gene2 -0.21283479  1.00000000 -0.3612355  0.07721188  0.04814679 -0.2055238
gene3 -0.18105633 -0.36123549  1.0000000 -0.32038329 -0.39066175 -0.2079907
gene4  0.26833246  0.07721188 -0.3203833  1.00000000  0.02838447  0.2693119
gene5 -0.09807815  0.04814679 -0.3906617  0.02838447  1.00000000 -0.2314281
            gene7       gene8       gene9      gene10
gene1  0.04491062  0.20639200  0.01804317 -0.09494333
gene2 -0.12443723  0.10425807 -0.02924639  0.33801704
gene3 -0.11999162 -0.29285677  0.05683996 -0.08382801
gene4  0.18194632 -0.02893122 -0.07697578 -0.22989306
gene5  0.01205250  0.33458971 -0.32346308 -0.29038923
ADD REPLYlink modified 17 months ago • written 17 months ago by Kevin Blighe59k

I want to keep the PCC values larger than 0.5 or smaller than -0.5? Which command do i use in R?

ADD REPLYlink written 17 months ago by Chaimaa180
4

Sorry, I will not be doing everything for you. You need to learn how to do these simple tasks on your own. I have already given you substantial information in my answer and comments. If you want to be a good bioinformatician, you need to learn how to re-use and build on existing code. Take a look at my answer again, particularly, part 4. Good luck. [hint: to also get those < -0.5, use the abs() function]

ADD REPLYlink modified 17 months ago • written 17 months ago by Kevin Blighe59k

you have made a parallel version to calculate large correlation matrix i think along with the clusgap function...can you post the link here? found it link you have removed the correaltion matrix..part

ADD REPLYlink modified 15 months ago • written 15 months ago by krushnach80740

Yes, I was not 100% content with it. I believe there is another package / function called 'bigcor', which can do the same. A lot of the 'parallel' functionality that I developed was also 'absorbed' into my RegParallel Bioconductor package, while clusGapKB was absorbed into another future pipeline / package, cytofNet (https://github.com/kevinblighe/cytofNet)

ADD REPLYlink modified 15 months ago • written 15 months ago by Kevin Blighe59k

you can subset isn't it based on your desired threshold !!

ADD REPLYlink written 17 months ago by krushnach80740
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1666 users visited in the last hour