Question: Compute correlations between OTUs
0
vincentpailler100 wrote:

Hi guys,

I've got this kind of matrix : https://ibb.co/r47yY3m

I would like to compute correlations between each pair of OTUs according to this formula (rho coefficient):

(for example between the first and second OTU)

``````1-((var(lol[1,]-lol[2,]))/(var(lol[1,])+var(lol[2,])))
``````

How could I create a function to compute automatically each correlation between each pair according to get a result like this :

`````` OTU0001 OTU0004 X
OTU0001 OTU0014 X
....
OTU0016 OTU0017 X
``````

Thanks

correlation • 299 views
written 6 months ago by vincentpailler100
0
Kevin Blighe48k wrote:

Can you please follow up on what seems like a previous related question: Parallelization to compute correlations

If you look at my previous answer, you should be able to infer how you can apply this new equation and parallelise the entire process

ADD COMMENTlink modified 6 months ago • written 6 months ago by Kevin Blighe48k

Actually, the problem here is to create the function to compute the correlations between each pairs of OTUs. I do not want to use the function cor() but create the function like I did between the first and the second OTU. I just want to "automatize" this function for each pair

All that you have to do is replace the `cor()` function (in my previous example) with your new function.

I know what you mean, but my main problem is to create this new function.. I checked on Biostars but didn't find anything about my problem..

Is this not your formula? - `1-((var(lol[1,]-lol[2,]))/(var(lol[1,])+var(lol[2,])))`

Implementing it in the `foreach` loop would be something like:

``````1-((var(lol[i,]-lol[i+1,]))/(var(lol[i,])+var(lol[i+1,])))
``````

I found this formula in a paper "propr: An R-package for Identifying Proportionally Abundant Features Using Compositional Data Analysis."

I tried :

``````cores<-32
options('mc.cores'=cores)
registerDoParallel(cores)

data<-read.table("/home/vipailler/PROJET_M2/raw/truelength2.prok2.uniref2.rares.tsv", h=T, row.names=1, sep="\t")

res <- foreach(i = seq_len(ncol(data)),
.combine = rbind,
.multicombine = TRUE,
.inorder = FALSE,
.packages = c('data.table', 'doParallel')) %dopar% {
1-((var(data[,i]-data[,i+1]))/(var(data[,i])+var(data[,i+1])))   #the correlation between each pair is done on colums, so I put my OTUs to column
}
``````

It doesn't give me what I want. Actually, I don't really know how I could get a result like this :

`````` OTU0001 OTU0004 X
OTU0001 OTU0014 X
....
OTU0016 OTU0017 X
``````

Try it on a small subset of your data, first, and do it manually. Then you can tackle the issue of doing it over the entire data.

I created a small subset with 5 rows and 5 columns.

When I try :

``````for (i in seq_len(ncol(data)))
{
1-((var(data[,i]-data[,i+1]))/(var(data[,i])+var(data[,i+1])))
}
``````

It returns me an error. The problem is really to create the function which computes "my" formula to compute all correlations. It's a bit tricky for me to create this function actually...

1

This seems to do it(?). Here, there is only one layer of parallelisation. If you want a double layer, you can additionally parallelise where I use the `apply()` function within `foreach`

``````cores <- 4
options('mc.cores'=cores)
registerDoParallel(cores)

a <- matrix(rexp(200, rate=.1), ncol=20)
colnames(a) <- paste0("Sample", 1:ncol(a))
rownames(a) <- paste0("Gene", 1:nrow(a))

res <- foreach(i = seq_len(ncol(a)),
.combine = rbind,
.multicombine = TRUE,
.inorder = FALSE,
.packages = c('data.table', 'doParallel')) %dopar% {
apply(a, 2, function(x) 1 - ((var(a[,i] - x)) / (var(a[,i]) + var(x))))
}
Sample1     Sample2     Sample3     Sample4      Sample5     Sample6     Sample7     Sample8     Sample9    Sample10    Sample11     Sample12    Sample13    Sample14    Sample15    Sample16    Sample17    Sample18
result.1   1.00000000 -0.28479981 -0.23777373  0.09938681 -0.303090446 -0.09815046  0.42514519 -0.41238625  0.15941159 -0.39453720  0.60702244 -0.051540197  0.21332970  0.25701031  0.20258919  0.49758148 -0.09723184 -0.24925385
result.2  -0.28479981  1.00000000 -0.19874414  0.02163992  0.021271304  0.34942860  0.06344279  0.27414492 -0.28064478  0.85363248  0.06748158  0.157551491 -0.35613149  0.04479769  0.11072481  0.01456317 -0.02471427  0.01738776
result.3  -0.23777373 -0.19874414  1.00000000 -0.58066203 -0.142873581  0.33025197 -0.19001553  0.17620598  0.38414154  0.13801192 -0.01780513  0.371760584 -0.44999519 -0.28071486 -0.16220850  0.18816097 -0.25087369  0.46963241
result.4   0.09938681  0.02163992 -0.58066203  1.00000000 -0.128729136 -0.28798421  0.32587409 -0.01427686  0.09101038 -0.24727311 -0.04498139 -0.781364637  0.74346227 -0.08003990  0.04414206 -0.28322733  0.43946012 -0.38914141
result.5  -0.30309045  0.02127130 -0.14287358 -0.12872914  1.000000000 -0.04647326 -0.33450188  0.01675502 -0.42193039 -0.15393426 -0.31934328  0.005608499  0.08793800  0.14688155  0.12755842 -0.10290322 -0.31699414 -0.06115937
result.6  -0.09815046  0.34942860  0.33025197 -0.28798421 -0.046473256  1.00000000 -0.19952768 -0.20984385 -0.28747544  0.47702735  0.28104694  0.398865246 -0.30355400 -0.16523184 -0.24561294  0.41511985 -0.40267328  0.55102909
result.7   0.42514519  0.06344279 -0.19001553  0.32587409 -0.334501876 -0.19952768  1.00000000 -0.13532401  0.56245252 -0.08844725  0.70863149 -0.074939520  0.18557674  0.18099969  0.55958779  0.16789110 -0.01556326 -0.09912881
result.8  -0.41238625  0.27414492  0.17620598 -0.01427686  0.016755018 -0.20984385 -0.13532401  1.00000000  0.08404417  0.29517833 -0.45589833 -0.209246121 -0.14346892  0.16506456  0.18942269 -0.71413017  0.26521472 -0.06655534
result.9   0.15941159 -0.28064478  0.38414154  0.09101038 -0.421930393 -0.28747544  0.56245252  0.08404417  1.00000000 -0.17316575  0.29817555 -0.060663994  0.10958363 -0.10801415  0.12317480  0.01593299  0.20151833 -0.09511908
result.10 -0.39453720  0.85363248  0.13801192 -0.24727311 -0.153934257  0.47702735 -0.08844725  0.29517833 -0.17316575  1.00000000  0.06306882  0.415786965 -0.67254801 -0.07812260 -0.02626869  0.08027732 -0.02256247  0.25777067
result.11  0.60702244  0.06748158 -0.01780513 -0.04498139 -0.319343282  0.28104694  0.70863149 -0.45589833  0.29817555  0.06306882  1.00000000  0.259269823 -0.10230501  0.15559544  0.36576575  0.56402755 -0.27040248  0.08797755
...
Sample19    Sample20
result.1  -0.40852033 -0.13080431
result.2  -0.52206943 -0.08709850
result.3   0.09042043  0.27226199
result.4   0.12715771 -0.45587264
result.5   0.30969572  0.02534422
result.6  -0.38017378  0.01640989
result.7  -0.10465839 -0.27980545
result.8  -0.22714070  0.43995193
result.9   0.16107426 -0.22359325
result.10 -0.53504237  0.09926318
result.11 -0.30548596 -0.19614219
...
``````
ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe48k

Thanks a lot for this answer. Do you think it is possible to only get the lower or the upper triangle of the matrix? In order to reduce the time of the script

ADD REPLYlink modified 6 months ago • written 6 months ago by vincentpailler100