Question

Compute correlations between OTUs

0

Entering edit mode

5.1 years ago

pablo ▴ 300

Hi guys,

I've got this kind of matrix : https://ibb.co/r47yY3m

I would like to compute correlations between each pair of OTUs according to this formula (rho coefficient):

https://ibb.co/ky4hkx3

(for example between the first and second OTU)

1-((var(lol[1,]-lol[2,]))/(var(lol[1,])+var(lol[2,])))

How could I create a function to compute automatically each correlation between each pair according to get a result like this :

 OTU0001 OTU0004 X
 OTU0001 OTU0014 X
 ....
 OTU0016 OTU0017 X

Thanks

correlation • 1.4k views

ADD COMMENT • link 5.1 years ago by pablo ▴ 300

score 0 · Answer 1 · 2019-03-14

0

Entering edit mode

5.1 years ago

Kevin Blighe 87k

Can you please follow up on what seems like a previous related question: Parallelization to compute correlations

If you look at my previous answer, you should be able to infer how you can apply this new equation and parallelise the entire process

ADD COMMENT • link 5.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Actually, the problem here is to create the function to compute the correlations between each pairs of OTUs. I do not want to use the function cor() but create the function like I did between the first and the second OTU. I just want to "automatize" this function for each pair

ADD REPLY • link 5.1 years ago by pablo ▴ 300

0

Entering edit mode

All that you have to do is replace the cor() function (in my previous example) with your new function.

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

0

Entering edit mode

I know what you mean, but my main problem is to create this new function.. I checked on Biostars but didn't find anything about my problem..

ADD REPLY • link 5.1 years ago by pablo ▴ 300

0

Entering edit mode

Is this not your formula? - 1-((var(lol[1,]-lol[2,]))/(var(lol[1,])+var(lol[2,])))

Implementing it in the foreach loop would be something like:

1-((var(lol[i,]-lol[i+1,]))/(var(lol[i,])+var(lol[i+1,])))

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

0

Entering edit mode

I found this formula in a paper "propr: An R-package for Identifying Proportionally Abundant Features Using Compositional Data Analysis."

I tried :

cores<-32
options('mc.cores'=cores)
registerDoParallel(cores)

data<-read.table("/home/vipailler/PROJET_M2/raw/truelength2.prok2.uniref2.rares.tsv", h=T, row.names=1, sep="\t")

res <- foreach(i = seq_len(ncol(data)),
  .combine = rbind,
  .multicombine = TRUE,
  .inorder = FALSE,
  .packages = c('data.table', 'doParallel')) %dopar% {
 1-((var(data[,i]-data[,i+1]))/(var(data[,i])+var(data[,i+1])))   #the correlation between each pair is done on colums, so I put my OTUs to column
}

It doesn't give me what I want. Actually, I don't really know how I could get a result like this :

 OTU0001 OTU0004 X
 OTU0001 OTU0014 X
 ....
 OTU0016 OTU0017 X

ADD REPLY • link 5.1 years ago by pablo ▴ 300

0

Entering edit mode

Try it on a small subset of your data, first, and do it manually. Then you can tackle the issue of doing it over the entire data.

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

0

Entering edit mode

I created a small subset with 5 rows and 5 columns.

When I try :

for (i in seq_len(ncol(data)))
{
1-((var(data[,i]-data[,i+1]))/(var(data[,i])+var(data[,i+1])))
}

It returns me an error. The problem is really to create the function which computes "my" formula to compute all correlations. It's a bit tricky for me to create this function actually...

ADD REPLY • link 5.1 years ago by pablo ▴ 300

1

Entering edit mode

This seems to do it(?). Here, there is only one layer of parallelisation. If you want a double layer, you can additionally parallelise where I use the apply() function within foreach

cores <- 4
options('mc.cores'=cores)
registerDoParallel(cores)

a <- matrix(rexp(200, rate=.1), ncol=20)
colnames(a) <- paste0("Sample", 1:ncol(a))
rownames(a) <- paste0("Gene", 1:nrow(a))

res <- foreach(i = seq_len(ncol(a)),
  .combine = rbind,
  .multicombine = TRUE,
  .inorder = FALSE,
  .packages = c('data.table', 'doParallel')) %dopar% {
  apply(a, 2, function(x) 1 - ((var(a[,i] - x)) / (var(a[,i]) + var(x))))
}
              Sample1     Sample2     Sample3     Sample4      Sample5     Sample6     Sample7     Sample8     Sample9    Sample10    Sample11     Sample12    Sample13    Sample14    Sample15    Sample16    Sample17    Sample18
result.1   1.00000000 -0.28479981 -0.23777373  0.09938681 -0.303090446 -0.09815046  0.42514519 -0.41238625  0.15941159 -0.39453720  0.60702244 -0.051540197  0.21332970  0.25701031  0.20258919  0.49758148 -0.09723184 -0.24925385
result.2  -0.28479981  1.00000000 -0.19874414  0.02163992  0.021271304  0.34942860  0.06344279  0.27414492 -0.28064478  0.85363248  0.06748158  0.157551491 -0.35613149  0.04479769  0.11072481  0.01456317 -0.02471427  0.01738776
result.3  -0.23777373 -0.19874414  1.00000000 -0.58066203 -0.142873581  0.33025197 -0.19001553  0.17620598  0.38414154  0.13801192 -0.01780513  0.371760584 -0.44999519 -0.28071486 -0.16220850  0.18816097 -0.25087369  0.46963241
result.4   0.09938681  0.02163992 -0.58066203  1.00000000 -0.128729136 -0.28798421  0.32587409 -0.01427686  0.09101038 -0.24727311 -0.04498139 -0.781364637  0.74346227 -0.08003990  0.04414206 -0.28322733  0.43946012 -0.38914141
result.5  -0.30309045  0.02127130 -0.14287358 -0.12872914  1.000000000 -0.04647326 -0.33450188  0.01675502 -0.42193039 -0.15393426 -0.31934328  0.005608499  0.08793800  0.14688155  0.12755842 -0.10290322 -0.31699414 -0.06115937
result.6  -0.09815046  0.34942860  0.33025197 -0.28798421 -0.046473256  1.00000000 -0.19952768 -0.20984385 -0.28747544  0.47702735  0.28104694  0.398865246 -0.30355400 -0.16523184 -0.24561294  0.41511985 -0.40267328  0.55102909
result.7   0.42514519  0.06344279 -0.19001553  0.32587409 -0.334501876 -0.19952768  1.00000000 -0.13532401  0.56245252 -0.08844725  0.70863149 -0.074939520  0.18557674  0.18099969  0.55958779  0.16789110 -0.01556326 -0.09912881
result.8  -0.41238625  0.27414492  0.17620598 -0.01427686  0.016755018 -0.20984385 -0.13532401  1.00000000  0.08404417  0.29517833 -0.45589833 -0.209246121 -0.14346892  0.16506456  0.18942269 -0.71413017  0.26521472 -0.06655534
result.9   0.15941159 -0.28064478  0.38414154  0.09101038 -0.421930393 -0.28747544  0.56245252  0.08404417  1.00000000 -0.17316575  0.29817555 -0.060663994  0.10958363 -0.10801415  0.12317480  0.01593299  0.20151833 -0.09511908
result.10 -0.39453720  0.85363248  0.13801192 -0.24727311 -0.153934257  0.47702735 -0.08844725  0.29517833 -0.17316575  1.00000000  0.06306882  0.415786965 -0.67254801 -0.07812260 -0.02626869  0.08027732 -0.02256247  0.25777067
result.11  0.60702244  0.06748158 -0.01780513 -0.04498139 -0.319343282  0.28104694  0.70863149 -0.45589833  0.29817555  0.06306882  1.00000000  0.259269823 -0.10230501  0.15559544  0.36576575  0.56402755 -0.27040248  0.08797755
...
             Sample19    Sample20
result.1  -0.40852033 -0.13080431
result.2  -0.52206943 -0.08709850
result.3   0.09042043  0.27226199
result.4   0.12715771 -0.45587264
result.5   0.30969572  0.02534422
result.6  -0.38017378  0.01640989
result.7  -0.10465839 -0.27980545
result.8  -0.22714070  0.43995193
result.9   0.16107426 -0.22359325
result.10 -0.53504237  0.09926318
result.11 -0.30548596 -0.19614219
...

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Thanks a lot for this answer. Do you think it is possible to only get the lower or the upper triangle of the matrix? In order to reduce the time of the script

ADD REPLY • link 5.1 years ago by pablo ▴ 300