Question: Calculate The Weighted Mean Of Matrix In R: Fastest Way
7.6 years ago by
Sirus770
Boston/USA
Sirus770 wrote:

Hello everybody, I want to create a simple R script that calculates the center of a cluster. If X={X1,X2,...,Xn} is my data matrix and U={U_ik} is the membership of element Xi in cluster k, I want to do the following in the fastest way:

[Image not preserved: formula for center_k, the center of cluster k]

So here each Xi is multiplied by a weight, summed up, and then divided by the sum of the weights, so it is a weighted mean, but here each Xi is a vector. R has the function weighted.mean(), but it only accepts numeric values. I have written a script that works, but for a big number of X it is slow. The code is:

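U : membership matrix (one row per cluster, one column per element)
X : our data (one column per element)
m : parameter (the exponent applied to the memberships)
k : index of the cluster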

foo1 <- function(U, X, m, k)
{
  # Running weighted sum of the columns of X and running sum of the weights
  Nominator <- matrix(0, nrow(X), 1)
  Denominator <- 0

  for (i in seq_len(ncol(U))) # We go through the elements of cluster k
  {
    if (U[k, i] != 0) # skip elements with zero membership
    {
      Nominator <- Nominator + (U[k, i]^m) * X[, i]
      Denominator <- Denominator + (U[k, i]^m)
    }
  }
  Nominator / Denominator
}
R matrix statistics • 5.6k views
ADD COMMENTlink modified 4.6 years ago by Biostar ♦♦ 20 • written 7.6 years ago by Sirus770
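
The weighted mean described in the question can also be written with weighted.mean() itself, by applying it to each row of X. This is only a sketch on made-up data (the sizes, the seed, and the names w, center_wm and center_explicit are assumptions, not part of the post), and it is meant to show the quantity being computed rather than to be fast:

```r
# Hypothetical sample data: 4 features (rows) x 6 elements (columns), 3 clusters
set.seed(1)
X <- matrix(rnorm(4 * 6), nrow = 4, ncol = 6)
U <- matrix(runif(3 * 6), nrow = 3, ncol = 6)
m <- 2
k <- 1

# Weight of every element for cluster k
w <- U[k, ]^m

# weighted.mean() handles one numeric vector at a time, so apply it row by row
center_wm <- apply(X, 1, weighted.mean, w = w)

# Same quantity written out explicitly: sum_i w_i * X[, i] / sum_i w_i
center_explicit <- as.vector(X %*% w / sum(w))

print(all.equal(center_wm, center_explicit))
```

Each row of X holds one coordinate across all elements, so the per-row weighted means together form the center vector.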

btw, if all your U matrix elements were only 0 or 1, why do you take to the power of m then?

ADD REPLYlink written 7.6 years ago by Michael Dondrup44k

You are doing it wrong: I just figured out that your code doesn't do what your formula says. You have to take X^m, not U^m!

ADD REPLYlink written 7.6 years ago by Michael Dondrup44k

Here X values may be 0 or 1 but U values are real values. In fact they represent the membership degree of element Xj in cluster Ci

ADD REPLYlink written 7.6 years ago by Sirus770
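
A small illustration of the point discussed in these comments: with 0/1 memberships the exponent m has no effect (for m > 0), while with real-valued memberships it changes the relative weights. The values below are made up for illustration:

```r
m <- 2

# Hard (0/1) memberships: raising to the power m changes nothing for m > 0
hard <- c(0, 1, 1, 0)
print(identical(hard^m, hard))

# Real-valued memberships (made-up values): the exponent reshapes the weights
soft <- c(0.1, 0.5, 0.9)
print(soft^m)                  # raw weights
print(soft^m / sum(soft^m))    # normalized weights used in the weighted mean
```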
7.6 years ago by
Bergen, Norway
Michael Dondrup44k wrote:

How about this (it's a 1-liner):

foo2 <- function(U,X,m,k)  X %*% U[k,]^m / sum(U[k,]^m)

plus make some sample data:

U <- matrix(sample(c(0,1), size=10*10, replace=TRUE), nrow=10, ncol=10)
X <- matrix(rnorm(100), nrow=10, ncol=10)

let's try it out:

# foo1 is your function from the question
> all(foo2(U,X,2,1) == foo1(U,X,2,1))
[1] TRUE

It will be about x-times faster for large data

ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by Michael Dondrup44k
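
To get a feel for the speed difference on your own machine, something like the following sketch could be used. The names center_loop and center_vec, the data sizes, and the seed are assumptions made for this example; the two functions are meant to mirror the loop from the question and the one-liner above. With real-valued U the two results may differ in the last floating-point digits, so the comparison uses all.equal() rather than ==:

```r
# Loop version, mirroring the function in the question (hypothetical name)
center_loop <- function(U, X, m, k) {
  num <- matrix(0, nrow(X), 1)
  den <- 0
  for (i in seq_len(ncol(U))) {
    if (U[k, i] != 0) {
      num <- num + U[k, i]^m * X[, i]
      den <- den + U[k, i]^m
    }
  }
  num / den
}

# Vectorized version, same expression as the one-liner (hypothetical name)
center_vec <- function(U, X, m, k) X %*% U[k, ]^m / sum(U[k, ]^m)

# Made-up data: 50 features, 20000 elements, 5 clusters
set.seed(42)
n <- 20000
X <- matrix(rnorm(50 * n), nrow = 50, ncol = n)
U <- matrix(runif(5 * n), nrow = 5, ncol = n)

# Time both versions on the same input
t_loop <- system.time(r_loop <- center_loop(U, X, 2, 1))
t_vec  <- system.time(r_vec  <- center_vec(U, X, 2, 1))
print(t_loop["elapsed"])
print(t_vec["elapsed"])

# Compare with a tolerance, since the summation order differs
print(all.equal(r_loop, r_vec))
```

The measured times depend on the machine and the data size, so no particular speed-up factor is claimed here.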

Nice work converting C into R. I didn't know about all(), that's a sweet shortcut for confirming your solution.

ADD REPLYlink written 7.6 years ago by David Quigley11k

Wow, it seems fast. I will try it now; thank you for your precious help. My data is big.

ADD REPLYlink written 7.6 years ago by Sirus770

I have tried your method with this sample. all(foo2(U,X,2,1) == foo1(U,X,2,1)) is TRUE, but all(foo2(X,U,2,1) == foo1(X,U,2,1)) is FALSE. I think it needs only a small change; I will try to figure it out. And really, thank you for this help. I am happy to see the equivalent code in one line.

ADD REPLYlink written 7.6 years ago by Sirus770

Of course the result will be different if you exchange the parameters ;) The vector-to-matrix multiplication %*% is sensitive to the order of its arguments, such that x %*% A = t(t(A) %*% t(x)) (with x a row vector and t() the transposition function), if I remember my linear algebra classes correctly. Rule of thumb: row times column ;)

ADD REPLYlink written 7.6 years ago by Michael Dondrup44k

Yeah, that's true, but I mean that normally the results of foo2 and foo1 must be the same whatever the order of the parameters. foo2(X,U,m,k) should be equal to foo1(X,U,m,k), because we are applying the same formula, so we should get the same results.

ADD REPLYlink written 7.6 years ago by Sirus770

No, because R is not strongly typed. For example, you can matrix-multiply a vector x with a matrix M: x %*% M. If you exchange the order, M %*% x, then the result is different. So I can define a totally valid function given that x is a vector and M a matrix; if that is inverted, the result can be different.

ADD REPLYlink written 7.4 years ago by Michael Dondrup44k
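
The effect of argument order in %*% can be seen on a tiny example. The values of x and M here are made up; the point is only that a plain vector is treated as a row vector on the left of %*% and as a column vector on the right, so both calls succeed but give different results:

```r
x <- c(1, 2)                   # plain vector, no dim attribute
M <- matrix(1:4, nrow = 2)     # columns are (1, 2) and (3, 4)

left  <- x %*% M   # x is treated as a 1 x 2 row vector, result is 1 x 2
right <- M %*% x   # x is treated as a 2 x 1 column vector, result is 2 x 1

print(left)
print(right)

# Transposing M recovers the left product, up to orientation
print(all(left == t(t(M) %*% x)))
```

Because neither call raises an error, swapping U and X in a function built on %*% silently computes something else instead of failing.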
7.6 years ago by
Sirus770
Boston/USA
Sirus770 wrote:

Thank you very much Michael Dondrup, I have figured out the solution, inspired by your code. Here it is:

foo1<-function(U,X,m,k)(t(t(X) * U[k,]^m) %*% c(rep(1,length(U[k,]))))/sum(U[k,]^m)

As for execution time, the first version that I wrote is fast when the U matrix has a lot of 0's.

ADD COMMENTlink written 7.6 years ago by Sirus770
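
If U has many zeros, one possible way to keep that advantage in a vectorized form is to restrict the matrix product to the columns with non-zero membership. This is a sketch only: center_nonzero is a hypothetical helper, the data are made up, and it assumes m > 0 so that zero memberships contribute zero weight:

```r
# Hypothetical helper: use only the columns with non-zero membership in cluster k
center_nonzero <- function(U, X, m, k) {
  nz <- which(U[k, ] != 0)
  w <- U[k, nz]^m
  X[, nz, drop = FALSE] %*% w / sum(w)
}

# Made-up data in which roughly 90% of the memberships are zero
set.seed(7)
X <- matrix(rnorm(5 * 1000), nrow = 5, ncol = 1000)
U <- matrix(runif(3 * 1000), nrow = 3, ncol = 1000)
U[U < 0.9] <- 0

full   <- X %*% U[1, ]^2 / sum(U[1, ]^2)   # uses every column
sparse <- center_nonzero(U, X, 2, 1)       # uses non-zero columns only
print(all.equal(full, sparse))
```

As with the other versions, a row of U that is entirely zero gives a division by zero, so that case would need separate handling.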

So you changed the formula you wish to compute? My impression was that your function as well as foo2 are consistent with the formula you show for center_k

ADD REPLYlink written 7.6 years ago by Michael Dondrup44k