Question

Calculate The Weighted Mean Of Matrix In R: Fastest Way

1

Entering edit mode

14.7 years ago

Sirus ▴ 820

Hello everybody, I want to create a simple R script that calculates the center of a cluster, I X={X1,X2,...,X2} is my data matrix and U={U_ik} is the partnership of element Xi in cluster K, I want to do the following in a fastest way:

$alt text$

so here each Xi is multiplied by a weight and summed up and then divided by the sum of weights , so it is weighted mean, but here our X is a vector. There in R the function weighted.mean(), but it needs only numerical values. I have written a script well it works for one but for a big number of X it is slow the code is

U : membership matrix.
X : Our data.
m : parameter
K : cluster number K

function(U,X,m,k)
{
  Nominator <- matrix(0,length(X[,1]),1)
  Denominator <-0

  for(i in 1:length(U[1,])) #We go trought the elements of the cluster k
  {
    if(U[k,i]!=0)
    {
      Nominator <- Nominator+ (U[k,i]^m)* X[,i]
      Denominator <- Denominator + (U[k,i]^m)
    }
  }
  Nominator/Denominator
}

r matrix statistics • 11k views

ADD COMMENT • link updated 11.7 years ago by Biostar 20 • written 14.7 years ago by Sirus ▴ 820

0

Entering edit mode

btw, if all your U matrix elements were only 0 or 1, why do you take to the power of m then?

ADD REPLY • link 14.7 years ago by Michael 55k

0

Entering edit mode

You are doing it wrong: just figured that your code doesn't do what your formula says. you have to take X^m not U^m !

ADD REPLY • link 14.7 years ago by Michael 55k

0

Entering edit mode

Here X values may be 0 or 1 but U values are real values. In fact they represent the membership degree of element Xj in cluster Ci

ADD REPLY • link 14.7 years ago by Sirus ▴ 820

Ram · Answer 1 · 2010-11-04

5

Entering edit mode

14.7 years ago

Michael 55k

How about this (it's a 1-liner):

foo2 <- function(U,X,m,k)  X %*% U[k,]^m / sum(U[k,]^m)

plus make some sample data:

U <- matrix(sample(c(0,1), size=10*10, replace=TRUE), nrow=10, ncol=10)
X <- matrix(rnorm(100), nrow=10, ncol=10)

let's try it out:

foo1 <- your function
>all (foo2(U,X,2,1) == foo1(U,X,2,1))
[1] TRUE

It will be about x-times faster for large data

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 14.7 years ago by Michael 55k

0

Entering edit mode

Nice work converting C into R. I didn't know about all(), that's a sweet shortcut for confirming your solution.

ADD REPLY • link 14.7 years ago by David Quigley 11k

0

Entering edit mode

Waw, it seems fast, I will try it now , thank you for you precious help. my data is big

ADD REPLY • link 14.7 years ago by Sirus ▴ 820

0

Entering edit mode

I have tried your methode with this sample if I do all (foo2(U,X,2,1) == foo1(U,X,2,1)) it is TRUE but all (foo2(X,U,2,1) == foo1(X,U,2,1)) it is is FALSE I think it needs only a small change, I will try to figure it out. And really thank you for this help, I am happy to see the equivalent code in one line.

ADD REPLY • link 14.7 years ago by Sirus ▴ 820

0

Entering edit mode

Of course the result will be different if you exchange the parameters ;) The vector to matrix multiplication %% is sensitive to the order of arguments such that : x %% A = A %% t(x) (with x a row vector and t() transposition function, if I my remember linear algebra classes rule of thumb row times column* ;)

ADD REPLY • link 14.7 years ago by Michael 55k

0

Entering edit mode

Yeah, that's true but I mean that normally the results of foo2 and foo1 must the same what ever is the order of the parameters. foo2(X,U,m,k) should be equal to foo1(X,U,m,k) because normally we are applying the same formula so we should have the same results.

ADD REPLY • link 14.7 years ago by Sirus ▴ 820

0

Entering edit mode

no, because R is not strongly typed. for example you can matrix multiply a vector x with a matrix M: x %% M if you exchange order M %% x then the result is different. So I can define a totally valid function given x is a vector and M a matrix, if that is inverted the result can be different

ADD REPLY • link 14.5 years ago by Michael 55k

Ram · Answer 2 · 2010-11-05

0

Entering edit mode

14.7 years ago

Sirus ▴ 820

Thank you very much Michael Dondrup,

I have figured out the solution, inspired from your code bellow it is

foo1<-function(U,X,m,k)(t(t(X) * U[k,]^m) %*% c(rep(1,length(U[k,]))))/sum(U[k,]^m)

For the time of execution, the first one that I have wrote is fast in case the U matrix has a lot of 0's.

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 14.7 years ago by Sirus ▴ 820

0

Entering edit mode

So you changed the formula you wish to compute? My impression was that your function as well as foo2 are consistent with the formula you show for center_k

ADD REPLY • link 14.7 years ago by Michael 55k