Question: plotting problem with loop in R
smyiz20 wrote:

I am trying to calculate the correlation coefficient, prepare a distance matrix, and draw a clustering plot. I am stuck at the plotting step. Here is my script:

``````
coef <- function(x,y){
  a <- sum(x * y) / sqrt(sum(x)^2 * sum(y)^2)
  return(a)
}

distance <- function(X) {
  n <- ncol(X)
  n <- nrow(X)
  distMat  <- matrix(0, nrow = n, ncol = n)

  for ( i in 1:(n-1) ) {
    for ( j in (i+1):n ) {
      v1 <- X[ , i]
      v2 <- X[ , j]
      d <- 1 - coef(v1, v2)
      distMat[i, j] <- d
      distMat[j, i] <- d
    }
  }

  for ( i in 1:n ) { distMat[i, i] <- 1 }

  hr <- hclust(distMat, method="average")
  plot <- plot(hr)

  return(plot)
}
``````
plot R • 392 views
written 8 months ago by smyiz20

You need to use `print`

What exactly is the problem? Are you getting an error, or is no plot generated?

I call `dev.off()` and the output reads "null device". The plot is not generated :(


The function returns the plot. To save it, you probably need to wrap the call between `pdf()` and `dev.off()`:

``````
pdf("myPlots.pdf")
print(distance(args.....))
dev.off()
``````
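
A minimal sketch of the save-to-file pattern, assuming base graphics and made-up random data in place of the real matrix. The names `X_demo` and `out_file` are illustrative and not from the original post, and plain Euclidean `dist()` stands in for the custom distance. With base graphics, `plot()` draws onto the open device as a side effect, so the drawing call only has to sit between `pdf()` and `dev.off()`:

```r
# Hypothetical demo data: 20 rows, 6 columns of random values
set.seed(1)
X_demo <- matrix(rnorm(120), nrow = 20, ncol = 6)

# Cluster the columns; dist() works on rows, hence t()
# (Euclidean distance is only a placeholder here)
hr <- hclust(dist(t(X_demo)), method = "average")

out_file <- tempfile(fileext = ".pdf")  # illustrative output path
pdf(out_file)   # open the PDF device
plot(hr)        # base graphics draw directly onto the open device
dev.off()       # close the device so the file is written to disk

file.exists(out_file)
```

The `null device` text printed by `dev.off()` only names the device that is current after closing; it is not an error by itself.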

Thanks for your reply, but I am getting this error: `Error in X[, i] : object of type 'closure' is not subsettable`


The error refers to this line:

``````
v1 <- X[ , i]
``````

Can you tell me the outcome of the following:

``````
ncol(X)
nrow(X)
n
``````

By the way, in R you can easily use the function `cor` on a full matrix instead of looping; see Correlation matrix ...

`ncol(X)` gives 18, `nrow(X)` gives 3173, and `n` gives 3173.

The `cor` function only accepts these methods: "pearson", "kendall", "spearman". I want to use uncentered Pearson.


I see. Then just use a second variable to get it running. The error is due to using the same variable `n` for both the column count and the row count: `n <- nrow(X)` overwrites `n <- ncol(X)`, so the loops over columns run up to n = 3173 = `nrow(X)` even though `X` has only 18 columns, which goes beyond `X`'s dimensions.
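
A sketch of what a separate column-count variable could look like. The names `uncentered_cor`, `column_distance`, `n_col`, and `X_demo` are illustrative, the uncentered formula `sum(x * y) / sqrt(sum(x^2) * sum(y^2))` is an assumption about what is intended, and the diagonal is left at 0 on the assumption that a column has zero distance to itself. The random matrix only mimics the 3173 x 18 shape reported in the thread:

```r
# Uncentered correlation (assumed formula, see note above)
uncentered_cor <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Pairwise distances between the COLUMNS of X
column_distance <- function(X) {
  n_col <- ncol(X)  # loop bound: number of columns, not rows
  distMat <- matrix(0, nrow = n_col, ncol = n_col)  # diagonal stays 0
  for (i in seq_len(n_col - 1)) {
    for (j in (i + 1):n_col) {
      d <- 1 - uncentered_cor(X[, i], X[, j])
      distMat[i, j] <- d
      distMat[j, i] <- d
    }
  }
  distMat
}

set.seed(42)
X_demo <- matrix(rnorm(3173 * 18), nrow = 3173, ncol = 18)  # made-up data
D <- column_distance(X_demo)
dim(D)

# hclust() expects a "dist" object, so convert the square matrix first
hr <- hclust(as.dist(D), method = "average")
```

Because the loops only index columns, `i` and `j` never exceed `ncol(X)`, whatever the number of rows is.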

Thanks, but I didn't quite get it. Do you mean I need to change `v1 <- X[ , i]` to `v1 <- distMat[ , i]`?

You should be able to compute that using matrix/vector operations without the loops. Let `M <- t(X) %*% X` and `z <- diag(M)`. Then your uncentred correlations should be `M / sqrt(z %*% t(z))`. (Written but not tested!)
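
A small check of the matrix formulation on made-up data (`X_demo`, `U`, and `ref` are illustrative names). It assumes the uncentered coefficient is normalised by `sum(x^2)`, which is what `diag(M)` contains; that differs from the `sum(x)^2` written in the question's `coef`:

```r
set.seed(7)
X_demo <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)  # hypothetical data

M <- t(X_demo) %*% X_demo   # M[i, j] = sum(X[, i] * X[, j])
z <- diag(M)                # z[i]    = sum(X[, i]^2)
U <- M / sqrt(z %*% t(z))   # uncentered correlations for all column pairs

# Pairwise reference value for columns 1 and 2, computed the long way
ref <- sum(X_demo[, 1] * X_demo[, 2]) /
  sqrt(sum(X_demo[, 1]^2) * sum(X_demo[, 2]^2))

all.equal(U[1, 2], ref)
```

A distance matrix for `hclust` could then be built as `as.dist(1 - U)`.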

You could experiment or search: https://www.statmethods.net/advstats/matrix.html

I think your formula equals `(x * y) / sqrt(sum(x)^2 * sum(y)^2)`. In uncentered correlation, it should start with `sum(x * y)`.


I wrote the equation to work with all pairs of columns in one go. The [i, j] entry of `t(X) %*% X` is equal to `sum(X[, i] * X[, j])`.

Could you split up your distance computation from your plotting step, so that you can debug why R thinks your `X` is a function/closure?
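
One way the split could look, as a sketch under assumptions: `uncentered_dist` is a hypothetical helper, the uncentered formula with `sum(x^2)` in the denominator is assumed, and `X_demo` is made-up data. The early `stopifnot()` check is there to catch the case where something other than a numeric matrix (for example a function) is passed in:

```r
# Step 1: compute the distances only, with no plotting inside
uncentered_dist <- function(X) {
  stopifnot(is.matrix(X), is.numeric(X))  # fails early if X is, e.g., a function
  M <- t(X) %*% X
  z <- diag(M)
  as.dist(1 - M / sqrt(z %*% t(z)))
}

# Subsetting a function triggers a "not subsettable" style error:
# `data` is a function from R's utils package, not a matrix
is.function(data)
err <- tryCatch(data[, 1], error = function(e) conditionMessage(e))
err

# Step 2: cluster and plot separately, once the distances look right
set.seed(3)
X_demo <- matrix(rnorm(200), nrow = 40, ncol = 5)
d <- uncentered_dist(X_demo)
hr <- hclust(d, method = "average")
plot(hr)
```

Checking `is.function(X)` or `class(X)` right before the call shows whether the object being passed is really the data matrix.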