7.9 years ago
FrancesJones
Hi, I have a binary matrix from which I would like to create a dendrogram with bootstrap values. Each row of my matrix is a dependent variable and each column is an independent variable, of which I have 49. The first column holds the names of the dependent variables, so I've excluded it from the clustering so that NAs aren't introduced.
I've used pvclust and haven't had any error messages, but I wanted to check that my code is doing what I think it's doing!
pv<-pvclust(data2[2:50], method.hclust="average",
method.dist="binary", use.cor="all.obs",
nboot=1000, parallel=FALSE, r=seq(.5,1.4,by=.1),
store=FALSE, weight=FALSE, iseed=NULL, quiet=FALSE)
plot(pv, hang=-1)
Thanks!
pvclust clusters the columns of the matrix, so make sure this is what you want; otherwise transpose it first with t().
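To see which dimension ends up as the leaves of the tree, here is a minimal base R sketch on a made-up toy matrix (the matrix `m` and its labels are hypothetical, not the poster's data). It does not call pvclust; it only illustrates that dist() compares rows, so clustering columns means transposing first:

```r
# Toy binary matrix (hypothetical): 4 rows x 3 columns named A, B, C
m <- matrix(c(1, 0, 1, 1,
              1, 0, 1, 0,
              0, 1, 0, 1),
            nrow = 4,
            dimnames = list(paste0("row", 1:4), c("A", "B", "C")))

# dist() computes distances between ROWS, so transpose to compare columns
d_cols <- dist(t(m), method = "binary")
hc_cols <- hclust(d_cols, method = "average")

# The leaves of the column tree are the column names
print(hc_cols$labels)

# Without t(), the leaves would be the row names instead
hc_rows <- hclust(dist(m, method = "binary"), method = "average")
print(hc_rows$labels)
```

If the labels on your pvclust plot are the things you meant to group, the data frame is oriented correctly.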
Ah, OK. I think I remember reading that this is the opposite of hclust. I want to cluster the columns based on the rows, so I've definitely got the data frame in the right orientation. I was wondering more about the distance method: is it OK to use binary, and is pvclust suitable for a binary dataset?
You can use pvclust for binary data. The binary distance used is the Jaccard distance. Note also that you don't need use.cor, since it only matters for the correlation-based distances and you're not using one.
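As a quick check of the Jaccard point, this small base R sketch compares a hand-computed Jaccard distance with what dist(method = "binary") returns. The vectors `x` and `y` are made-up examples, not the poster's data:

```r
# Two hypothetical binary profiles
x <- c(1, 1, 0, 1, 0, 0)
y <- c(1, 0, 0, 1, 1, 0)

# Jaccard distance by hand: 1 - |intersection| / |union|
both   <- sum(x == 1 & y == 1)   # positions where both are 1
either <- sum(x == 1 | y == 1)   # positions where at least one is 1
jaccard_manual <- 1 - both / either

# The same quantity from base R's "binary" distance
jaccard_dist <- as.numeric(dist(rbind(x, y), method = "binary"))

print(all.equal(jaccard_manual, jaccard_dist))
```

Positions where both profiles are 0 are ignored, which is usually what you want for presence/absence data. With use.cor and the other defaults dropped, the original call shortens to something like pvclust(data2[2:50], method.hclust="average", method.dist="binary", nboot=1000).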
Perfect, thank you. Before writing it up I wanted to check I was using it right! Strange error messages are usually my main problem, so I doubted myself when there weren't any! Thanks again, Frances