I'm trying to cluster RNA-seq data using the pvclust function from the pvclust package, but it gives me this error: cannot allocate vector of length 1623767616. Is this because I have 40296 genes, which is too much data?
My code is this:
    test2 <- read.csv("RNAseq_to_cluster.csv", sep=",")
    test3 <- test2[,2:4]  # columns contain samples
    row.names(test3) <- test2$gene
    matrix <- data.matrix(test3)
    transpose = t(matrix)
    pv <- pvclust(transpose, method.dist="correlation", method.hclust="average", nboot=1000)

    Error in cor(x, method = "pearson", use = use.cor) :
      cannot allocate vector of length 1623767616
EDIT: first few lines of the input file:
    gene     sample1   sample2   sample3
    Mar-01   4.19504   3.9006    4.15683
    Mar-02   3.0554    3.4261    3.76675
    Sep-02   77.1536   65.1284   76.4927
    Mar-03   1.01555   1.28626   0.461987
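I don't think you need to do much to your input data to run the pvclust function. The transposition of the data matrix is most likely the problem. pvclust clusters the columns of its input, so instead of computing pairwise correlations for just 3 sets of data (sample1, sample2, sample3), the transposed matrix tells pvclust to do it for 40,296 sets of data (genes). That correlation matrix would have 40296 × 40296 = 1,623,767,616 entries, which is exactly the vector length in your error message.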
Try just this:
    # row.names = 1 uses the first column (gene) as row names
    data = as.matrix(read.csv('RNAseq_to_cluster.csv', sep=',', header=TRUE, row.names = 1))
    pv <- pvclust(data, method.dist="correlation", method.hclust="average", nboot=1000)
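If it helps to see where the memory goes, here is a minimal sketch in base R. It only uses the gene and sample counts from the question; the `toy` matrix is made-up random data with 100 hypothetical genes, standing in for the real file.

```r
# Counts taken from the question.
n_genes   <- 40296
n_samples <- 3

# cor() on a matrix returns a (columns x columns) result, so with genes as
# columns it has to hold n_genes^2 values.
n_entries <- n_genes^2
stopifnot(n_entries == 1623767616)   # same number as in the error message

# Each value is an 8-byte double: roughly 12 GiB for one correlation matrix.
size_gib <- n_entries * 8 / 1024^3
print(round(size_gib, 1))

# Toy stand-in for the real data (random values, assumption for illustration).
set.seed(1)
toy <- matrix(rnorm(100 * n_samples), nrow = 100,
              dimnames = list(paste0("gene", 1:100),
                              paste0("sample", 1:n_samples)))

dim_samples <- dim(cor(toy))      # genes in rows: samples x samples
dim_genes   <- dim(cor(t(toy)))   # transposed: genes x genes
print(dim_samples)
print(dim_genes)
```

With genes in rows the correlation matrix stays 3 x 3, whereas the transposed version grows with the square of the number of genes, which is what exhausts memory on the full data set.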