I'm trying to cluster RNA-seq data using the pvclust function from the pvclust package, but it gives me this error: cannot allocate vector of length 1623767616. Is this because I have 40296 genes and that's too much data?
My code is this:
    test2<-read.csv("RNAseq_to_cluster.csv", sep=",")
    test3<-test2[,2:4] #columns contain samples
    row.names(test3)<-test2$gene
    matrix<-data.matrix(test3)
    transpose= t(matrix)
    pv <- pvclust(transpose, method.dist="correlation", method.hclust="average", nboot=1000)
    Error in cor(x, method = "pearson", use = use.cor) :
      cannot allocate vector of length 1623767616
EDIT: first few lines of the input file:
    gene    sample1  sample2  sample3
    Mar-01  4.19504  3.9006   4.15683
    Mar-02  3.0554   3.4261   3.76675
    Sep-02  77.1536  65.1284  76.4927
    Mar-03  1.01555  1.28626  0.461987
Yeah, there isn't enough memory to make a vector of that size. But I don't see why it would need to make a vector of that size for what you are doing. Can you post the first few lines of the CSV input file?
I've posted a few lines of the input file.
Try repeating with fewer genes to see whether you get an answer. I assume you have reached the R memory limit of 4 GB. Check this post and post for possible workarounds.
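One way to try this with fewer genes is to keep only the most variable ones before computing correlations. The following is a minimal base-R sketch, not a recommended pipeline: the simulated matrix `expr`, the cutoff `n_keep`, and the choice of variance as the filter are all assumptions made for illustration.

```r
# Simulated stand-in for the real table: 1000 genes x 3 samples (hypothetical values).
set.seed(1)
expr <- matrix(rexp(3000, rate = 0.1), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:3)))

# Rank genes by variance across samples and keep the top n_keep (arbitrary cutoff).
n_keep <- 200
gene_var <- apply(expr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_keep)]
expr_small <- expr[top_genes, , drop = FALSE]

# After transposing, genes are columns, so cor() returns an n_keep x n_keep matrix
# instead of one with a row and column for every gene in the full table.
gene_cor <- cor(t(expr_small), method = "pearson")
dim(gene_cor)
```

The same reduced, transposed matrix could then be passed to pvclust in place of `transpose` from the question. Whether a variance filter is appropriate depends on the data.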
Statistically it's not a great idea to blow up a 40k × 3 dataset into a 40k × 40k correlation matrix. Note that 40296 × 40296 = 1623767616, which is exactly the vector length in the error message.
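To put a rough number on that gene-by-gene matrix, the arithmetic can be checked directly in R. This is only a back-of-the-envelope sketch: it assumes the matrix is stored as 8-byte doubles and ignores any extra copies made during the computation.

```r
n_genes <- 40296                 # number of genes reported in the question

# A gene-by-gene correlation matrix has one entry per pair of genes.
n_entries <- n_genes^2

# Each entry is a double (8 bytes); convert bytes to GiB.
size_gib <- n_entries * 8 / 2^30

format(n_entries, big.mark = ",")
round(size_gib, 1)
```

That comes to about 12 GiB for a single copy of the matrix. The failing `cor()` call correlates the columns of its input, so if the goal is to cluster the three samples rather than the genes, passing the untransposed matrix should keep the correlation matrix at 3 × 3.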