5 weeks ago by

Belgium, Brussels

Could you put the command you tried ? Do you want to compute correlation between samples ? or between genes ? also which species has 1,000,000 genes ?

For correlation between samples :

```
# generate test dataset - 40 samples x 1,000,000 genes
m <- matrix(runif(40e6,min = 0,max=100),nrow = 1000000,ncol = 40)
m <- as.data.frame(m)
colnames(m)<-paste0("sample",1:40)
row.names(m)<-paste0("gene",1:1000000)
# compute correlation
cor.res <- cor(m)
```

For gene-gene correlation you will have to generate a 1,000,000 x 1,000,000 matrix that will be quiet big in memory ..

```
# Example of 1M x 1M matrix in R
m <- matrix(0,ncol=1e6,nrow=1e6)
Error: cannot allocate vector of size 7450.6 Gb
```

Maybe you could try to find a solution by using the bigmemory or ff packages.

In fact someone already implemented a solution based on ff.

are you clustering the genes or the samples?

4.2kI am clustering genes

50