I have a very large binary matrix, stored as a big.matrix to conserve memory (it is over 2 gb otherwise - 5 million columns and 100 rows).
r <- 100 c <- 10000 m4 <- matrix(sample(0:1,r*c, replace=TRUE),r,c) m4 <- cbind(m4, 1) m4 <- bigmemory::as.big.matrix(m4)
I need to remove every column which has only one unique value (in this case, only 0s or only 1s). Because of the number of columns, I want to be able to do this in parallel.
How can I accomplish this while keeping the data compressed as a big.matrix? I can convert it into a df and loop over the columns looking for the number of unique values, but this takes too much RAM.
EDIT: It is bioinformatics as each column is actually a protein subsequence. I am running fisher's exact to select important features, but before that, I must remove features that are present in all samples.