I'm running DESeq2 on small RNA sequencing data. I built a CSV containing the raw count of each unique small RNA sequence across all my datasets. I've successfully used DESeq2 on similar data in the past, but this time the file is bigger: the CSV is >2 GB. I'm running into memory errors using 64-bit RStudio on a Windows machine with 64 GB of RAM.
This is all I'm trying to do right now:
library(DESeq2)

## Raw counts: one row per unique small RNA sequence, one column per sample
sRNA    <- read.csv("deseq_input_all.txt", header = TRUE, row.names = 1)
coldata <- read.csv("deseq2/coldata.csv", header = TRUE, row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = sRNA,
                              colData   = coldata,
                              design    = ~ group)
dds <- DESeq(dds)
However, at the DESeq() step RStudio maxes out the memory and stops. Am I doing something silly with the code above, or is my data simply too big for this analysis? Would it be worthwhile to run it on a Linux machine, or to run R outside of RStudio?
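One thing I've been wondering is whether read.csv itself is part of the problem, since it parses everything into a data frame. I haven't tried it yet, but I was thinking of something along these lines with data.table::fread, handing DESeq2 an integer matrix instead (the column handling here is just my guess at what would work for my file):

library(data.table)

## fread is multithreaded and generally lighter than read.csv
dt <- fread("deseq_input_all.txt")

## First column holds the sequence IDs; the rest are counts
sRNA <- as.matrix(dt[, -1])
rownames(sRNA) <- dt[[1]]
storage.mode(sRNA) <- "integer"   # store counts as integers, not doubles

rm(dt); gc()                      # free the intermediate table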
Any tips or advice would be appreciated.
EDIT: There are 34 million rows of data.
> dim(sRNA)
[1] 34760467       21
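Given that row count, I'm also wondering whether it's legitimate to pre-filter low-count sequences before calling DESeq() — I'd guess the vast majority of those 34 million sequences are near-singletons. Something like this is what I have in mind, run right after DESeqDataSetFromMatrix() (the 10-read / 3-sample thresholds are numbers I picked arbitrarily):

## Keep only sequences seen at >= 10 reads in at least 3 samples
keep <- rowSums(counts(dds) >= 10) >= 3
dds  <- dds[keep, ]
dds  <- DESeq(dds)

Would dropping those rows up front change the statistics in a way that matters, or is this a reasonable way to get the memory footprint down?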