I'm an R beginner trying to learn how to analyze single-cell seq data using Seurat tools. I wanted to try to work with a published Drop-seq UMI count dataset available via GEO:
I've encountered two errors trying to load this data into R:
1. Using `data <- read.table(file = "/path/to/file", sep = '\t')` results in a memory error: `Cannot allocate vector of size 125 Kb`. To work around this I've tried to use `memory.limit()` (I have 8 GB of RAM), but R always crashes.
2. To get around the memory issue another way, I've tried `read.table.ffdf` with various combinations of parameters, e.g. `row.names = 1` and `header = TRUE`, but each attempt results in an error, e.g. `attempt to set 'rownames' on an object with no dimensions` and `more columns than column names`.
I think the issue comes down to the fact that I don't know what this data file looks like, and because it is a very large file (~4 GB) I haven't been able to open it to view it myself, even using LTFviewer. Does anybody have tips on how to load large single-cell seq UMI count files for use in the Seurat pipeline? Would `read.table.ffdf` work if I found the correct parameters to load in the file, or is there a better way to go about this altogether?
Thank you!
Maybe this helps: `data.table` objects tend to be a bit more manageable for memory issues. I've put the results of the following lines of code here; let me know if that works for you.
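As a minimal sketch of that idea (the input file name is hypothetical, and the gene-name column position is an assumption about how the GEO table is laid out), `data.table::fread()` reads large delimited files much faster and more memory-efficiently than `read.table`, and converting to a sparse matrix before Seurat avoids holding a mostly-zero dense matrix in RAM:

```r
library(data.table)
library(Matrix)

# Read the tab-delimited UMI count table; fread is far faster and leaner
# than read.table. (File name here is hypothetical.)
dt <- fread("GSExxxx_umi_counts.txt", sep = "\t", header = TRUE)

# Assumption: the first column holds gene names, remaining columns are cells.
genes <- dt[[1]]
mat <- as.matrix(dt[, -1, with = FALSE])
rownames(mat) <- genes

# UMI count matrices are mostly zeros, so store them sparsely --
# this is where the real memory savings come from.
sparse_mat <- Matrix(mat, sparse = TRUE)

# seurat_obj <- Seurat::CreateSeuratObject(counts = sparse_mat)
```

The dense `as.matrix()` step still needs enough RAM to hold the table once, so this works best if the file fits in memory at least transiently.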
To be precise -- the link above lets you download a tar archive. You probably need to untar it. Then open R and read in the three files from that archive with any function meant to read 10X CellRanger output, e.g. `DropletUtils::read10xCounts()`.

This worked perfectly and avoided the memory issues I was having! Thank you for the help!
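For reference, a minimal sketch of that workflow (the archive and directory names are assumptions; `read10xCounts()` returns a sparse-matrix-backed `SingleCellExperiment`, which sidesteps the dense `read.table` blow-up):

```r
library(DropletUtils)
library(Seurat)

# Unpack the GEO tar archive (file name is hypothetical); it should contain
# the CellRanger trio: barcodes.tsv, genes/features.tsv, matrix.mtx.
untar("GSExxxx_RAW.tar", exdir = "counts_dir")

# Load the trio as a SingleCellExperiment backed by a sparse matrix.
sce <- read10xCounts("counts_dir")

# read10xCounts() keeps cell barcodes in colData rather than as column
# names, so set them before handing the counts to Seurat.
colnames(sce) <- sce$Barcode
seurat_obj <- CreateSeuratObject(counts = counts(sce))
```

`Seurat::Read10X()` on the untarred directory is an equivalent route if you would rather skip Bioconductor entirely.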