Question: Struggling to work with a large count matrix
castravete2712 wrote, 7 months ago:

Hello there,

I'm struggling to load/import my large count matrix into RStudio in order to analyze it. It's a medium-sized file (3 GB), but R crashes every time and my PC wants to commit seppuku each time.

So importing it with a plain read.table doesn't work. I also tried storing it as a big.matrix file; that doesn't work either, and R crashes again.

What can I do? I can't find any good tutorial for this kind of problem.

Tags: countmatrix, R, single-cell
(question by castravete2712, 7 months ago; last modified 7 months ago by Jean-Karim Heriche)

Is it crashing due to running out of RAM? Have you tried a sparse matrix?

(reply by Devon Ryan, 7 months ago)
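A minimal R sketch of why a sparse matrix helps, assuming only the Matrix package (shipped with R as a recommended package). The dimensions and the roughly 95% zero rate are made-up illustration values, not the OP's data: a dense matrix stores every zero, while a dgCMatrix stores only the non-zero counts plus their indices.

```r
library(Matrix)

# Made-up toy data: 2000 genes x 500 cells, Poisson counts with a low mean,
# so roughly 95% of the entries are zero
set.seed(1)
n_genes <- 2000
n_cells <- 500
dense <- matrix(rpois(n_genes * n_cells, lambda = 0.05),
                nrow = n_genes, ncol = n_cells)

# Compressed sparse column storage: only non-zero values, their row
# indices and the column pointers are kept in memory
sparse <- Matrix(dense, sparse = TRUE)

# Compare the memory footprint of the two representations
print(object.size(dense), units = "MB")
print(object.size(sparse), units = "MB")
```

Note that converting a matrix that is already in memory does not avoid the peak RAM needed to read it densely in the first place; the saving only fully applies if the data never exist in dense form inside R.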

Yep, the RAM can't keep up.

I was thinking of that, but I can't manage to read the file directly into a sparse matrix while skipping the read.table step. Maybe read.matrix?

(reply by castravete2712, 7 months ago; edited)
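read.table always builds a dense table first. If the upstream pipeline can export the counts in Matrix Market format (.mtx, a common sparse exchange format), Matrix::readMM() reads the file straight into a sparse matrix. A minimal sketch; the small matrix below is hypothetical and is written with writeMM() only so that the example runs on its own:

```r
library(Matrix)

# Hypothetical 4 genes x 3 cells count matrix with 4 non-zero entries,
# standing in for the .mtx file a pipeline would write
counts <- sparseMatrix(i = c(1, 3, 4, 2), j = c(1, 1, 2, 3),
                       x = c(5, 1, 2, 7), dims = c(4, 3))
mtx_file <- tempfile(fileext = ".mtx")
writeMM(counts, mtx_file)

# readMM() parses the coordinate (triplet) file directly into a sparse
# matrix; no dense intermediate is built
from_disk <- readMM(mtx_file)

# Convert to compressed sparse column form, which most single-cell
# packages work with
from_disk <- as(from_disk, "CsparseMatrix")
```

Matrix Market files carry no row or column names, so gene IDs and cell barcodes typically come from separate files and are assigned afterwards with rownames() and colnames().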

How about doing that sequentially, like in chunks of 10%?

(reply by ATpoint, 7 months ago)
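A sketch of the chunked approach in base R plus Matrix, under a loudly stated assumption about the file: tab-separated text, a first line holding a label for the gene column followed by the cell barcodes, and one gene per following line (gene ID, then counts). The function name read_counts_in_chunks and the default chunk_size are hypothetical. The chunk size is counted in rows rather than as a percentage; pick it so that one dense chunk fits comfortably in RAM. Memory then holds roughly one dense chunk plus the sparse data, which is briefly duplicated by the final rbind.

```r
library(Matrix)

# Hypothetical helper. Assumed layout: tab-separated; line 1 = gene-column
# label + cell barcodes; each other line = gene ID + that gene's counts.
read_counts_in_chunks <- function(path, chunk_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Header: drop the gene-column label, keep the cell barcodes
  cells <- strsplit(readLines(con, n = 1), "\t")[[1]][-1]
  chunks <- list()
  genes <- character(0)

  repeat {
    # Only chunk_size lines are held in dense form at any time
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break

    fields <- strsplit(lines, "\t")
    genes <- c(genes, vapply(fields, `[`, character(1), 1))
    values <- do.call(rbind, lapply(fields, function(f) as.numeric(f[-1])))

    # Keep only the non-zero entries of this chunk as a sparse matrix
    nz <- which(values != 0, arr.ind = TRUE)
    chunks[[length(chunks) + 1]] <- sparseMatrix(
      i = nz[, 1], j = nz[, 2], x = values[nz],
      dims = c(nrow(values), length(cells))
    )
  }

  # Stack the sparse chunks (genes x cells) and restore the labels
  counts <- do.call(rbind, chunks)
  dimnames(counts) <- list(genes, cells)
  counts
}
```

If the real file has cells in rows instead, the same idea applies and the result can be transposed with t() at the end.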

By the way, don't bother with read.table; it is very slow. Use, for example (among many good options), data.table::fread() or readr::read_delim(). The speed gains are substantial.

(reply by ATpoint, 7 months ago; edited)
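A minimal sketch of the fread() route, assuming the data.table package is installed and a hypothetical tab-separated layout with gene IDs in the first column and one column per cell. fread() is much faster than read.table and keeps whole-number columns as integers, but the result is still a dense in-memory table; for a file of several GB it helps most combined with its nrows/skip or select arguments, or, as here, with an immediate conversion to a sparse matrix.

```r
library(data.table)
library(Matrix)

# Hypothetical small input file so the example runs on its own
tmp <- tempfile(fileext = ".tsv")
writeLines(c("gene\tAAA\tCCC",
             "g1\t0\t3",
             "g2\t1\t0",
             "g3\t0\t0"), tmp)

# Fast, typed read of the whole table
count_table <- fread(tmp, sep = "\t", header = TRUE)

# Split off the gene IDs and convert the numeric part to a sparse matrix
genes <- count_table[[1]]
counts <- Matrix(as.matrix(count_table[, -1, with = FALSE]), sparse = TRUE)
rownames(counts) <- genes

# Drop the dense copy as soon as the sparse one exists
rm(count_table)
```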

Might be a job for {disk.frame} https://github.com/xiaodaigh/disk.frame

(reply by russhh, 7 months ago)

Do you mean "load/import" a big file?

Quoting the original wording of the question: "charging my large count Matrix on R"

(reply by zx8754, 7 months ago)

Yes, sorry, I did mean to import or load the data.

(reply by castravete2712, 7 months ago)

I struggle to see how this is related to bioinformatics, or why it has attracted so many answers. Loads of questions get killed for asking something about biology that is maybe only tangentially related to bioinformatics. I don't see how this question is related to either.

(reply by Mensur Dlakic, 7 months ago)

Dealing with large data sets has become a more common issue although it is not specific to bioinformatics. However, for bioinformatics data types, there may exist specific tools. Here we're dealing with a count matrix and although replies currently suggest generic solutions, maybe someone has a more specific solution for count matrices as part of their analysis pipeline that they can share.

(reply by Jean-Karim Heriche, 7 months ago)

Agreed. The single-cell packages are starting to output counts in sparse matrices inside hdf5 containers for this reason, so if one could go back a step in OP's workflow there are likely some tweaks that could be made there to make life easier.

(reply by Devon Ryan, 7 months ago)

Maybe I am missing your point. Do you see anything in the question that implies biology or bioinformatics application of what this poster is trying to do?

My point was that lots of posters are turned away even though they sometimes have legitimate biology questions that may be related to bioinformatics. To me, those are closer to the intended purpose of this site than the current post.

(reply by Mensur Dlakic, 7 months ago)

Your point is valid, but as it does not add to the content of this thread, I suggest we discuss things like that in our Slack, which you are invited to join:

biostar.slack.com: chat for the Biostars community (feel free to join)

(reply by ATpoint, 7 months ago)

Well, since it didn't seem important to say why I needed it, I didn't mention it. But I need to figure out how to import this large, complex data set into R, because I have to analyze a large count matrix from single-cell sequencing (SPLiT-seq, to be precise).

The count matrices produced by the pipelines that process the raw single-cell data are huge, so R has trouble working with them and needs a lot of RAM.

So I was just asking around, as I can't find the right method for using R with such large files, and I want to do it properly.

(reply by castravete2712, 7 months ago; edited)
Answer by Jean-Karim Heriche (EMBL Heidelberg, Germany), 7 months ago:

Two R packages that may be of interest:

Powered by Biostar version 2.3.0