Question: Struggling to work on a large Count Matrix
0
12 days ago, castravete2712 (0) wrote:

Hello there,

I'm struggling to load/import my large count matrix into RStudio in order to analyze it. It's a medium-sized dataset (3 GB), but R crashes every time and my PC wants to commit seppuku.

So simply importing it with read.table doesn't work. I tried storing it as a big.matrix file, but that doesn't work either; R crashes again.

What can I do? I can't find any nice tutorial for this kind of problem.

countmatrix R single-cell • 193 views
modified 12 days ago by Jean-Karim Heriche (21k) • written 12 days ago by castravete2712 (0)
2

Is it crashing due to running out of RAM? Have you tried a sparse matrix?

written 12 days ago by Devon Ryan (94k)
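
To illustrate the sparse matrix suggestion, here is a minimal sketch that compares the memory footprint of a dense and a sparse version of the same counts. It assumes the Matrix package is available; the simulated data, its dimensions, and the roughly 95% zero fraction are made up for illustration and are not taken from the original post.

```r
library(Matrix)

set.seed(1)
n_genes <- 2000
n_cells <- 500

# Simulate counts in which about 95% of the entries are zero
# (an assumed sparsity level, chosen only for this illustration).
values <- rbinom(n_genes * n_cells, size = 1, prob = 0.05) *
  rpois(n_genes * n_cells, lambda = 3)
counts <- matrix(as.numeric(values), nrow = n_genes, ncol = n_cells)

# Convert to a sparse matrix; only the non-zero entries are stored.
sparse_counts <- Matrix(counts, sparse = TRUE)

# Compare the memory used by the two representations, in bytes.
dense_bytes  <- as.numeric(object.size(counts))
sparse_bytes <- as.numeric(object.size(sparse_counts))
print(c(dense = dense_bytes, sparse = sparse_bytes))
```

How much this helps depends on how sparse the real matrix is; the saving shrinks as the fraction of non-zero entries grows.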

Yep, the RAM can't keep up.

I was thinking of that, but I can't manage to read the file directly into a sparseMatrix while avoiding the read.table step. read.matrix, maybe?

modified 12 days ago • written 12 days ago by castravete2712 (0)
2

How about doing that sequentially, like in chunks of 10%?

written 12 days ago by ATpoint (29k)
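
One way the chunked approach could look is sketched below: the file is read a block of rows at a time, each block is converted to a sparse matrix, and the blocks are stacked at the end, so the full dense table never has to sit in memory. The helper name `read_counts_chunked`, the tab-separated layout (header row of cell names, gene names in the first column), and the demo data are all assumptions for illustration.

```r
library(Matrix)

# Hypothetical helper: read a tab-separated count table in chunks of
# `chunk_rows` rows and return the whole table as one sparse matrix.
read_counts_chunked <- function(path, chunk_rows = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The header line holds the cell names; its first field labels the gene column.
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
  cell_names <- header[-1]

  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break  # end of file reached

    # Parse only this chunk; the first field of each row becomes the row name.
    chunk <- read.table(text = lines, sep = "\t", header = FALSE,
                        row.names = 1, quote = "", comment.char = "")
    m <- as.matrix(chunk)
    storage.mode(m) <- "double"
    colnames(m) <- cell_names

    # Keep only the sparse version of the chunk.
    chunks[[length(chunks) + 1]] <- Matrix(m, sparse = TRUE)
  }
  do.call(rbind, chunks)
}

# Small demonstration on a temporary file with made-up counts.
demo <- matrix(c(0, 2, 0, 0, 0, 1, 3, 0, 0, 0, 0, 5), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
tmp <- tempfile(fileext = ".tsv")
write.table(demo, tmp, sep = "\t", quote = FALSE, col.names = NA)

sparse_counts <- read_counts_chunked(tmp, chunk_rows = 2)
print(sparse_counts)
```

The chunk size trades speed for peak memory: larger chunks mean fewer parsing calls but a bigger temporary dense block.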
3

By the way, don't bother with read.table; it is super slow. Use, for example (among many good options), data.table::fread() or readr::read_delim(). The speed gains are notable.

modified 12 days ago • written 12 days ago by ATpoint (29k)
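
For reference, a minimal sketch of the fread() route, assuming the data.table package is installed. The file layout (a `gene` column followed by one column per cell) and the demo values are invented for illustration.

```r
library(data.table)

# Write a tiny made-up count table to a temporary file.
demo <- data.table(gene  = paste0("gene", 1:4),
                   cell1 = c(0L, 2L, 0L, 0L),
                   cell2 = c(0L, 1L, 3L, 0L))
tmp <- tempfile(fileext = ".tsv")
fwrite(demo, tmp, sep = "\t")

# Read the table back with fread().
counts_dt <- fread(tmp, sep = "\t", header = TRUE)

# Turn it into a matrix, using the gene column as row names.
counts <- as.matrix(counts_dt, rownames = "gene")
print(counts)
```

Note that fread() is faster than read.table but still loads the whole table into memory, so on its own it does not solve a RAM shortage.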
1

Might be a job for {disk.frame} https://github.com/xiaodaigh/disk.frame

written 12 days ago by russhh (5.1k)

Do you mean "load/import" a big file?

charging my large count Matrix on R

written 12 days ago by zx8754 (9.0k)

Yeah, sorry, I did indeed mean importing or loading the data.

written 12 days ago by castravete2712 (0)

I struggle to see how this is related to bioinformatics, or why it has attracted so many answers. Loads of questions get killed for asking something about biology and maybe tangentially related to bioinformatics. I don't see how this question is related to either.

written 12 days ago by Mensur Dlakic (3.4k)
2

Dealing with large data sets has become a more common issue although it is not specific to bioinformatics. However, for bioinformatics data types, there may exist specific tools. Here we're dealing with a count matrix and although replies currently suggest generic solutions, maybe someone has a more specific solution for count matrices as part of their analysis pipeline that they can share.

written 11 days ago by Jean-Karim Heriche (21k)

Agreed. The single-cell packages are starting to output counts in sparse matrices inside hdf5 containers for this reason, so if one could go back a step in OP's workflow there are likely some tweaks that could be made there to make life easier.

written 11 days ago by Devon Ryan (94k)

Maybe I am missing your point. Do you see anything in the question that implies a biology or bioinformatics application of what this poster is trying to do?

My point was that lots of posters are turned away even though they sometimes have legitimate biology questions that may be related to bioinformatics. To me, that is closer to the intended purpose of this site than the current post.

written 11 days ago by Mensur Dlakic (3.4k)

Your point is valid, but as it does not add to the content of this thread I suggest we discuss things like that in our Slack, which you are invited to join:

biostar.slack.com: Chat for the biostars community -- [ feel free to join ]

written 11 days ago by ATpoint (29k)

Well, since it didn't seem important to say why I needed it, I didn't mention it. But I need to find out how to import this large, complex dataset into R because I have to analyze a large count matrix produced by single-cell sequencing (split-seq, to be precise).

The count matrices produced by the pipelines that process the raw single-cell data are huge, and as a result R has trouble working with them and needs a lot of RAM.

So I was just asking around, as I can't find the right method for using R with such large files and I want to do it properly.

modified 10 days ago • written 10 days ago by castravete2712 (0)
2
12 days ago, Jean-Karim Heriche (21k), EMBL Heidelberg, Germany, wrote:

Two R packages that may be of interest:

written 12 days ago by Jean-Karim Heriche (21k)