Question

How to split a large matrix into smaller matrices by row ranges?

2.4 years ago

ROSE • 0

Hi, I have a large matrix (peak by cell) of more than 75 GB that I cannot open in Scanpy or in R, so I want to split it on Linux into 20 smaller matrices by row ranges (1-20000, 20000-32300, 32300-33300, ...).

The only tool I know is "split", but as far as I understand it only cuts a file into pieces of the same size or the same number of lines, right?

Could you please tell me whether there is a way to do this? Thank you.

separate matrix large • 1.1k views

score 0 · Answer 1 · 2021-12-21

2.4 years ago

cpad0112 21k

Where are the ranges stored? Let us say the ranges are stored in a text file like this:

$ cat ranges.txt
1-20000
20000-32300
....
....

The matrix file is named matrix.txt.

$ parallel --plus --col-sep "-" sed -n '{1},{2}p' matrix.txt ">" new_{1}_{2}.txt :::: ranges.txt

Make sure the system has enough free disk space. GNU Parallel can spread the jobs across the available CPU cores; set the number of jobs (-j) accordingly. Please test the command on a small example file before running it on the full matrix.
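Since each sed job above reads the matrix file from the start, a single-pass variant may be worth trying on a file this large. The following is only a sketch, not part of the original answer: it assumes one matrix row per line, reuses the matrix.txt and ranges.txt names and the new_START_END.txt naming from above, and builds a hypothetical 10-row toy file so it can be tested safely first.

```shell
#!/bin/sh
# Toy stand-ins for the real inputs (hypothetical contents, 10 rows only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
awk 'BEGIN { for (i = 1; i <= 10; i++) print "row" i }' > matrix.txt
printf '1-3\n3-7\n8-10\n' > ranges.txt

# Single pass over the matrix: load the ranges first, then write each row
# to every range that contains it. Boundaries are inclusive, as with
# sed -n 'a,bp'. Malformed lines in ranges.txt (such as "....") are skipped.
awk '
    NR == FNR {
        if (split($0, r, "-") == 2) {
            n++
            lo[n] = r[1] + 0
            hi[n] = r[2] + 0
        }
        next
    }
    {
        for (i = 1; i <= n; i++)
            if (FNR >= lo[i] && FNR <= hi[i])
                print > ("new_" lo[i] "_" hi[i] ".txt")
    }
' ranges.txt matrix.txt

# Quick sanity check of the row counts per output file.
wc -l new_*.txt
```

Because the ranges are inclusive, a row that sits on a shared boundary (row 3 in the toy file, row 20000 in the ranges above) is written to both neighbouring files. If that is not wanted, start the second range one row later, for example 20001-32300.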

Thank you for helping me. Sorry, I did not describe my data clearly. It is a sparse matrix with 1323041 rows and 1154611 columns, and it has no row names or column names. It looks something like this:

[image of the matrix]

"." means there is no value. I want to split the matrix by row number. For example, the first smaller file should contain rows 1 to 20000 of the original large matrix, and the second file should contain rows 20000 to 32300.

It would be better if the resulting smaller files were still in sparse matrix format.

ADD REPLY • link 2.4 years ago by ROSE • 0