How to split a large matrix into smaller matrices by row ranges?
2.4 years ago
ROSE • 0

Hi, I have a big peak-by-cell matrix that is more than 75 GB, and I cannot open it with Scanpy or in R. I want to split it on Linux into 20 smaller matrices by row ranges (1-20000, 20000-32300, 32300-33300, ...).

The only tool I know is "split" on Linux, but it just cuts a file into smaller pieces that all have the same size or the same columns, right?

Could you please tell me whether there is a way to do this? Thank you.

separate matrix large • 1.1k views
2.4 years ago

Where are the ranges stored? Let us say they are stored in a text file like this:

$ cat ranges.txt
1-20000
20000-32300
....
....

The matrix file is named matrix.txt.

$ parallel --plus --col-sep "-" sed -n '{1},{2}p' matrix.txt ">" new_{1}_{2}.txt :::: ranges.txt

Make sure that the system has enough free disk space for the output files. By default GNU parallel runs one job per CPU core; use the -j option to lower the number of simultaneous jobs if needed. Please test the command on a small example file before running it on the big file.
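A trial run on an example file could look like the sketch below. It does not use parallel; it is a plain shell loop that calls sed once per range, which should give the same output files one after another. The file names example_matrix.txt and example_ranges.txt and the two ranges are made up for illustration.

```shell
#!/bin/sh
# Build a 10-line example file; each line stands in for one matrix row.
seq 1 10 | sed 's/^/row/' > example_matrix.txt

# Example ranges in the same start-end format as ranges.txt (made-up values).
printf '1-3\n4-10\n' > example_ranges.txt

# Sequential equivalent of the parallel command: one sed call per range.
# "${last}q" makes sed stop reading once the last line of the range is printed.
while IFS=- read -r first last; do
    sed -n "${first},${last}p;${last}q" example_matrix.txt > "example_${first}_${last}.txt"
done < example_ranges.txt

# Inspect the line counts of the output files.
wc -l example_1_3.txt example_4_10.txt
```

If the line counts match the range sizes, the same loop (or the parallel command) can be pointed at matrix.txt and ranges.txt.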


Thank you for helping me. Sorry, I did not describe my data clearly. It is a sparse matrix with 1323041 rows and 1154611 columns, with no row names or column names. It looks like this:

[image: example of the sparse matrix]

"." means there is no value. I want to split the matrix by row number. For example, the first smaller file should contain ROW1 to ROW20000, i.e. the first 20000 rows of the original large matrix, and the second file should contain ROW20000 to ROW32300.

It would be better if the resulting smaller files were still in sparse matrix format.
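For a row-wise split like this, one possible sketch is a single awk pass that copies each line to every range that contains it. This assumes the file is plain text with exactly one matrix row per line, so each output file keeps the "." placeholders unchanged; if the data is stored in another sparse format, it would not apply as written. The names toy_matrix.txt, toy_ranges.txt and the part_ prefix are hypothetical, and the ranges deliberately share row 4 the way ROW20000 is shared above.

```shell
#!/bin/sh
# Toy input: 10 lines, one per "row" (stand-in for the real matrix file).
seq 1 10 | sed 's/^/row/' > toy_matrix.txt

# Ranges in start-end form; 1-4 and 4-7 overlap on row 4 on purpose.
printf '1-4\n4-7\n8-10\n' > toy_ranges.txt

# First file: load the ranges into arrays lo[] and hi[].
# Second file: write each line to every range file whose interval
# contains the current line number (FNR), reading the matrix only once.
awk -F'-' '
    NR == FNR { lo[FNR] = $1 + 0; hi[FNR] = $2 + 0; n = FNR; next }
    {
        for (i = 1; i <= n; i++)
            if (FNR >= lo[i] && FNR <= hi[i])
                print > ("part_" lo[i] "_" hi[i] ".txt")
    }
' toy_ranges.txt toy_matrix.txt
```

Reading the file once matters at 75 GB, since running a separate command per range rereads the file from the top each time. With 20 ranges, awk keeps 20 output files open at once, which is normally within system limits.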
