Split up a matrix into chunks considering the values of a column in Bash
3.9 years ago
r.tor ▴ 50

I want to split a matrix called 'matrix' into chunks based on the values in its first column, 'GENE', and save each chunk as a separate .gz file. Each chunk should contain all the lines for exactly 3 GENEs, except the last chunk, which holds whatever GENEs remain, as shown in the example below. The script should be written in Bash.

Input:

> matrix
GENE Individual Expr1 Expr2 Expr3
ENSG1 indv1 0.1 0.2 0.3
ENSG1 indv2 0.1 0.2 0.3
ENSG2 indv1 0.1 0.2 0.3
ENSG2 indv2 0.1 0.2 0.3
ENSG3 indv1 0.1 0.2 0.3
ENSG3 indv2 0.1 0.2 0.3
ENSG4 indv1 0.1 0.2 0.3
ENSG4 indv2 0.1 0.2 0.3
ENSG5 indv1 0.1 0.2 0.3
ENSG5 indv2 0.1 0.2 0.3
ENSG6 indv1 0.1 0.2 0.3
ENSG6 indv2 0.1 0.2 0.3
ENSG7 indv1 0.1 0.2 0.3
ENSG7 indv2 0.1 0.2 0.3
ENSG8 indv1 0.1 0.2 0.3
ENSG8 indv2 0.1 0.2 0.3
ENSG9 indv1 0.1 0.2 0.3
ENSG9 indv2 0.1 0.2 0.3
ENSG10 indv1 0.1 0.2 0.3
ENSG10 indv2 0.1 0.2 0.3

Outputs:

> matrix.chunk1
GENE Individual Expr1 Expr2 Expr3
ENSG1 indv1 0.1 0.2 0.3
ENSG1 indv2 0.1 0.2 0.3
ENSG2 indv1 0.1 0.2 0.3
ENSG2 indv2 0.1 0.2 0.3
ENSG3 indv1 0.1 0.2 0.3
ENSG3 indv2 0.1 0.2 0.3

> matrix.chunk2
GENE Individual Expr1 Expr2 Expr3
ENSG4 indv1 0.1 0.2 0.3
ENSG4 indv2 0.1 0.2 0.3
ENSG5 indv1 0.1 0.2 0.3
ENSG5 indv2 0.1 0.2 0.3
ENSG6 indv1 0.1 0.2 0.3
ENSG6 indv2 0.1 0.2 0.3

> matrix.chunk3
GENE Individual Expr1 Expr2 Expr3
ENSG7 indv1 0.1 0.2 0.3
ENSG7 indv2 0.1 0.2 0.3
ENSG8 indv1 0.1 0.2 0.3
ENSG8 indv2 0.1 0.2 0.3
ENSG9 indv1 0.1 0.2 0.3
ENSG9 indv2 0.1 0.2 0.3

> matrix.chunk4
GENE Individual Expr1 Expr2 Expr3
ENSG10 indv1 0.1 0.2 0.3
ENSG10 indv2 0.1 0.2 0.3

I would appreciate any suggestions.

bash shell • 806 views

I'm not providing the code, but here is what you can do:

Prepare a list object where each element of the list contains the gene names for one chunk, e.g.

gene_names_list
$chunk1
ENSG1
ENSG2
ENSG3

$chunk2
ENSG4
ENSG5
...

Loop over this list object. For each element, collect the rows of the original matrix whose GENE value matches one of the element's gene names (i.e. ENSG1, ENSG2, ...), and save those rows to a file.
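For anyone who wants a starting point, here is a minimal sketch of that approach in Bash. It is only one way to do it and rests on a few assumptions: the input is whitespace-delimited with a single header line, the function name split_matrix_by_gene and the output naming matrix.chunkN.gz are made up for illustration, and mapfile requires Bash 4 or newer. Genes are grouped in order of first appearance, and the matrix is re-read once per chunk, which is simple but may be slow for very large files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a matrix into gzipped chunks of N genes each.
# Usage: split_matrix_by_gene <matrix_file> [genes_per_chunk]
split_matrix_by_gene() {
    local matrix=$1
    local per_chunk=${2:-3}      # assumption: 3 genes per chunk by default
    local header chunk=0 i
    local -a genes

    # Keep the header so that it can be repeated in every chunk.
    header=$(head -n 1 "$matrix")

    # Unique gene names from column 1, in order of first appearance.
    mapfile -t genes < <(awk 'NR > 1 && !seen[$1]++ { print $1 }' "$matrix")

    # Walk through the gene list in slices of per_chunk genes.
    for ((i = 0; i < ${#genes[@]}; i += per_chunk)); do
        chunk=$((chunk + 1))
        {
            printf '%s\n' "$header"
            # First input (stdin): gene names wanted in this chunk.
            # Second input: the matrix; print data rows whose gene is wanted.
            printf '%s\n' "${genes[@]:i:per_chunk}" |
                awk 'NR == FNR { want[$1]; next }
                     FNR > 1 && ($1 in want)' - "$matrix"
        } | gzip > "${matrix}.chunk${chunk}.gz"
    done
}
```

With the example input, calling `split_matrix_by_gene matrix 3` should write matrix.chunk1.gz through matrix.chunk4.gz, the last one holding only ENSG10. The match on column 1 is exact, so ENSG1 does not pick up the ENSG10 rows.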
