How to determine and output whether numbers in a matrix are above a certain value?
3 months ago
Yiming ▴ 10

Hello,

I am working with a matrix whose rows are 3000 individuals and whose columns are the coverage depth at millions of sites.

Usually I would use R to process this kind of dataset, but it is too big for R to handle.

For each value in this huge matrix, I need to know whether it is >=10 or <10. I would like to get back a matrix with samples as rows and sites as columns, where each cell is 1 if the depth is >=10 and 0 if the depth is < 0. How could I do this?

Thank you very much!

Unix depth matrix coverage line command • 395 views
3 months ago

You could use awk. Starting with an example matrix:

% echo -e '10\t11\t1\t9\t10\n1\t2\t14\t12\t99' > matrix.txt
% cat matrix.txt
10  11  1   9   10
1   2   14  12  99


Then you can threshold it like so:

% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { $i = ($i >= THRESHOLD) ? 1 : 0; } print $0; }' matrix.txt
1   1   0   0   1
0   0   1   1   1
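
If the real matrix carries labels, the one-liner above would also try to threshold them. Here is a minimal sketch for that case, assuming (this is not stated in the question) that the first row is a header of site names and the first column holds sample IDs. The file names `labeled_matrix.txt` and `labeled_answer.txt` and the labels inside them are made up for illustration:

```shell
# Hypothetical input: header row of site names, first column of sample IDs.
printf 'sample\tsite1\tsite2\tsite3\nind1\t10\t3\t25\nind2\t0\t12\t9\n' > labeled_matrix.txt

# Print the header row unchanged (NR == 1), then threshold fields 2..NF
# and leave the sample ID in field 1 untouched.
awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '
  NR == 1 { print; next }
  { for (i = 2; i <= NF; i++) $i = ($i >= THRESHOLD) ? 1 : 0; print }
' labeled_matrix.txt > labeled_answer.txt

cat labeled_answer.txt
```

If your file has only one of the two (header or ID column), drop the `NR == 1` rule or start the loop at `i = 1` accordingly.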


To write it to a file:

% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { $i = ($i >= THRESHOLD) ? 1 : 0; } print $0; }' matrix.txt > answer.txt
% cat answer.txt
1   1   0   0   1
0   0   1   1   1
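
Because awk reads one row at a time, the same approach can be streamed through a pipe, which may help with a file this large. The following is only a sketch, assuming the matrix is stored gzip-compressed. The file names `matrix.txt.gz` and `answer.txt.gz` are placeholders, and the small example matrix stands in for the real data:

```shell
# Recreate the small example matrix and compress it (stand-in for the large file).
printf '10\t11\t1\t9\t10\n1\t2\t14\t12\t99\n' | gzip > matrix.txt.gz

# Decompress to a pipe, threshold row by row, and recompress the result,
# so the uncompressed matrix is never written to disk.
gzip -dc matrix.txt.gz \
  | awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i = 1; i <= NF; i++) $i = ($i >= THRESHOLD) ? 1 : 0; print }' \
  | gzip > answer.txt.gz

# Inspect the thresholded output.
gzip -dc answer.txt.gz
```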


I'm assuming the "depth < 0" in your question is a typo for "depth < 10", because otherwise depths from 0 to 9 would fall into neither case.


Thank you so much!!