How to determine and output whether numbers in a matrix are above a certain value?
2.7 years ago
YL ▴ 10

Hello,

I am working on a matrix with 3000 individuals as rows and coverage depth at millions of sites as columns.

Usually I would use R to process this dataset, but it is too big for R to handle.

For each value in this huge matrix, I need to know whether it is >=10 or <10. I would like to get back a matrix with samples as rows and sites as columns, where each cell is 1 if the depth is >=10 and 0 if the depth is < 0. How could I do this?

Thank you very much!

Unix depth matrix coverage line command • 924 views
2.7 years ago

You could use awk. Starting with an example matrix:

% echo -e '10\t11\t1\t9\t10\n1\t2\t14\t12\t99' > matrix.txt
% cat matrix.txt
10  11  1   9   10
1   2   14  12  99

Then you can threshold it like so:

% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { $i = ($i >= THRESHOLD) ? 1 : 0; } print $0; }' matrix.txt
1   1   0   0   1
0   0   1   1   1

To write it to a file:

% awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '{ for (i=1; i<=NF; i++) { $i = ($i >= THRESHOLD) ? 1 : 0; } print $0; }' matrix.txt > answer.txt
% cat answer.txt
1   1   0   0   1
0   0   1   1   1

I'm assuming a typo in your question (the 0 case was presumably meant to be depth < 10), because of the gap in conditions between the 0 and 10 cases.
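
If the real matrix also carries a header row of site names and a first column of sample IDs (an assumption on my part, the question does not say), the same idea can be adapted to leave those labels alone. A minimal sketch, where the file names `labeled.txt` and `labeled_binary.txt` and the toy values are hypothetical:

```shell
# Hypothetical toy input: header row plus a sample-ID first column (assumed layout).
printf 'sample\tsite1\tsite2\tsite3\nind1\t10\t3\t25\nind2\t0\t12\t9\n' > labeled.txt

# Print the header row (NR == 1) unchanged, keep column 1 as-is,
# and threshold only columns 2..NF.
awk -v FS="\t" -v OFS="\t" -v THRESHOLD=10 '
NR == 1 { print; next }
{
    for (i = 2; i <= NF; i++) $i = ($i >= THRESHOLD) ? 1 : 0
    print
}' labeled.txt > labeled_binary.txt

cat labeled_binary.txt
```

Since awk reads one line at a time, memory use depends on the length of a single row rather than the size of the whole file, which is why this approach should cope with a matrix that is too large to load into R.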

Thank you so much!!
