Assigning values in a matrix in bash
3.9 years ago

I have a matrix consisting of a gene name column and a log fold change column. I want to convert the log fold change values to -1, 1, 0 or NA: if logFC > 0, it should be 1; if logFC < 0, it should be -1; and if there is no value, it should be NA. How can I assign these values in the matrix? I am comfortable with bash, so please answer in bash only. Thanks in advance.

microarray bash • 985 views

While this is doable in bash, arithmetic is really not its strong suit, so I would really advise you to get comfortable with a more versatile language such as R or python.

Please also provide some example input data for people to test with at the very least.

A3GALT2 9.80e-02  0.295935  3.58e-02
A4GALT 5.58e-02  0.2759222  4.21e-01
A4GNT -5.50e-03  -1.1805802  2.09e-01
AAAS -4.29e-01  -0.122598  2.22e-01
AACS -1.82e-02  -0.0618869  8.14e-02
AADAC 3.22e-02  0.6967785  -4.37e-01
AADACL2 2.97e-02  -1.8886345  -9.67e-03
AADACL3 -1.26e-01  2.3524335  -3.17e-02

This is the test file.


Joe, please provide the bash script for scientific notation. I understand your logic.


I've given you a functional skeleton; it shouldn't be hard for you to adapt it to scientific notation.


But I didn't understand what you said about the E needing to be replaced with a *10^ string. That is what I am asking about. What should I add to the code?


You should experiment for yourself and try things out. There are answers on StackOverflow for this.

But as I said, you'd be better off in a different language.

3.9 years ago
Joe 21k

Question needs more information (such as the format of the input data etc.) as per my comment, but here's the basics of something functional:

#!/bin/bash

# Read comma-separated "gene,value" lines from the file given as $1.
while IFS=',' read -r -a array ; do
  if [ -z "${array[1]}" ] ; then
    fc="NA"   # no value present
  elif (( "${array[1]}" > 0 )); then
    fc=1
  elif (( "${array[1]}" < 0 )); then
    fc=-1
  else
    fc=0      # value is exactly zero
  fi
  echo "${array[0]},$fc"
done < "$1"

Assuming an input file like:

$ cat test.csv
Gene1,1
Gene2,10
Gene3,
Gene4,-1
Gene5,-30

running bash scriptname.sh test.csv will yield:

Gene1,1
Gene2,1
Gene3,NA
Gene4,-1
Gene5,-1

NOTE:

bash cannot do floating point arithmetic (unless you call out to a subprocess such as bc), so if you have floating point data, I would strongly urge using Python, R or similar, as per my comment. This is also why it's important to provide example input data.

Floating point calculations would require something like the following:

#!/bin/bash

# Same loop as before, but each comparison is delegated to bc,
# which prints 1 when the comparison is true and 0 otherwise.
while IFS=',' read -r -a array ; do
  if [ -z "${array[1]}" ] ; then
    fc="NA"   # no value present
  elif (( $(echo "${array[1]} > 0" | bc) )); then
    fc=1
  elif (( $(echo "${array[1]} < 0" | bc) )); then
    fc=-1
  else
    fc=0      # value is exactly zero
  fi
  echo "${array[0]},$fc"
done < "$1"

Which for the input:

Gene1,1
Gene2,10
Gene3,
Gene4,-1
Gene5,-30
Gene6,1.234
Gene7,0.999999999999999999999999999999999
Gene8,-0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001

will give:

Gene1,1
Gene2,1
Gene3,NA
Gene4,-1
Gene5,-1
Gene6,1
Gene7,1
Gene8,-1

As a final comment, this will still not work for standard scientific notation (1E+10 etc.). For that to work with bc, the E would need to be replaced with a *10^ string.
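Since only the sign of each value is needed here, one way to sidestep the scientific notation problem is to avoid arithmetic altogether. The sketch below is an alternative to the *10^ substitution, not an implementation of it: it drops the exponent (a power of ten is always positive) and inspects the remaining mantissa as a string. The helper name sign_of is hypothetical, and the code assumes every non-empty value is a well-formed number.

```shell
#!/bin/bash

# Print 1, -1, 0 or NA for a single value; scientific notation is accepted.
# sign_of is a hypothetical helper name used only for this illustration.
# Assumption: any non-empty argument is a well-formed number (no validation).
sign_of() {
  local value="$1"
  if [ -z "$value" ]; then
    echo "NA"
    return
  fi
  # Drop everything from the first e/E onwards; the exponent cannot
  # change the sign, so the mantissa alone decides the result.
  local mantissa="${value%%[eE]*}"
  if [[ "$mantissa" != *[1-9]* ]]; then
    echo "0"    # no non-zero digit, so the value is zero
  elif [[ "$mantissa" == -* ]]; then
    echo "-1"
  else
    echo "1"
  fi
}

# Example on a few values in the style of the test file from the thread.
for v in 9.80e-02 -5.50e-03 0.295935 1E+10 ""; do
  printf '%s\t%s\n' "${v:-<empty>}" "$(sign_of "$v")"
done
```

The function could replace the if/elif chain inside the while loop, called once per column. It needs no bc subprocess, but it would not help if the actual magnitudes were ever required.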

3.9 years ago
$ echo -e "gene1 100.2\ngene2 -100.4\ngene3" | awk 'BEGIN{OFS="\t"} {V=($2==""?"NA":($2>0?1:($2<0?-1:0))); print $1,V}'
gene1   1
gene2   -1
gene3   NA
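
The test file posted in the thread has three numeric columns per gene, some in scientific notation. A minimal sketch of how the awk approach might be extended to that layout follows. It assumes whitespace-separated columns, so a value missing from the middle of a row cannot be detected; only a gene name with no values at all is reported as NA. The function name classify and the row GENE_X are hypothetical, added for illustration.

```shell
#!/bin/bash

# Replace every numeric column (2..NF) with 1, -1 or 0.
# classify is a hypothetical wrapper name; it reads rows from stdin.
classify() {
  awk 'BEGIN { OFS = "\t" }
  {
    if (NF == 1) { print $1, "NA"; next }   # gene name with no values
    for (i = 2; i <= NF; i++) {
      x = $i + 0                            # force numeric; 9.80e-02 parses as 0.098
      $i = (x > 0) ? 1 : ((x < 0) ? -1 : 0)
    }
    print                                   # fields are re-joined with OFS
  }'
}

# Two rows from the test file plus a hypothetical gene with no values.
printf '%s\n' \
  'A3GALT2 9.80e-02  0.295935  3.58e-02' \
  'A4GNT -5.50e-03  -1.1805802  2.09e-01' \
  'GENE_X' | classify
```

awk converts strings such as 9.80e-02 to numbers itself, so no separate handling of the exponent is needed. On a real file the call would be classify < input.txt, where the file name is a placeholder.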