I have a matrix consisting of a gene name column and a log fold change column. I want to convert the log fold change values to -1, 1, or 0: if the log fold change is > 0, it should be 1; if it is < 0, it should be -1; and if there is no value, it should be NA. How can I assign these values (-1, 1, 0, NA) in this matrix? I am comfortable with bash, so please answer in bash only. Thanks in advance.
The question needs more information (such as the format of the input data), as per my comment, but here are the basics of something functional:
#!/bin/bash
# Usage: bash scriptname.sh input.csv
# Reads "gene,logFC" lines and prints the sign of an integer logFC.
while IFS=',' read -r -a array ; do
    if [ -z "${array[1]}" ] ; then
        fc="NA"     # empty field: no value
    elif (( "${array[1]}" > 0 )); then
        fc=1
    elif (( "${array[1]}" < 0 )); then
        fc=-1
    else
        fc=0        # exactly zero
    fi
    echo "${array[0]},$fc"
done < "$1"
Assuming an input file like:
$ cat test.csv
Gene1,1
Gene2,10
Gene3,
Gene4,-1
Gene5,-30
bash scriptname.sh test.csv
will yield:
Gene1,1
Gene2,1
Gene3,NA
Gene4,-1
Gene5,-1
NOTE: bash cannot do floating point arithmetic (unless you shell out to bc or something similar). If you have floating point data, this is why I would strongly urge the use of Python/R, as per my comment. It is also why providing example input data is important.
Floating point calculations would require something like the following:
#!/bin/bash
# Usage: bash scriptname.sh input.csv
# Same logic as above, but the comparisons are delegated to bc,
# which prints 1 for a true comparison and 0 for a false one.
while IFS=',' read -r -a array ; do
    if [ -z "${array[1]}" ] ; then
        fc="NA"     # empty field: no value
    elif (( $(echo "${array[1]} > 0" | bc) )); then
        fc=1
    elif (( $(echo "${array[1]} < 0" | bc) )); then
        fc=-1
    else
        fc=0        # exactly zero
    fi
    echo "${array[0]},$fc"
done < "$1"
Which for the input:
Gene1,1
Gene2,10
Gene3,
Gene4,-1
Gene5,-30
Gene6,1.234
Gene7,0.999999999999999999999999999999999
Gene8,-0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
will give:
Gene1,1
Gene2,1
Gene3,NA
Gene4,-1
Gene5,-1
Gene6,1
Gene7,1
Gene8,-1
As a final comment, this will still not work for standard scientific notation for exponents (1E+10 etc.). In order for that to work with bc, the E would need to be replaced with a *10^ string.
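That substitution could be sketched roughly as follows. This is only an illustration and not part of the original answer: sci_to_bc is a hypothetical helper name, and it assumes values shaped like 1E+10 or -2.5e-3. Two details are easy to miss: bc has no unary plus, so the + after the exponent marker has to be dropped too, and a negative exponent needs a non-zero scale, otherwise 10^-3 truncates to 0.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): rewrite a number in
# scientific notation as an expression that bc can evaluate.
sci_to_bc() {
    # First expression:  1E+10    -> 1*10^+10
    # Second expression: 1*10^+10 -> 1*10^10   (bc has no unary plus)
    printf '%s\n' "$1" | sed -e 's/[eE]/*10^/' -e 's/\^+/^/'
}

sci_to_bc "1E+10"      # prints 1*10^10
sci_to_bc "-2.5e-3"    # prints -2.5*10^-3
sci_to_bc "1.234"      # prints 1.234 (plain numbers pass through unchanged)
```

In the bc script above, the comparison would then become something like echo "scale=20; $(sci_to_bc "${array[1]}") > 0" | bc. Exponents more negative than the chosen scale would still truncate to zero, so pick the scale to suit the data.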
$ echo -e "gene1 100.2\ngene2 -100.4\ngene3" | awk '{OFS="\t";V=($2==""?"NA":($2>0?1:($2<0?-1:0))); print $1,V;}'
gene1 1
gene2 -1
gene3 NA
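Since awk parses floating point and scientific notation natively, the same idea extends to comma-separated input without needing bc. A minimal sketch, assuming a two-column CSV like the test.csv in the other answer; sign_column is a hypothetical wrapper name, and mapping an exact zero to 0 is my reading of the question:

```shell
#!/bin/bash
# Hypothetical wrapper (name is illustrative): print gene,sign for each
# "gene,logFC" line read from the given files or from standard input.
sign_column() {
    awk -F',' -v OFS=',' '{
        if ($2 == "")      v = "NA"   # missing value
        else if ($2 > 0)   v = 1
        else if ($2 < 0)   v = -1
        else               v = 0      # exactly zero
        print $1, v
    }' "$@"
}

printf 'Gene1,1E+10\nGene2,-2.5e-3\nGene3,\nGene4,0\n' | sign_column
# Gene1,1
# Gene2,-1
# Gene3,NA
# Gene4,0
```

It can also be pointed at a file directly, e.g. sign_column test.csv. Non-numeric text in the second column is not validated here.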
While this is doable in bash, arithmetic is really not its strong suit, so I would advise you to get comfortable with a more versatile language such as R or Python.
Please also provide some example input data for people to test with at the very least.
This is the test file.
Joe, please provide the bash script for scientific notation. I understand your logic.
I've given you a functional skeleton, it shouldn't be hard for you to adapt it to scientific notation.
But I didn't understand what you meant when you said the E would need to be substituted for a *10^ string. That is what I am asking about. What should I add to the code?
You should experiment for yourself and try things out. There are answers on StackOverflow for this.
But as I said, you'd be better off in a different language.