The question needs more information (such as the format of the input data), as per my comment, but here are the basics of something functional:
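#!/bin/bash
while IFS=',' read -r -a array ; do
    if [ -z "${array[1]}" ] ; then
        fc="NA"
    elif (( "${array[1]}" > 0 )); then
        fc=1
    elif (( "${array[1]}" < 0 )); then
        fc=0
    else
        # value is exactly 0: reset fc so the previous row's value is not reused
        # (reporting NA here is an assumption)
        fc="NA"
    fi
    echo "${array[0]},$fc"
done < "$1"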
Assuming an input file like:
$ cat test.csv
Gene1,1
Gene2,10
Gene3,
Gene4,-1
Gene5,-30
$ bash scriptname.sh test.csv
will yield:
Gene1,1
Gene2,1
Gene3,NA
Gene4,0
Gene5,0
NOTE: bash cannot do floating-point arithmetic (unless you call out to bc or a similar tool), so if you have floating-point data, I would strongly urge you to use Python, R or something similar, as per my comment. This is also why it's important to provide example input data.
Floating point calculations would require something like the following:
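#!/bin/bash
while IFS=',' read -r -a array ; do
    if [ -z "${array[1]}" ] ; then
        fc="NA"
    # bc prints 1 when the comparison is true and 0 when it is false
    elif (( $(echo "${array[1]} > 0" | bc) )); then
        fc=1
    elif (( $(echo "${array[1]} < 0" | bc) )); then
        fc=0
    else
        # value is exactly 0: reset fc so the previous row's value is not reused
        # (reporting NA here is an assumption)
        fc="NA"
    fi
    echo "${array[0]},$fc"
done < "$1"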
Which for the input:
Gene1,1
Gene2,10
Gene3,
Gene4,-1
Gene5,-30
Gene6,1.234
Gene7,0.999999999999999999999999999999999
Gene8,-0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
will give:
Gene1,1
Gene2,1
Gene3,NA
Gene4,0
Gene5,0
Gene6,1
Gene7,1
Gene8,0
As a final comment, this will still not work for standard scientific notation (1E+10 etc.). For that to work with bc, the E would need to be replaced with a *10^ string, so that, for example, 1.5E-5 becomes 1.5*10^-5.
While this is doable in bash, arithmetic is not its strong suit, so I would strongly advise you to get comfortable with a more versatile language such as R or Python.
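If switching to R or Python is not an option right away, awk is a common middle ground that is available on most Unix systems. Its numbers are floating point, and its string-to-number conversion understands exponents, so neither bc nor the *10^ rewriting is needed. This is a swapped-in technique rather than the bash approach above; the function name `classify_awk` is hypothetical, and treating an exact 0 as NA is an assumption.

```shell
#!/bin/bash
# Sketch: the same classification in awk. "$2 + 0" forces a numeric
# conversion, which accepts values such as 1E+10 or -2.5e-3.
classify_awk() {
    awk -F',' '
        $2 == ""   { print $1 ",NA"; next }  # missing value
        $2 + 0 > 0 { print $1 ",1";  next }  # positive
        $2 + 0 < 0 { print $1 ",0";  next }  # negative
                   { print $1 ",NA" }        # zero (assumption: report as NA)
    ' "$1"
}

if [ "$#" -gt 0 ]; then
    classify_awk "$1"
fi
```

awk uses double-precision floats, so a value like 0.999999999999999999999999999999999 is rounded to 1; that does not change its sign, which is all this classification needs.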
At the very least, please also provide some example input data for people to test with.
This is the test file.
Joe, please provide me with the bash script for scientific notation. I understood your logic.
I've given you a functional skeleton, it shouldn't be hard for you to adapt it to scientific notation.
But I didn't understand what you meant by replacing the E with a *10^ string. What should I add to the code?
You should experiment for yourself and try things out. There are answers on StackOverflow for this.
But as I said, you'd be better off in a different language.