I need to divide each column in a data text file by the first number in each column and then print the divided columns into a new file for the purposes of analyzing some calcium imaging data.
For example, if I have a file, data1.txt:
0.269279025887517 0.264783938118797 0.275486451749212
0.270740157930877 0.253012512397955 0.280354009308003
0.265142481116960 0.258302433813993 0.280007507438773
0.265476939803159 0.261876223900715 0.278089570458534
0.267494373235676 0.265344090943771 0.274431860837720
0.275140192263676 0.261608300907912 0.275908445868620
0.271029888609140 0.267924518705018 0.276844983596552
0.269824902723735 0.267923628595407 0.272863416495003
0.271370355535210 0.266164772004781 0.272643808651865
I need an awk script to give me a resulting file, data2.txt:
1.000000000000000 1.000000000000000 1.000000000000000
1.005426089308460 0.955543278778637 1.017668954418210
0.984638444242275 0.975521535215266 1.016411172530820
0.985880496738182 0.989018540026482 1.009449171430370
0.993372478060781 1.002115509078660 0.996171895551321
1.021766145197690 0.988006684871269 1.001531814420380
1.006502038975570 1.011860918032010 1.004931392591950
1.002027179184930 1.011857556386980 0.990478532655403
1.007766403791760 1.005214945799940 0.989681368795825
The number of columns can be as many as 50, and the number of rows also varies. I also need the output printed to 15 decimal places and tab-delimited. So far I have:
awk 'BEGIN{first_line=0;divide_by=1;}{if(first_line==0){first_line=1; divide_by=$2;}print $2/divide_by;}END{}' data1.txt > data2.txt
which only gives me the second column and gives the resulting numbers to only 6 decimal places. I need the script to do that for all of the columns in any given data file.
Context: I'm trying to analyze data from cells imaged using a fluorescent calcium dye and need to normalize the data columns by the average of their baseline (the first number).
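A minimal sketch of one way to extend the one-liner above to every column, offered as an illustration rather than a definitive solution. It assumes the input is rectangular (every row has the same number of fields as the first row), that no value in the first row is zero, and that fields are separated by any run of spaces or tabs (awk's default splitting). The array name base is just an illustrative choice, and the sample file is built from the first three rows of data1.txt shown above.

```shell
# Recreate a small data1.txt from the first three rows in the question.
printf '%s\n' \
  '0.269279025887517 0.264783938118797 0.275486451749212' \
  '0.270740157930877 0.253012512397955 0.280354009308003' \
  '0.265142481116960 0.258302433813993 0.280007507438773' > data1.txt

awk '
NR == 1 {
    # Remember the first value of every column as its baseline.
    # Assumption: none of these values is zero.
    for (i = 1; i <= NF; i++) base[i] = $i
}
{
    # Divide each field by its column baseline and print it with
    # 15 decimal places, tab-separated, newline after the last field.
    for (i = 1; i <= NF; i++)
        printf "%.15f%s", $i / base[i], (i < NF ? "\t" : "\n")
}' data1.txt > data2.txt
```

Because the loop runs up to NF, the same script should work unchanged for 3 columns or 50. The last printed digits may differ slightly from values computed in other tools, since awk works in double precision.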
This does seem like it is what I am after, but only gives me the first column.
Check that your input is tab-delimited, since the use of split here expects tab-delimited input. One way to do this is to use cat -te. Tabs should be shown between numbers as ^I, and lines should be terminated with $ characters.
If that's not the case for you, then you need to clean or reformat your input, for example using tr, sed, awk, etc.
If you're using OS X, also make sure you're using GNU awk. One way to do this is to install Homebrew and then run brew install gawk. Replace use of awk with gawk to get an authentically GNUish awk experience.
If you're using OS X and want to clean the input file, note that the BSD version of sed that ships with OS X behaves differently from the one shipped with Linux. I have a blog post about replacing multiple spaces with single tabs using BSD sed.
This is really great, thank you Alex. Still struggling to get the columns tab delimited, but I'm coming down the home stretch.
Cleaning input seems to be 95% of bioinformatics. Good luck and let us know how it goes.
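For the tab-delimiting step, here is a small sketch of one possible cleanup using awk instead of sed, which sidesteps the BSD/GNU sed differences mentioned above. The file names spaces.txt and tabs.txt are hypothetical stand-ins for a real data file, and the sketch assumes fields contain no embedded spaces.

```shell
# Hypothetical sample with uneven runs of spaces between fields.
printf '0.1  0.2 0.3\n0.4 0.5   0.6\n' > spaces.txt

# Reassigning $1 makes awk rebuild the record using OFS, so every run
# of spaces or tabs between fields becomes exactly one tab.
awk -v OFS='\t' '{ $1 = $1; print }' spaces.txt > tabs.txt

# Show tabs as ^I and line ends as $ to confirm the result.
cat -te tabs.txt
```

With this sample, cat -te should print 0.1^I0.2^I0.3$ and 0.4^I0.5^I0.6$, which is the pattern described earlier in the thread for correctly tab-delimited input.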