I am trying to modify the format of a big file.
The file is tab-delimited. Here is what it looks like:
AB11.1 CB:0078_0.53 CB:0044464_0.42 CB:0005623_0.466
AB10.1
AB01.2 CB:0036_0.4 CB:0003824_0.4 CB:0005575_0.7 CB:0005622_0.2 CB:0005623_0.6
AB01.2 CB:0036_0.3 CB:0003824_0.43 CB:0005575_0.7 CB:0005622_0.1
Please note that the number of columns is not the same for every row. A row can have more than 400 columns or only 1, and a few rows contain nothing but the ID, like AB10.1.
I want to modify the format first by removing all characters that come after the underscore ("_"), including the underscore itself.
Then I want to change the separators:
1- Only the first column should be followed by a tab
2- From the second column through the last, the values should be separated by a comma followed by a space
So the output file should look like this:
AB11.1 CB:0078, CB:0044464, CB:0005623
AB10.1
AB01.2 CB:0036, CB:0003824, CB:0005575, CB:0005622, CB:0005623
AB01.2 CB:0036, CB:0003824, CB:0005575, CB:0005622
How can I do that in a bash script (I have very basic knowledge)? Or maybe in Python (which I have never used)?
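For reference, the transformation described above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the function names `convert_line` and `convert_file` and the file names in the usage comment are made up for illustration, the symbol to cut at is assumed to be the underscore, and the ID in the first column is assumed to stay unchanged.

```python
def convert_line(line):
    """Convert one tab-delimited row to 'ID<TAB>term, term, ...'."""
    # Remove the line ending, then split the row on tabs.
    fields = line.rstrip("\r\n").split("\t")
    row_id = fields[0]
    # Keep only the part before the first underscore; skip empty fields
    # (assumption: a trailing tab should not produce an empty term).
    terms = [field.split("_", 1)[0] for field in fields[1:] if field]
    if not terms:
        # Rows that contain only an ID are written back as the ID alone.
        return row_id
    return row_id + "\t" + ", ".join(terms)


def convert_file(in_path, out_path):
    """Read in_path line by line and write the converted rows to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line) + "\n")


# Hypothetical usage (file names are placeholders):
# convert_file("input.tsv", "output.tsv")
```

Reading the file one line at a time keeps memory use low even for a big file, and rows with 1 column or more than 400 columns go through the same code path.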