Command to parse mummer coords file
2.0 years ago
el97004 ▴ 40

Hi !

Does anyone know of Linux commands that can parse the coords table that MUMmer outputs? It has the following format:

    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
28536    30543  |     2008        1  |     2008     2008  |    89.71  | 3583 1233
6306     6469  |      972      808  |      164      165  |    92.73  | 3585 1364
6653     7101  |      455        4  |      449      452  |    84.31  | 3585 1364


All I need is the average of the % IDY column but have not been able to get this via linux commands. I also tried converting to an easier format such as txt but that also did not work.

Any ideas? Thank you!

mummer nucmer

All I need is the average of the % IDY column but have not been able to get this via linux commands.

What did you try?


Assuming my file is named "coords"

cat coords \
| tail -n +3 \  # remove header section
| cut -f 10 \   # get %IDY column (I think this is where it fails)
| awk '{ total += $2; count++ } END { print total/count }' # average calc

2.0 years ago

Since your output looks like a set of fixed-width columns, you'd be better off trying something like:

cut -c 63-70

Thanks! cut -c fixes the issue. I will paste the final working code below:

cat coords | tail -n +3 | cut -c 63-71 | awk '{ total +=$1 } END { print total/NR }'
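
For anyone who would rather not count character positions, here is a minimal sketch that splits each row on the `|` separators instead. It assumes the layout shown in the question, where [% IDY] is the fourth `|`-delimited group. (`cut -f` splits on tabs by default, which is likely why the first attempt failed on this space-padded table.) The function name `mean_idy` and the temporary sample file are made up for illustration.

```shell
# Hypothetical helper (not part of MUMmer): mean of the [% IDY] column.
# Assumes '|' separates the column groups and %IDY is the 4th group.
mean_idy() {
    awk -F'|' '
        # Data rows start with two integers (S1 and E1); this skips the
        # header line, the ===== separator, and any blank lines.
        $1 ~ /^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]*$/ { total += $4; n++ }
        END { if (n > 0) printf "%.2f\n", total / n }
    ' "$1"
}

# Sample input: the three rows from the question
sample=$(mktemp)
cat > "$sample" <<'EOF'
    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
28536    30543  |     2008        1  |     2008     2008  |    89.71  | 3583 1233
6306     6469  |      972      808  |      164      165  |    92.73  | 3585 1364
6653     7101  |      455        4  |      449      452  |    84.31  | 3585 1364
EOF

mean_idy "$sample"   # prints 88.92
```

Matching on the row pattern rather than using `tail -n +3` means the number of header lines does not matter, which may help if your coords file has extra lines at the top.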

5 months ago

For anyone coming here later, this should do the trick to get a plain tab-delimited table. I tried it on the nucmer coords output from mummer4, which looks similar.

It:

• Removes [, ] and |
• Replaces % with pct_
• Adds _ to column names that contain a space (names like LEN and COV)
• Removes the "==="-separator
• Replaces multiple spaces with a single tab
• Removes empty lines

Hopefully that does the trick for you!

sed 's/\[\|\]\||//g;
s/% /pct_/g;
s/LEN /LEN_/g;
s/COV /COV_/g;
s/=//g;
s/^ *//g;
s/ \+ /\t/g;
/^ *$/d;' \
nucmer.coords | tail -n +3
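
Once you have a tab-delimited table with a header row, the average from the original question can be taken by column name rather than by position. This is only a sketch: `col_mean` is a hypothetical helper, and the sample table below is typed by hand from the rows in the question, assuming the header comes out as `pct_IDY` as described in the list above.

```shell
# Hypothetical helper: mean of a named column in a tab-delimited table.
# usage: col_mean FILE COLUMN_NAME
col_mean() {
    awk -F'\t' -v name="$2" '
        # First line: find which field holds the requested column name
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Remaining lines: accumulate non-empty values from that column
        col && $col != "" { total += $col; n++ }
        END { if (n > 0) printf "%.2f\n", total / n }
    ' "$1"
}

# Hand-written sample table (assumed shape of the cleaned output)
tsv=$(mktemp)
{
    printf 'S1\tE1\tS2\tE2\tLEN_1\tLEN_2\tpct_IDY\tTAGS\n'
    printf '28536\t30543\t2008\t1\t2008\t2008\t89.71\t3583 1233\n'
    printf '6306\t6469\t972\t808\t164\t165\t92.73\t3585 1364\n'
    printf '6653\t7101\t455\t4\t449\t452\t84.31\t3585 1364\n'
} > "$tsv"

col_mean "$tsv" pct_IDY   # prints 88.92
```

If the column name is not found in the header, the function prints nothing, so an empty result is a hint to check the header spelling.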