Command to parse mummer coords file
4.4 years ago
el97004 ▴ 80

Hi!

Does anyone know of Linux commands that can parse the coords table that MUMmer outputs? It has the following format:

    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
   28536    30543  |     2008        1  |     2008     2008  |    89.71  | 3583 1233
    6306     6469  |      972      808  |      164      165  |    92.73  | 3585 1364
    6653     7101  |      455        4  |      449      452  |    84.31  | 3585 1364

All I need is the average of the % IDY column, but I have not been able to get it with Linux commands. I also tried converting the file to a simpler format such as txt, but that did not work either.

Any ideas? Thank you!

mummer nucmer • 2.1k views

All I need is the average of the % IDY column but have not been able to get this via linux commands.

What did you try?


Assuming my file is named "coords", this is what I tried:

cat coords \
| tail -n +3 \  # remove header section
| cut -f 10 \   # get %IDY column (I think this is where it fails)
| awk '{ total += $2; count++ } END { print total/count }' # average calc
4.4 years ago

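It looks like your output is a set of fixed-width columns padded with spaces, so cut -f (which splits on tabs by default) will not find the column. You would be better off selecting by character position, with something like: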

cut -c 63-71

Thanks! cut -c fixes the issue. The final working code is below:

cat coords | tail -n +3 | cut -c 63-71 | awk '{ total += $1 } END { print total/NR }'
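
A variant that does not depend on exact character positions may be safer if the column widths differ in another file. The sketch below assumes the table keeps the `|` separators shown in the question, so that % IDY is the fourth `|`-delimited field; `avg_idy` is a hypothetical helper name, not part of MUMmer. Header and separator lines are skipped because their fourth field is not a plain number.

```shell
# Hypothetical helper: average the [% IDY] column of a coords table.
# Assumes '|' separates the column groups, as in the sample above.
avg_idy() {
    awk -F'|' '
        # Keep only rows whose 4th field is a plain number (skips headers).
        $4 ~ /^[ \t]*[0-9]+(\.[0-9]+)?[ \t]*$/ { total += $4; count++ }
        # Guard against division by zero when no data rows were found.
        END { if (count) printf "%.2f\n", total / count }
    ' "$1"
}
```

Usage would be `avg_idy coords`. On the three rows shown in the question this prints 88.92.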
2.9 years ago

For anyone coming here later, this should do the trick to get a plain tab-delimited table. I tried it on the nucmer coords output from mummer4, which looks similar.

It:

  • Removes [, ] and |
  • Replaces % with pct_
  • Adds _ to column names that contain a space (LEN, COV and similar)
  • Removes the "==="-separator
  • Removes leading blank space
  • Replaces multiple spaces with a single tab
  • Removes empty lines

Hopefully that does the trick for you!

sed 's/\[\|\]\||//g;
s/% /pct_/g;
s/LEN /LEN_/g; 
s/COV /COV_/g; 
s/=//g;
s/^ *//g;
s/ \+ /\t/g;
/^ *$/d;' \
nucmer.coords | tail -n +3
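
Once the table is tab-delimited with a header row, the average from the original question can be taken by column name rather than by position. This is only a sketch: `avg_column` and the file name `nucmer.tsv` are hypothetical, and it assumes the sed output above was saved to that file with `pct_IDY` as the header of the identity column.

```shell
# Hypothetical helper: average a named column of a tab-delimited table
# whose first line is the header row.
avg_column() {
    awk -F'\t' -v name="$1" '
        # Find the index of the requested column in the header.
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            next
        }
        # Accumulate non-empty values from that column.
        col && $col != "" { total += $col; count++ }
        END { if (count) printf "%.2f\n", total / count }
    ' "$2"
}
```

Usage would be `avg_column pct_IDY nucmer.tsv`. If the column name is not found in the header, nothing is printed.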