Question: Sed Remove Bootstraps From MrBayes Trees
Louis wrote (7.3 years ago):

Hello again Biostars,

I have what should be a simple sed question, but I'm having some difficulty. Please take a look and let me know if you have a solution. I'm trying to use sed to remove bootstrap values from a Nexus tree file that was created in MrBayes. Apparently, the newest version of MrBayes does not allow you to omit these bootstraps from the output. This is proving problematic, since the next program in my pipeline has difficulty parsing them. Given the first excerpt below, I would like to remove the value between every ":" and the following ",", while keeping the "," itself. The sed-edited result should read like the second excerpt. My attempts so far, such as sed -e 's/:.*,//g', have been too greedy. Thanks!

  72 BE982029,
  73 RIMD,
  74 TX2103,
  75 3631,
  76 3646,
  77 T3937;
  tree gen.0 =  (28:1.000000000000000e-01,((33:1.000000000000000e-01,(((64:1.000000000000000e-01,54:1.000000000000000e-01):1.000000000000000e-01,35:1.000000000000000e-01):1.000000000000000e-01,(((61:1.000000000000000e-01,55:1.000000000000000e-01):1.000000000000000e-01,47:1.000000000000000e-01):1.000000000000000e-01,((31:1.000000000000000e-01,(77:1.000000000000000e-01,30:1.000000000000000e-01):1.000000000000000e-01)

  72 BE982029,
  73 RIMD,
  74 TX2103,
  75 3631,
  76 3646,
  77 T3937;
  tree gen.0 =  (28,((33,(((64,54),35),(((61,55),47),((31,(77,30))
Andreas wrote (7.3 years ago):

Use

sed -e 's/:[0-9\.e\-]\+//g'

Short explanation: match a colon followed by one or more characters from the set of digits ([0-9]), a dot, an e, or a minus, and replace the match with nothing. Inside a bracket expression the dot is already literal, and a minus placed last does not denote a range, so the backslashes there are not strictly needed.

Edit (see also the comments below): This is using GNU sed, where I need the backslash in front of the +; \+ is a GNU extension to basic regular expressions. Note that on a Mac you very likely have BSD sed installed. Changing the backslash-plus to an asterisk will do the job as well.
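For a form that does not rely on the GNU-only \+, two options are a POSIX interval expression or extended regular expressions. The following is a minimal sketch, not a tested recipe for every sed: the function names are made up for illustration, and -E is assumed to be available (it is in current GNU and BSD sed).

```shell
# Hypothetical helper: POSIX basic regex, where \{1,\} means "one or more".
strip_bre() { sed -e 's/:[0-9.e-]\{1,\}//g'; }

# Hypothetical helper: extended regex via -E, where a bare + means "one or more".
strip_ere() { sed -E -e 's/:[0-9.e-]+//g'; }

# Shortened, made-up tree in the same number format as the question.
printf '%s\n' '(28:1.000000000000000e-01,(33:1.000000000000000e-01,64:1.000000000000000e-01):1.000000000000000e-01)' | strip_bre
# prints (28,(33,64))
```

Note that this character set does not include "+", so a value written with a positive exponent such as e+00 would only be partly removed.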


Sorry, it doesn't work on my 64-bit Debian system. Which system did you test on?

(reply by Arun, 7.3 years ago)

CentOS 6.1 using GNU sed.

Try again; the markup swallowed some backslashes earlier on.

(reply by Andreas, 7.3 years ago)

Sorry, it still doesn't work! :) It spits the same line back at me.

(reply by Arun, 7.3 years ago)

The backslash before the + has to be removed if sed is run with extended regular expressions (-E, or -r in GNU sed); then it works fine: 's/:[0-9\.e\-]+//g'

(reply by Joachim, 7.3 years ago)
Arun wrote (7.3 years ago):

As you rightly mention, sed is greedy with regular expressions: the .* matches the longest possible stretch of text, and all of it gets replaced. Instead, do it this way: match a : followed by any number of characters that are not : and then a ,.

> echo "tree gen.0 =  (28:1.000000000000000e-01,((33:1.000000000000000e-01,(((64:1.000000000000000e-01,54:1.000000000000000e-01):1.000000000000000e-01,35:1.000000000000000e-01):1.000000000000000e-01,(((61:1.000000000000000e-01,55:1.000000000000000e-01):1.000000000000000e-01,47:1.000000000000000e-01):1.000000000000000e-01,((31:1.000000000000000e-01,(77:1.000000000000000e-01,30:1.000000000000000e-01):1.000000000000000e-01)" > test.txt
> sed -e 's/:[^:]*,/,/g' test.txt
tree gen.0 =  (28,((33,(((64,54:1.000000000000000e-01),35:1.000000000000000e-01),(((61,55:1.000000000000000e-01),47:1.000000000000000e-01),((31,(77,30:1.000000000000000e-01):1.000000000000000e-01)
# there is also a pattern with )
> sed -e 's/:[^:]*,/,/g; s/:[^:]*)/)/g' test.txt
tree gen.0 =  (28,((33,(((64,54),35),(((61,55),47),((31,(77,30))
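To see the greediness concretely, here is a small sketch on a shortened tree. The three-taxon string is made up for illustration; the two sed commands are the ones discussed in this thread.

```shell
# Made-up, shortened tree used only for this demonstration.
tree='(28:1.0e-01,(33:1.0e-01,64:1.0e-01):1.0e-01)'

# Greedy: .* runs from the first colon to the LAST comma on the line.
printf '%s\n' "$tree" | sed -e 's/:.*,//g'
# prints (2864:1.0e-01):1.0e-01)

# Negated class: [^:]* cannot cross the next colon, so each match stays local.
printf '%s\n' "$tree" | sed -e 's/:[^:]*,/,/g; s/:[^:]*)/)/g'
# prints (28,(33,64))
```

The first command deletes a taxon label along with the values, which is the "too greedy" behaviour described in the question.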
Louis wrote (7.3 years ago):

Andreas, your script did not work on my Mac and I'm not certain why. Regardless, thank you for the reply.

Arun, your script did the trick. I also found an additional possibility (shown below) that does the same.

sed -e 's/:[^,)]*//g'
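Since a real Nexus file contains other lines as well, one cautious variant is to apply the substitution only to the tree statements. This is a sketch under an assumption: every tree statement starts with optional whitespace followed by "tree ", as in the excerpt in the question. The function name and the sample input are made up for illustration.

```shell
# Hypothetical helper: only lines matching the address /^[[:space:]]*tree / are edited.
strip_tree_lines() {
  sed -e '/^[[:space:]]*tree /s/:[^,)]*//g'
}

# Made-up two-line input: one translate-style line, one shortened tree line.
printf '%s\n' \
  '  77 T3937;' \
  '  tree gen.0 =  (28:1.0e-01,(77:1.0e-01,30:1.0e-01):1.0e-01);' \
  | strip_tree_lines
# prints:
#   77 T3937;
#   tree gen.0 =  (28,(77,30));
```

Lines that do not match the address pass through unchanged, so a colon elsewhere in the file would be left alone.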


oh my... how did I miss that! :)

(reply by Arun, 7.3 years ago)

oh, your reply should normally be a comment to his or my answer, rather than a new answer.

(reply by Arun, 7.3 years ago)