Question

manipulation of text by sed command

0

5.8 years ago

saadleeshehreen ▴ 140

Hi, I have a file containing genome IDs of the form NZ_FLAT01000030.1_173. I need to trim those IDs so they look like this one: NZ_FLAT01000030.1

How can I do that using the sed command?

shell • 1.6k views

2

What have you tried? This is a good place to enhance your solve-by-google skills, and if you need help with that, I can walk you through it.

5.8 years ago by Ram 43k

0

I tried sed 's/_/\t/', but the output is NZ FLAT01000030.1_173. I need NZ_FLAT01000030.1.

updated 5.8 years ago by Ram 43k • written 5.8 years ago by saadleeshehreen ▴ 140

score 0 · Answer 1 · 2018-07-08

0

5.8 years ago

Ram 43k

Your command substitutes (s) an underscore _ with a tab \t, and without the g flag it only touches the first underscore on each line. What you want instead is for everything from the last underscore to the end of the ID to be replaced with nothing (which is the same as removing it).

So, you're looking to replace an underscore _ followed by any character until the end of the word .+\b. However, you must ensure the regex does not greedily pick everything starting from the first underscore. So, the second regex (.+\b) is better off ensuring the characters being matched are not underscores, which means the . can be replaced by a [^_] to make the regex [^_]+\b.

Combine those two, and you get _[^_]+\b. Use sed -r to replace that with nothing. Something like sed -r 's/<pattern>//' or sed -r 's/<pattern>//g' should do it.
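For readers who want something to paste and test, here is a minimal sketch of the "remove the last underscore and what follows" idea. It is a variant, not the exact pattern above: it anchors on the end of the line with $ instead of \b, because in GNU sed \b counts _ as a word character and . as a non-word character, so on an ID like NZ_FLAT01000030.1_173 a boundary can be found right after the dot rather than at the end of the ID. The function name strip_suffix is made up for illustration, and the sketch assumes one ID per line and a sed that accepts -E.

```shell
# Hypothetical helper (name is illustrative): read IDs on stdin, one per line,
# and remove the last underscore plus everything after it.
strip_suffix() {
  # _[^_]+$  = an underscore, then one or more non-underscores, then end of line
  sed -E 's/_[^_]+$//'
}

# Example with the ID from the question
printf '%s\n' 'NZ_FLAT01000030.1_173' | strip_suffix
```

Note that running this on an ID that has already been trimmed would strip one more field, so it is meant to be applied once.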

How to get to this by Google: search for "sed replace characters up to word boundary". Experiment with sed a bit and you'll get there.

0

Thanks, but it didn't edit my file as desired:

sed -r 's/_//' output: NZFLAT01000030.1_173
sed -r 's/_//g' output: NZFLAT01000030.1173

5.8 years ago by saadleeshehreen ▴ 140

0

This works for me:

sed -E 's/_[0-9]+//g'
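A quick sanity check of that command, as a sketch rather than a definitive recipe. It works on the ID from the question because _[0-9]+ only matches an underscore that is directly followed by digits. The second ID below is made up for illustration (it is not from this thread): with the g flag, an earlier _<digits> run would be removed as well, so dropping g and anchoring with $ is one possible safeguard. The function names are hypothetical.

```shell
# Hypothetical wrappers (names are illustrative) around the two variants.
strip_digits_all()  { sed -E 's/_[0-9]+//g'; }   # every _<digits> run on the line
strip_digits_last() { sed -E 's/_[0-9]+$//'; }   # only a _<digits> run at end of line

# ID from the question: only the trailing _173 matches
printf '%s\n' 'NZ_FLAT01000030.1_173' | strip_digits_all

# Made-up ID with digits right after the first underscore
printf '%s\n' 'NZ_0123.1_173' | strip_digits_all    # also strips _0123
printf '%s\n' 'NZ_0123.1_173' | strip_digits_last   # strips only _173
```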

5.8 years ago by saadleeshehreen ▴ 140

0

-E is the BSD sed spelling; GNU sed uses -r for the same thing (recent GNU sed versions also accept -E). If you're using a Mac, I highly recommend you switch to the GNU tools (GNU sed, coreutils) so your scripts can run across Linux distributions. Also, BSD sed sucks.

5.7 years ago by Ram 43k

0

Your pattern is just _; it is missing the [^_]+\b part, which is why your g-modified sed is removing all _s. Your command is completely unlike the one I showed you, so I cannot help you.

5.8 years ago by Ram 43k