manipulation of text by sed command
3.9 years ago

Hi, I have a file containing genome IDs of the form NZ_FLAT01000030.1_173. I need to trim those IDs so they look like this: NZ_FLAT01000030.1

How can I do that using the sed command?

shell • 1.2k views

What have you tried? This is a good place to enhance your solve-by-google skills, and if you need help with that, I can walk you through it.


I tried sed 's/_/\t/', but the output is
NZ FLAT01000030.1_173
whereas I need NZ_FLAT01000030.1

3.9 years ago
Ram 36k

Your command substitutes (s) the first underscore _ with a tab \t. What you want is for the last underscore and everything after it to be replaced with nothing (which is the same as being removed).

So, you're looking to replace an underscore _ followed by any character until the end of the word .+\b. However, you must ensure the regex does not greedily pick everything starting from the first underscore. So, the second regex (.+\b) is better off ensuring the characters being matched are not underscores, which means the . can be replaced by a [^_] to make the regex [^_]+\b.

Combine those two, and you get _[^_]+\b. Use sed -r to replace that with nothing. Something like sed -r 's/<pattern>//' or sed -r 's/<pattern>//g' should do it.
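As a minimal sketch of the substitution described above, here is a variant run on the ID from the question. One assumption departs from the pattern given: sed treats _ as a word character, so \b may not match between the 1 and _173. This sketch therefore anchors the match to the end of the line with $ instead. The function name trim_id is hypothetical, and -E is used because both GNU and BSD sed accept it.

```shell
# Hypothetical helper: remove the last underscore and everything after it.
# Assumption: the match is anchored with $ (end of line) instead of \b,
# because _ counts as a word character and \b may not fire next to it.
trim_id() {
  sed -E 's/_[^_]+$//'
}

# Example with the ID from the question.
printf '%s\n' 'NZ_FLAT01000030.1_173' | trim_id
```

To process a whole file of IDs, the same expression can be given a file name, e.g. sed -E 's/_[^_]+$//' ids.txt, where ids.txt is a placeholder.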

How to get to this by Google: Search for "sed replace characters up to word boundary". Experiment with sed a bit and you'll get there.


Thanks, but it didn't edit my file as desired:

sed -r 's/_//' output: NZFLAT01000030.1_173
sed -r 's/_//g' output: NZFLAT01000030.1173

This works for me:

sed -E 's/_[0-9]+//g'

That -E is the BSD sed spelling. GNU sed has the -r option (and also accepts -E). If you're using a Mac, I highly recommend you switch to GNU sed and coreutils so your scripts can run across Linux distributions. Also, BSD sed sucks.


You're missing the word boundary \b, which is why your g-modified sed is removing all _s. Your command is completely unlike the one I showed you, so I cannot help you.
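For reference, the two outputs reported earlier in the thread follow from how the g flag works: without it, sed replaces only the first match on each line, and with it, every match. A minimal sketch using the example ID from the question (the variable names are only illustrative):

```shell
id='NZ_FLAT01000030.1_173'

# Without g: only the first underscore on the line is removed.
first_only=$(printf '%s\n' "$id" | sed -E 's/_//')

# With g: every underscore on the line is removed.
all_matches=$(printf '%s\n' "$id" | sed -E 's/_//g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```

Neither command contains anything beyond the bare underscore, so neither can tell the last underscore apart from the first.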
