Hi, I have a file containing genome IDs of the form NZ_FLAT01000030.1_173. I need to convert those IDs to this form: NZ_FLAT01000030.1
How can I do that using the sed command?
Your command substitutes (s) the first underscore _ with a tab \t. What you want is for everything from the last underscore to the end of the ID to be replaced with nothing (which is the same as being removed).
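So, you're looking to replace an underscore _ followed by any characters up to the end of the ID (.+). However, you must ensure the regex does not greedily pick up everything starting from the first underscore. So, the second part (.+) is better off matching only characters that are not underscores, which means the . can be replaced by [^_] to make it [^_]+.
A word boundary \b is not a safe anchor here: sed treats _ as a word character, so there is no boundary just before the last underscore, and _[^_]+\b can match early, inside the ID. Assuming one ID per line, anchor to the end of the line with $ instead.
Combine those, and you get _[^_]+$. Use sed -r to replace that with nothing. Something like sed -r 's/<pattern>//' should do it (the g flag is not needed, since this pattern can match only once per line).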
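As a minimal sketch of the approach described above, assuming one ID per line and that the part to drop is whatever follows the last underscore: the pattern is anchored with $ (end of line) rather than a word boundary, because _ counts as a word character. The function name strip_suffix is made up for illustration, and -E is used as the portable spelling of -r.

```shell
# strip_suffix is a hypothetical helper name, not a standard command.
# Assumption: each input line holds exactly one ID.
# _[^_]+$ matches the last underscore plus everything after it,
# because [^_]+ cannot cross another underscore and $ pins the match
# to the end of the line.
strip_suffix() {
  sed -E 's/_[^_]+$//'
}

# Example: read IDs from standard input.
printf '%s\n' 'NZ_FLAT01000030.1_173' | strip_suffix
```

To run it on a file, redirect the file into the function, e.g. strip_suffix < ids.txt > ids_clean.txt (both file names are placeholders).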
How to get to this by Google: Search for sed replace characters up to word boundary. Experiment with sed a bit and you'll get there.
What have you tried? This is a good place to enhance your solve-by-google skills, and if you need help with that, I can walk you through it.
I tried sed 's/_/\t/', but the output is NZ FLAT01000030.1_173
I need NZ_FLAT01000030.1