Replace between character and white space using unix sed
3.1 years ago
dllopezr ▴ 80

Hi everyone

I have a file like this

>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:2279220-2278345 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:1763831-1762950 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:844089-844916 Methanococcus maripaludis strain S2, complete sequence

in which I have already replaced the start coordinate of each header to obtain this:

>NC_003037.1:ChrStart-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:ChrStart-2278345 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:ChrStart-1762950 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:ChrStart-844916 Methanococcus maripaludis strain S2, complete sequence

Now I'm trying to replace the numbers between the "-" and the space before the species name with the word "ChrStop". I've tried sed with the [[:blank:]], [[:space:]] and /s options in this way:

sed -i 's/-.*[[:blank:]]/-ChrStop[[:blank:]]/g' filetxt

But the command always replaces more than I want, for example:

>NC_003037.1:ChrStart-ChrStop[[:blank:]]sequence
>NC_007493.2:ChrStart-ChrStop[[:blank:]]sequence
>NC_007952.1:ChrStart-ChrStop[[:blank:]]sequence
>NC_005791.1:ChrStart-ChrStop[[:blank:]]sequence

Can you help me with the correct way to match the space and replace only the text between the "-" and that space?

thank you so much.

3.1 years ago

Try this (?):

sed 's/ChrStart\-[0-9]*[[:blank:]]/ChrStart\-ChrStop /g' test
>NC_003037.1:ChrStart-ChrStop Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:ChrStart-ChrStop Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:ChrStart-ChrStop Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:ChrStart-ChrStop Methanococcus maripaludis strain S2, complete sequence
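
For context on why the original attempt went too far: `.*` is greedy, so it extends to the last blank on the line, and a class such as [[:blank:]] only has meaning in the pattern, not in the replacement (there it is inserted literally). A minimal sketch of the difference, assuming one header from the question stored in a hypothetical variable `line`:

```shell
# Sample header copied from the question ("line" is a name chosen for this sketch)
line='>NC_003037.1:ChrStart-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence'

# Greedy: '.*' runs up to the LAST blank on the line, swallowing the species name
greedy=$(printf '%s\n' "$line" | sed 's/-.*[[:blank:]]/-ChrStop /')

# Restricted: '[0-9]*' stops at the first non-digit, so only the number is replaced
fixed=$(printf '%s\n' "$line" | sed 's/-[0-9]*[[:blank:]]/-ChrStop /')

printf '%s\n' "$greedy" "$fixed"
```

The first line printed should end in "ChrStop sequence", while the second keeps the full description.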

Hi @Kevin, thank you for your help!

If you don't mind, could you explain to me how this command works, especially this part: ChrStart\-[0-9]*[[:blank:]]?

What I really want to do is to pass these numbers to different variables, say ChrStart = $1 and ChrStop = $2, to pass on to another command. Will the use of "ChrStart" in the above code spoil this objective?


ChrStart is taken literally
\- is taken as a literal hyphen (-). The backslash escapes any metacharacter behavior (not required here, as - is special only inside bracket expressions, but better safe than sorry).
[0-9]* matches a run of zero or more digits (0 to 9)
[[:blank:]] matches a single blank (a space or a tab)

ChrStart-100000 followed by a blank is thus matched in 4 parts, like so:

|ChrStart|-|100000| |
|----1---|2|--3---|4|
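
On the follow-up about variables: the same building blocks (a literal prefix, [0-9]* for the digits, [[:blank:]] for the delimiter) can capture the coordinates instead of replacing them. A rough sketch, assuming the original, unmodified header format from the question; the names `header`, `coords`, `chr_start` and `chr_stop` are made up for illustration:

```shell
# One original header from the question (before any ChrStart/ChrStop substitution)
header='>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence'

# \( \) capture the two digit runs around the hyphen; print them space-separated
coords=$(printf '%s\n' "$header" | sed -n 's/^>[^:]*:\([0-9]*\)-\([0-9]*\)[[:blank:]].*/\1 \2/p')

# Split the pair into two shell variables
read -r chr_start chr_stop <<EOF
$coords
EOF

echo "start=$chr_start stop=$chr_stop"
```

The two variables could then be handed to another command as positional arguments, so no placeholder text needs to be written into the file at all.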