Question: Replace between character and white space using unix sed
6 months ago by
dllopezr40
dllopezr40 wrote:

Hi everyone

I have a file like this

>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:2279220-2278345 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:1763831-1762950 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:844089-844916 Methanococcus maripaludis strain S2, complete sequence

in which I have already replaced the first coordinate of each header to obtain this:

>NC_003037.1:ChrStart-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:ChrStart-2278345 Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:ChrStart-1762950 Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:ChrStart-844916 Methanococcus maripaludis strain S2, complete sequence

Now I want to replace the numbers between the "-" and the space before the species name with the word "ChrStop". I've tried sed with [[:blank:]], [[:space:]] and \s in this way:

sed -i 's/-.*[[:blank:]]/-ChrStop[[:blank:]]/g' filetxt

But the command always replaces more than I want, for example:

>NC_003037.1:ChrStart-ChrStop[[:blank:]]sequence
>NC_007493.2:ChrStart-ChrStop[[:blank:]]sequence
>NC_007952.1:ChrStart-ChrStop[[:blank:]]sequence
>NC_005791.1:ChrStart-ChrStop[[:blank:]]sequence

Can you help me with the correct way to match the space and replace only what lies between it and the "-"?

thank you so much.

unix sed replace • 263 views
ADD COMMENTlink modified 6 months ago by Kevin Blighe41k • written 6 months ago by dllopezr40
6 months ago by
Kevin Blighe41k
Kevin Blighe41k wrote:

Try this (?):

sed 's/ChrStart\-[0-9]*[[:blank:]]/ChrStart\-ChrStop /g' test
>NC_003037.1:ChrStart-ChrStop Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
>NC_007493.2:ChrStart-ChrStop Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence
>NC_007952.1:ChrStart-ChrStop Paraburkholderia xenovorans LB400 chromosome 2, complete sequence
>NC_005791.1:ChrStart-ChrStop Methanococcus maripaludis strain S2, complete sequence
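
For what it's worth, the original attempt seems to fail for two reasons: `.*` is greedy, so it runs up to the last blank on the line, and a class like [[:blank:]] is only special in the pattern, so in the replacement it is inserted literally. A minimal sketch comparing the two commands on one sample header (the variable names `line`, `greedy` and `fixed` are just for illustration):

```shell
# One sample header line from the question (first coordinate already replaced).
line='>NC_003037.1:ChrStart-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence'

# Greedy: '.*' extends to the LAST blank on the line, and [[:blank:]] in the
# replacement part is written out literally.
greedy=$(printf '%s\n' "$line" | sed 's/-.*[[:blank:]]/-ChrStop[[:blank:]]/')

# Restricted: [0-9]* can only consume digits, so the match ends at the first blank.
fixed=$(printf '%s\n' "$line" | sed 's/-[0-9]*[[:blank:]]/-ChrStop /')

printf '%s\n' "$greedy" "$fixed"
```

The first printed line should reproduce the unwanted output from the question; the second keeps the species description intact.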
ADD COMMENTlink written 6 months ago by Kevin Blighe41k

Hi @Kevin, thank you for your help!

If you don't mind, could you explain to me how this command works, especially the part ChrStart\-[0-9]*[[:blank:]]?

What I really want to do is pass these numbers to different variables, say ChrStart = $1 and ChrStop = $2, and hand them to another command. Will the use of "ChrStart" in the above code spoil this objective?

ADD REPLYlink written 6 months ago by dllopezr40

ChrStart is taken literally
\- is taken as a literal hyphen (-). The backslash escapes any metacharacter behavior (not required here, as - is special only within bracket expressions, but better safe than sorry).
[0-9]* matches zero or more digits (0 to 9)
[[:blank:]] matches a single space or tab

"ChrStart-100000 " (including the trailing blank) is thus matched in 4 parts like so:

|ChrStart|-|100000| |
|----1---|2|--3---|4|
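
On the follow-up about passing the numbers to variables: once "ChrStart" has been written into the header, the first number is gone, so it is probably easier to work from the original, unmodified headers. A rough sketch using the same kind of breakdown with capture groups; the names `header`, `coords`, `chr_start` and `chr_stop` are made up for this example, and it assumes every header has the form >ID:start-stop followed by a blank:

```shell
# Original (unmodified) header line from the question.
header='>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence'

# -n plus the p flag prints only lines where the substitution succeeded.
# \([0-9]*\) captures each run of digits; \1 and \2 refer back to them.
coords=$(printf '%s\n' "$header" |
  sed -n 's/^>[^:]*:\([0-9]*\)-\([0-9]*\)[[:blank:]].*/\1 \2/p')

# Split the two captured numbers into separate shell variables.
read -r chr_start chr_stop <<EOF
$coords
EOF

echo "start=$chr_start stop=$chr_stop"
```

From there, $chr_start and $chr_stop can be handed to whatever command comes next; for a whole file, the same sed expression could feed a `while read -r` loop.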
ADD REPLYlink modified 6 months ago • written 6 months ago by RamRS21k