Question: Replace two or more spaces with a tab in the terminal
2.3 years ago, julien-le.roy wrote:

Hi all, I have an issue when I try to replace multiple spaces with a tab in a file (in other words, convert the file into a tab-separated table that is easy to import into Excel). Here is the file (basically, there are 5 columns separated by two or more spaces):

lcl|tetur19g00650  length:863 (mRNA) (E75)  (Ecdysone-induced pro...   132    2e-32
lcl|tetur03g08440  length:544 (mRNA) (HR3)  (Hormone Receptor 3)      85.1    5e-18
lcl|tetur04g01460  length:396 (mRNA) (SVP)  (Seven Up)                66.6    1e-12
lcl|tetur07g00140  length:1063 (mRNA) (HR4)  (Hormone Receptor 4)     63.5    1e-11
lcl|tetur10g04690  length:854 (mRNA) (HR38 (1))  (Hormone Recepto...  61.6    5e-11
lcl|tetur08g06490  length:645 (mRNA) (FTZ-F1)  (Fushi tarazu - Fa...  61.2    6e-11
lcl|tetur01g11050  length:726 (mRNA) (n/a)  (Zinc finger, nuclear...  60.8    9e-11
lcl|tetur01g11040  length:726 (mRNA) (NR2E3)  (Zinc finger, nucle...  60.8    9e-11
lcl|tetur01g09240  length:431 (mRNA) (RXR (2))  (Retinoid X Recep...  60.1    1e-10
lcl|tetur10g04710  length:867 (mRNA) (HR38 (2))  (Hormone Recepto...  59.7    2e-10
lcl|tetur34g00430  length:561 (mRNA) (kni)  (Hypothetical knrl) (...  57.4    1e-09
lcl|tetur03g02550  length:497 (mRNA) (PNR-like)  (Photocell recep...  57.0    1e-09
lcl|tetur31g01930  length:497 (mRNA) (RXR (1))  (Retinoid X Recep...  56.6    1e-09
lcl|tetur05g04280  length:499 (mRNA) (HNF4)  (Hepatocyte Nuclear ...  56.6    2e-09
lcl|tetur01g07700  length:338 (mRNA) (HR83-like)  (Possibly HR83-...  52.0    4e-08
lcl|tetur01g02690  length:576 (mRNA) (dissatisfaction)  (dissatis...  51.6    6e-08
lcl|tetur08g01210  length:249 (mRNA) (Tll)  (Tailless)                51.2    7e-08
lcl|tetur11g04570  length:901 (mRNA) (HR39)  (Hormone receptor-li...  47.4    9e-07
lcl|tetur28g00490  length:370 (mRNA) (ERR)  (Estrogen-related Rec...  47.0    1e-06
lcl|tetur11g01960  length:567 (mRNA) (HR96-like g)  (HR96-like nu...  45.8    2e-06
lcl|tetur01g15140  length:430 (mRNA) (EcR)  (Ecdysone Receptor)       44.7    6e-06
lcl|tetur01g07820  length:564 (mRNA) (HR96-like d)  (HR96-like nu...  42.7    2e-05
lcl|tetur34g00750  length:579 (mRNA) (HR96-like a)  (HR96-like nu...  42.7    3e-05
lcl|tetur30g01210  length:669 (mRNA) (HR96-like b)  (HR96-like nu...  39.3    2e-04
lcl|tetur36g00260  length:501 (mRNA) (HR96-like h)  (HR96-like nu...  39.3    3e-04
lcl|tetur20g01820  length:499 (mRNA) (HR96-like e)  (HR96-like nu...  38.9    3e-04
lcl|tetur04g03100  length:490 (mRNA) (HR96-like f)  (HR96-like nu...  38.9    3e-04
lcl|tetur17g03630  length:483 (mRNA) (HR96-like c)  (HR96-like nu...  38.5    4e-04
lcl|tetur07g04810  length:311 (mRNA) (E78)  (Ecdysone-induced pro...  33.9    0.012

I use this command line, but nothing happens (I get a new file, but with the same space separators):

sed 's/ \+ /\t/g' inputfile > outputfile

Do you have any ideas? Thank you very much!

terminal unix command

I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

written 2.3 years ago by WouterDeCoster

Thanks! It's much better!

written 2.3 years ago by julien-le.roy

For better readability, use sed 's/\s\s\+/\t/g' input (for two or more spaces). However, in one of the columns I see text that is itself separated by spaces. Make sure that the spacing is uniform between columns, not within a column.

written 2.3 years ago by cpad0112
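
One way to act on the advice above is to check how many columns each line yields when it is split on runs of two or more spaces. This is only a sketch: it assumes a POSIX awk, and sample.txt is a hypothetical file name holding two rows copied from the question. A consistent table should report a single field count.

```shell
# Hypothetical sample file with two rows copied from the question.
cat > sample.txt <<'EOF'
lcl|tetur19g00650  length:863 (mRNA) (E75)  (Ecdysone-induced pro...   132    2e-32
lcl|tetur03g08440  length:544 (mRNA) (HR3)  (Hormone Receptor 3)      85.1    5e-18
EOF

# Use "two or more spaces" as the field separator (a space followed by
# one or more spaces) and collect the distinct field counts.
field_counts=$(awk -F'  +' '{ print NF }' sample.txt | sort -u)
echo "$field_counts"
```

If more than one number is printed, some line has a double space inside a column or only a single space between two columns.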

Thank you for your reply. Actually, when there are 2 or more spaces I would like to replace them with a tab, but when there is only 1 space I want to leave it as it is. I hope that's clear.

written 2.3 years ago by julien-le.roy
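
The rule described here (two or more spaces become a tab, single spaces stay) can be sketched without relying on `\+` or `\t`, which not every sed understands. The example below assumes a sed that accepts -E for extended regular expressions; the variable names are illustrative only.

```shell
# Build a literal tab with printf, so sed itself needs no \t support.
tab=$(printf '\t')

# One row from the question (columns separated by two or more spaces).
line='lcl|tetur04g01460  length:396 (mRNA) (SVP)  (Seven Up)                66.6    1e-12'

# -E enables extended regular expressions; " {2,}" matches a run of
# two or more spaces, so single spaces inside a column are untouched.
converted=$(printf '%s\n' "$line" | sed -E "s/ {2,}/${tab}/g")
printf '%s\n' "$converted"
```

For a whole file, the same substitution would be run as sed -E "s/ {2,}/${tab}/g" inputfile > outputfile.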
2.3 years ago, Kevin Blighe wrote:

You're running that sed command incorrectly for what you want to do. It should just be:

sed 's/ \+/\t/g' inputfile > outputfile

Thank you. It's still not working and I don't know why... I'm working on a Mac, but I'm not sure whether that is the issue?

written 2.3 years ago by julien-le.roy
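
One possible explanation, offered as an assumption rather than a diagnosis: macOS ships a BSD sed, which, as far as I know, does not give `\+` and `\t` the meanings that GNU sed does. A rough way to tell which sed is installed is to rely on GNU sed accepting --version; the variable name below is illustrative.

```shell
# GNU sed accepts --version; a BSD sed (as on macOS) is expected to reject it.
if sed --version >/dev/null 2>&1; then
    sed_flavour="GNU"
else
    sed_flavour="non-GNU (probably BSD)"
fi
echo "sed flavour: $sed_flavour"
```

This is a heuristic only; other sed implementations may behave differently.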

On Mac, it would just be this:

sed -E $'s/[[:blank:]]+/\t/g' inputfile

However, I see your issue. Even the spaces in the gene descriptions will change.

Are you sure that using a rule whereby only 2 or more spaces combined are changed is valid?

written 2.3 years ago by Kevin Blighe
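
The difference between the two rules can be seen on a single row from the question. The sketch below counts the tab-separated fields each rule produces; it assumes a sed that accepts -E and a POSIX awk, and the variable names are illustrative. Whether the two-or-more rule is safe depends on no description containing a double space, which the sample suggests but does not guarantee.

```shell
tab=$(printf '\t')
line='lcl|tetur03g08440  length:544 (mRNA) (HR3)  (Hormone Receptor 3)      85.1    5e-18'

# Rule 1: every run of blanks becomes a tab (this also splits the descriptions).
any_blank=$(printf '%s\n' "$line" | sed -E "s/[[:blank:]]+/${tab}/g" | awk -F'\t' '{ print NF }')

# Rule 2: only runs of two or more spaces become a tab.
two_plus=$(printf '%s\n' "$line" | sed -E "s/ {2,}/${tab}/g" | awk -F'\t' '{ print NF }')

echo "fields with [[:blank:]]+ : $any_blank"
echo "fields with two or more  : $two_plus"
```

On this row the first rule yields 9 fields and the second yields the intended 5.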