Question: Replace two or more spaces with a tab in the terminal
21 months ago, julien-le.roy (0) wrote:

Hi all, I have a problem replacing runs of multiple spaces with a tab in a file (in other words, converting the file into a tab-separated table that is easy to import into Excel). Here is the file (basically, there are 5 columns separated by two or more spaces):

lcl|tetur19g00650  length:863 (mRNA) (E75)  (Ecdysone-induced pro...   132    2e-32
lcl|tetur03g08440  length:544 (mRNA) (HR3)  (Hormone Receptor 3)      85.1    5e-18
lcl|tetur04g01460  length:396 (mRNA) (SVP)  (Seven Up)                66.6    1e-12
lcl|tetur07g00140  length:1063 (mRNA) (HR4)  (Hormone Receptor 4)     63.5    1e-11
lcl|tetur10g04690  length:854 (mRNA) (HR38 (1))  (Hormone Recepto...  61.6    5e-11
lcl|tetur08g06490  length:645 (mRNA) (FTZ-F1)  (Fushi tarazu - Fa...  61.2    6e-11
lcl|tetur01g11050  length:726 (mRNA) (n/a)  (Zinc finger, nuclear...  60.8    9e-11
lcl|tetur01g11040  length:726 (mRNA) (NR2E3)  (Zinc finger, nucle...  60.8    9e-11
lcl|tetur01g09240  length:431 (mRNA) (RXR (2))  (Retinoid X Recep...  60.1    1e-10
lcl|tetur10g04710  length:867 (mRNA) (HR38 (2))  (Hormone Recepto...  59.7    2e-10
lcl|tetur34g00430  length:561 (mRNA) (kni)  (Hypothetical knrl) (...  57.4    1e-09
lcl|tetur03g02550  length:497 (mRNA) (PNR-like)  (Photocell recep...  57.0    1e-09
lcl|tetur31g01930  length:497 (mRNA) (RXR (1))  (Retinoid X Recep...  56.6    1e-09
lcl|tetur05g04280  length:499 (mRNA) (HNF4)  (Hepatocyte Nuclear ...  56.6    2e-09
lcl|tetur01g07700  length:338 (mRNA) (HR83-like)  (Possibly HR83-...  52.0    4e-08
lcl|tetur01g02690  length:576 (mRNA) (dissatisfaction)  (dissatis...  51.6    6e-08
lcl|tetur08g01210  length:249 (mRNA) (Tll)  (Tailless)                51.2    7e-08
lcl|tetur11g04570  length:901 (mRNA) (HR39)  (Hormone receptor-li...  47.4    9e-07
lcl|tetur28g00490  length:370 (mRNA) (ERR)  (Estrogen-related Rec...  47.0    1e-06
lcl|tetur11g01960  length:567 (mRNA) (HR96-like g)  (HR96-like nu...  45.8    2e-06
lcl|tetur01g15140  length:430 (mRNA) (EcR)  (Ecdysone Receptor)       44.7    6e-06
lcl|tetur01g07820  length:564 (mRNA) (HR96-like d)  (HR96-like nu...  42.7    2e-05
lcl|tetur34g00750  length:579 (mRNA) (HR96-like a)  (HR96-like nu...  42.7    3e-05
lcl|tetur30g01210  length:669 (mRNA) (HR96-like b)  (HR96-like nu...  39.3    2e-04
lcl|tetur36g00260  length:501 (mRNA) (HR96-like h)  (HR96-like nu...  39.3    3e-04
lcl|tetur20g01820  length:499 (mRNA) (HR96-like e)  (HR96-like nu...  38.9    3e-04
lcl|tetur04g03100  length:490 (mRNA) (HR96-like f)  (HR96-like nu...  38.9    3e-04
lcl|tetur17g03630  length:483 (mRNA) (HR96-like c)  (HR96-like nu...  38.5    4e-04
lcl|tetur07g04810  length:311 (mRNA) (E78)  (Ecdysone-induced pro...  33.9    0.012

I used this command, but nothing happens (I get a new file, but with the same space separators):

sed 's/ \+ /\t/g' inputfile > outputfile

Do you have any ideas? Thank you very much!

terminal unix command • 486 views
modified 21 months ago by WouterDeCoster (43k) • written 21 months ago by julien-le.roy (0)

I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

written 21 months ago by WouterDeCoster (43k)

Thanks! It's much better!

written 21 months ago by julien-le.roy (0)

For better readability, use sed 's/\s\s\+/\t/g' input (for two or more spaces). However, in one of the columns I see text separated by single spaces. Make sure that runs of two or more spaces occur only between columns, never within a column.

modified 20 months ago • written 20 months ago by cpad0112 (12k)
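
As a hedged sketch of the two-or-more-spaces rule discussed in this comment: the snippet below converts each run of two or more spaces to a tab and then checks that every row ends up with the 5 columns described in the question. The function names spaces_to_tabs and check_columns are hypothetical, and it assumes a sed that accepts -E (GNU sed and the sed shipped with macOS both do).

```shell
#!/bin/sh
# Build a literal tab with printf, because "\t" in a sed replacement is not portable.
tab=$(printf '\t')

# Hypothetical helper: turn every run of two or more spaces into one tab.
# Single spaces (for example inside the gene descriptions) are left alone.
spaces_to_tabs() {
    sed -E "s/ {2,}/${tab}/g"
}

# Hypothetical helper: report rows that do not have exactly 5 tab-separated fields.
# Exits non-zero if any such row is found.
check_columns() {
    awk -F '\t' 'NF != 5 { printf "line %d has %d fields\n", NR, NF; bad = 1 } END { exit bad }'
}

# Demo on two rows copied from the question.
printf '%s\n' \
  'lcl|tetur19g00650  length:863 (mRNA) (E75)  (Ecdysone-induced pro...   132    2e-32' \
  'lcl|tetur04g01460  length:396 (mRNA) (SVP)  (Seven Up)                66.6    1e-12' \
  | spaces_to_tabs | check_columns && echo "all rows have 5 columns"
```

On a real file this would be run as spaces_to_tabs < inputfile > outputfile, followed by check_columns < outputfile to spot rows where a description happens to contain a double space.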

Thank you for your reply. Actually, when there are 2 or more spaces I would like to replace them with a tab, but when there is only 1 space I want to leave it as it is. I hope that's clear.

written 20 months ago by julien-le.roy (0)

21 months ago, Kevin Blighe (55k) wrote:

You're running that sed command incorrectly for what you want to do. It should just be:

sed 's/ \+/\t/g' inputfile > outputfile
written 21 months ago by Kevin Blighe (55k)

Thank you. It's still not working and I don't know why... I'm working on a Mac, but I'm not sure whether that is the issue?

written 20 months ago by julien-le.roy (0)
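
A quick way to check whether the platform matters is to see which sed is being run. The sketch below is a rough heuristic, not a definitive test: GNU sed accepts --version, while the BSD sed that macOS ships rejects that option and exits with an error. The function name sed_flavor is hypothetical. This matters here because \+ and \t are GNU extensions and may not behave the same way in other sed implementations.

```shell
#!/bin/sh
# Hypothetical helper: guess the flavor of the sed found first on PATH.
# Heuristic only: GNU sed accepts --version; BSD sed exits non-zero on it.
sed_flavor() {
    if sed --version >/dev/null 2>&1; then
        echo "gnu"
    else
        echo "non-gnu"
    fi
}

echo "sed flavor: $(sed_flavor)"
```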

On Mac, it would just be this:

sed -E $'s/[[:blank:]]+/\t/g' inputfile

However, I see your issue. Even the spaces in the gene descriptions will change.

Are you sure that a rule that changes only runs of 2 or more consecutive spaces is valid?

written 20 months ago by Kevin Blighe (55k)
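
If the two-or-more-spaces rule does hold for the file, one alternative that sidesteps the sed differences between Linux and macOS is awk, which interprets "\t" in a string as a tab on both platforms. This is a minimal sketch under that assumption; the function name two_plus_spaces_to_tab is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace each run of two or more spaces with a tab.
# The regex is two spaces followed by +, i.e. one space, then one or more spaces.
# Single spaces inside the descriptions are not matched and stay unchanged.
two_plus_spaces_to_tab() {
    awk '{ gsub(/  +/, "\t"); print }' "$@"
}

# Demo on one row copied from the question; the output has 5 tab-separated fields.
line='lcl|tetur10g04690  length:854 (mRNA) (HR38 (1))  (Hormone Recepto...  61.6    5e-11'
printf '%s\n' "$line" | two_plus_spaces_to_tab
```

With a file, this would be called as two_plus_spaces_to_tab inputfile > outputfile. Unlike the [[:blank:]]+ pattern, it leaves the single spaces in the gene descriptions untouched.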