Question

How to remove everything before a specific character in a column

24 months ago

blackadder ▴ 30

I have a big tsv file with 4 columns in the following format:

ERR435678 contig_1 /home/results/file1.txt /home/results/file1.txt
ERR435678 contig_2 /home/results/file2.txt /home/results/file2.txt
ERR435678 contig_3 /home/results/file3.txt /home/results/file3.txt

How can I modify only the elements of the third column so that I get the following:

ERR435678 contig_1 file1.txt /home/results/file1.txt
ERR435678 contig_2 file2.txt /home/results/file2.txt
ERR435678 contig_3 file3.txt /home/results/file3.txt

unix bash awk sed

updated 24 months ago by GenoMax 141k • written 24 months ago by blackadder ▴ 30

What have you tried? This is a really simple Google search that does not merit its own post.

24 months ago by Ram 43k

Hello again! I thought the same, but I cannot find anything relevant to manipulating data in a specific column. I am looking into awk, but I haven't used it before and I cannot understand how to use it.

24 months ago by blackadder ▴ 30

All of these column matching and replacing questions you have been asking are not bioinformatics.

If you know no other way of doing it, there is always Excel: import the file, copy the third column into a separate file, do a search-and-replace, and paste it back into the original file. I estimate it would take less than a minute.

24 months ago by Mensur Dlakic ★ 27k

Hello there! I am relatively new to the field and not very experienced with bash commands. I know I could do this in a spreadsheet, but I want to stop using excel and focus on bash. I believe everyone who deals with bioinformatics has to do some kind of data manipulation in TSV or CSV files. Thank you!

24 months ago by blackadder ▴ 30

score 4 · Accepted Answer · 2022-04-27

24 months ago

cpad0112 21k

$ cat test.txt                       

ERR435678   contig_1    /home/results/file1.txt /home/results/file1.txt
ERR435678   contig_2    /home/results/file2.txt /home/results/file2.txt
ERR435678   contig_3    /home/results/file3.txt /home/results/file3.txt

$ awk '{sub(".*/","",$3)}1' test.txt

ERR435678 contig_1 file1.txt /home/results/file1.txt
ERR435678 contig_2 file2.txt /home/results/file2.txt
ERR435678 contig_3 file3.txt /home/results/file3.txt

24 months ago by cpad0112 21k

TIL the 1 after the } is a hack: it is an always-true pattern with no action, so awk falls back to its default action and prints the line. https://stackoverflow.com/a/24643330/1394178

24 months ago by Ram 43k
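To see the `1` idiom in action, here is a minimal sketch that runs the accepted one-liner next to a long form with an explicit `print`; both should produce the same output. The `sample` function is a hypothetical helper that prints two rows copied from the question, assuming they are tab-separated as the question says.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints two tab-separated rows copied from the question
sample() {
  printf 'ERR435678\tcontig_1\t/home/results/file1.txt\t/home/results/file1.txt\n'
  printf 'ERR435678\tcontig_2\t/home/results/file2.txt\t/home/results/file2.txt\n'
}

# Accepted answer: "1" is an always-true pattern with no action,
# so awk runs the default action and prints the (modified) record
short=$(sample | awk '{sub(".*/","",$3)}1')

# Equivalent long form with an explicit print
long=$(sample | awk '{sub(".*/","",$3); print}')

printf '%s\n' "$short"
printf '%s\n' "$long"
```

Because `$3` is modified, awk rebuilds each record using the default output separator (a single space), which is why the answer's output shows spaces instead of tabs.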

GenoMax · Accepted Answer · 2022-04-27

score 3

24 months ago

Pierre Lindenbaum 161k

I want to stop using excel and focus on bash.

You said the magic words.

sed 's%/home/results/%%' in.txt

updated 24 months ago by GenoMax 141k • written 24 months ago by Pierre Lindenbaum 161k
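Why the `sed` one-liner works on this data: `%` is simply an alternative delimiter for the `s` command, so the slashes in the path need no escaping, and without the `g` flag `s` replaces only the first match on each line, which in these rows is the path in column 3. The sketch below contrasts it with a `g` variant (added only for comparison), using a hypothetical `sample` helper that prints one tab-separated row from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one tab-separated row copied from the question
sample() {
  printf 'ERR435678\tcontig_1\t/home/results/file1.txt\t/home/results/file1.txt\n'
}

# '%' is the s-command delimiter, so '/' in the pattern needs no escaping;
# without 'g', only the first match on the line (here column 3) is removed
first_only=$(sample | sed 's%/home/results/%%')

# With 'g', every match is removed, including the one in column 4
all_matches=$(sample | sed 's%/home/results/%%g')

printf '%s\n' "$first_only"
printf '%s\n' "$all_matches"
```

Since `sed` works on the whole line and never splits it into fields, the tabs come through unchanged.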

Thank you. It worked!

24 months ago by blackadder ▴ 30

score 1

For each response here, understand how it works and what would break it. For example, this would not work if you wanted to retain /home/results/ in col3 but remove it from col4, or if the /home/results part were variable. cpad0112's solution would address those scenarios. That solution can be enhanced by using awk -F "\t" -vOFS="\t" instead of just awk, so the input and output field separators are made explicit (by default awk splits fields on runs of spaces or tabs and joins the output with single spaces).

Invest in learning sed and awk - you'll never use Excel (or any GUI tool) for text manipulation again.

24 months ago by Ram 43k
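A minimal sketch of the enhancement Ram describes, assuming the file really is tab-separated. Setting both separators to a tab keeps the tabs in the output, and because only `$3` is edited, column 4 is untouched even when the directory differs between rows. The second row, with `/data/run2/out/`, is made up to illustrate a variable prefix, and `-v OFS='\t'` is the same option as `-vOFS="\t"`, written with a space. The fixed-prefix `sed` is included only to show what breaks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints tab-separated rows. Row 1 is from the question;
# row 2 uses a made-up directory to show a variable path prefix.
sample() {
  printf 'ERR435678\tcontig_1\t/home/results/file1.txt\t/home/results/file1.txt\n'
  printf 'ERR435678\tcontig_2\t/data/run2/out/file2.txt\t/data/run2/out/file2.txt\n'
}

# Tab in, tab out; strip everything up to the last "/" in column 3 only
tsv_out=$(sample | awk -F '\t' -v OFS='\t' '{sub(".*/","",$3)}1')

# For contrast: the fixed-prefix sed only matches row 1, so row 2 is unchanged
sed_out=$(sample | sed 's%/home/results/%%')

printf '%s\n' "$tsv_out"
printf '%s\n' "$sed_out"
```

An explicit tab separator also means a field containing spaces stays one field, which the default whitespace splitting would not guarantee.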

Thank you for the thorough explanation!

24 months ago by blackadder ▴ 30

Please accept these answers (green check mark) to provide closure to this thread. You can accept more than one answer as correct.

24 months ago by GenoMax 141k