How to remove everything before a specific character in a column
7 months ago
blackadder • 0

I have a big tsv file with 4 columns in the following format:

ERR435678 contig_1 /home/results/file1.txt /home/results/file1.txt
ERR435678 contig_2 /home/results/file2.txt /home/results/file2.txt
ERR435678 contig_3 /home/results/file3.txt /home/results/file3.txt

How can I modify only the third column, removing everything up to and including the last "/", so that I get the following:

ERR435678 contig_1 file1.txt /home/results/file1.txt
ERR435678 contig_2 file2.txt /home/results/file2.txt
ERR435678 contig_3 file3.txt /home/results/file3.txt
unix bash awk sed

What have you tried? This is a really simple Google search that does not merit its own post.

Hello again! I thought the same, but I cannot find anything relevant for manipulating data in a specific column. I am looking at awk, but I haven't used it before and I can't figure out how to use it.

All of these column matching and replacing questions you have been asking are not bioinformatics.

If you know no other way of doing it, there is always Excel: import the file, copy the third column into a separate file, do a search and replace, and paste it back into the original file. I estimate it would take less than a minute.

Hello there! I am relatively new to the field and not very experienced with bash commands. I know I can do what I am asking in an Excel sheet, but I want to stop using Excel and focus on bash. I believe everyone who works in bioinformatics has to do some kind of data manipulation on TSV or CSV files. Thank you!

7 months ago
$ cat test.txt                       

ERR435678   contig_1    /home/results/file1.txt /home/results/file1.txt
ERR435678   contig_2    /home/results/file2.txt /home/results/file2.txt
ERR435678   contig_3    /home/results/file3.txt /home/results/file3.txt

$ awk '{sub(".*/","",$3)}1' test.txt

ERR435678 contig_1 file1.txt /home/results/file1.txt
ERR435678 contig_2 file2.txt /home/results/file2.txt
ERR435678 contig_3 file3.txt /home/results/file3.txt
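
Why this works: awk's sub(regex, replacement, target) replaces the leftmost-longest match of the regex inside target. Because .* is greedy, ".*/" extends to the last "/", so everything up to and including the final slash is removed no matter how deep the path is. A minimal sketch, assuming whitespace-separated input as in the question; the deeper path below is a made-up example:

```shell
#!/bin/sh
# Hypothetical row with a deeper path than in the question, to show that
# the greedy .*/ consumes everything up to the LAST slash in column 3.
row='ERR435678 contig_9 /data/run1/results/file9.txt /data/run1/results/file9.txt'

# sub() edits only $3; awk then rebuilds the record using OFS (a single space by default).
out=$(printf '%s\n' "$row" | awk '{ sub(".*/", "", $3) } 1')
printf '%s\n' "$out"
```

Column 4 is untouched because sub() is only given $3 as its target.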

TIL the 1 after the } is an always-true pattern, and a pattern with no action just prints the (modified) line: https://stackoverflow.com/a/24643330/1394178
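
Spelled out: a pattern with no action defaults to { print }, so the one-liner from the answer should behave the same as a version that calls print explicitly. A small sketch comparing the two on rows from the question, written to a temporary file:

```shell
#!/bin/sh
# Recreate part of the example input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'ERR435678 contig_1 /home/results/file1.txt /home/results/file1.txt' \
  'ERR435678 contig_2 /home/results/file2.txt /home/results/file2.txt' > "$tmp"

# Short form: the trailing 1 is an always-true pattern whose default action is print.
short=$(awk '{ sub(".*/", "", $3) } 1' "$tmp")
# Explicit form: do the substitution, then print the rebuilt line.
long=$(awk '{ sub(".*/", "", $3); print }' "$tmp")

printf '%s\n' "$long"
rm -f "$tmp"
```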

7 months ago

I want to stop using excel and focus on bash.

You said the magic words.

sed 's%/home/results/%%' in.txt 
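
This works because the substitution has no g flag, so sed replaces only the first match on each line, which here is the path in column 3; the copy in column 4 is left alone. Using % as the delimiter of the s command avoids escaping the slashes. A quick check on one row from the question (note that the /home/results/ prefix is hard-coded, so any other prefix would not be removed):

```shell
#!/bin/sh
row='ERR435678 contig_1 /home/results/file1.txt /home/results/file1.txt'

# No g flag: only the FIRST /home/results/ on the line (column 3) is removed.
first=$(printf '%s\n' "$row" | sed 's%/home/results/%%')
# With g: every occurrence is removed, so column 4 would lose its path as well.
all=$(printf '%s\n' "$row" | sed 's%/home/results/%%g')

printf '%s\n' "$first" "$all"
```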

Thank you, it worked!

For each response here, understand how it works and what would break it. For example, this sed command would not work if you wanted to keep /home/results/ in column 3 but remove it from column 4, or if the /home/results/ part were variable. cpad0112's awk solution would address those scenarios. It can be improved further by using awk -F "\t" -v OFS="\t" instead of plain awk, so that the input and output field separators are explicit and the output stays tab-separated.
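
A sketch of that enhanced command, assuming the file really is tab-separated: setting FS and OFS to a tab makes awk split only on tabs and, after sub() changes $3, join the fields back with tabs instead of single spaces. The input here is generated with printf for illustration:

```shell
#!/bin/sh
tmp=$(mktemp)
# Two tab-separated rows in the layout from the question.
printf 'ERR435678\tcontig_1\t/home/results/file1.txt\t/home/results/file1.txt\n' > "$tmp"
printf 'ERR435678\tcontig_2\t/home/results/file2.txt\t/home/results/file2.txt\n' >> "$tmp"

# -F sets the input field separator; -v OFS sets the separator used when the record is rebuilt.
out=$(awk -F '\t' -v OFS='\t' '{ sub(".*/", "", $3) } 1' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

To strip the path from column 4 instead, the same command would target $4 rather than $3.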

Invest in learning sed and awk - you'll never use Excel (or any GUI tool) for text manipulation again.

Thank you for the thorough explanation!

Please accept these answers (green check mark) to provide closure to this thread. You can accept more than one answer as correct.
