bash command to process a line
12 months ago

Hi,

I have a weird .txt file with this line.

lcl|CU459141.1_prot_CAM87240.1_2248 -          TniQ                 PF06527.14     0.018   13.6   0.0     0.024   13.2   0.0   1.1   1   0   0   1   1   1   0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390

I need to process the line into 2 columns like the following:

CU459141.1 CAM87240.1

Can anyone help me to write a bash command for this?

Thanks

linux command • 791 views

First, this is not bioinformatics. Simple pattern matching and extraction.

Second, there isn't enough information in your message. Does each line start with lcl|? Are the words that need to be extracted always separated by _prot_? You can't expect help without making some effort on your own.

Third, what have you tried? You are asking for help in writing a command. If you haven't tried anything, the translation of your request is that you want someone to solve this for you.


Yes, each line starts with lcl| and the words are always separated by _prot_. I am very new to pattern matching and extraction. I was trying to cut the field with the cut -f1 command, but then I realised the file is not tab-delimited. I did try the following:

sed 's/,/\t/g;s/\[//g;s/]//g' out.txt | cut -f1

cut can use any delimiter. Change the delimiter to _ and you should be able to figure out the rest.
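
As a rough sketch of that hint, assuming every line looks like the single example in the question: one cut on | drops the lcl prefix, a second cut on _ keeps fields 1 and 3, and tr turns the remaining _ into a space. The variable names are only for illustration.

```shell
# Sample line from the question, shortened after the first few columns.
line='lcl|CU459141.1_prot_CAM87240.1_2248 -          TniQ                 PF06527.14     0.018'

# 1) keep what follows "lcl|"
# 2) keep "_"-separated fields 1 and 3 (the two accessions)
# 3) replace the "_" that cut leaves between them with a space
cut_result=$(printf '%s\n' "$line" | cut -d'|' -f2 | cut -d'_' -f1,3 | tr '_' ' ')
printf '%s\n' "$cut_result"
```

For the sample line this prints CU459141.1 CAM87240.1. If an accession itself contains an underscore, the field numbers shift and this will pick the wrong pieces.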

12 months ago
GenoMax 141k

With only a one-line example, I'm not sure this will work for every line:

awk -F '[|_]' '{print $2,$4}' your_file
CU459141.1 CAM87240.1
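
The -F '[|_]' option makes awk split on either | or _, so for the sample line $1 is lcl, $2 is CU459141.1, $3 is prot and $4 is CAM87240.1. That breaks if an accession itself contains an underscore (RefSeq-style names such as NC_... do). A slightly more defensive sketch, assuming the first column always has the form lcl|<nucleotide>_prot_<protein>_<index>, splits on the literal _prot_ instead. The function name extract_ids is made up for this example.

```shell
# Hypothetical helper; assumes IDs look like lcl|<nucleotide>_prot_<protein>_<index>.
extract_ids() {
  awk '{
    id = $1                          # first whitespace-separated column
    sub(/^lcl\|/, "", id)            # drop the leading "lcl|"
    n = split(id, parts, "_prot_")   # nucleotide part, protein part
    if (n == 2) {
      sub(/_[0-9]+$/, "", parts[2])  # drop the trailing _<index>
      print parts[1], parts[2]
    }
  }' "$@"
}

# Reads stdin when no file is given; extract_ids your_file also works.
printf '%s\n' 'lcl|CU459141.1_prot_CAM87240.1_2248 - TniQ PF06527.14' | extract_ids
```

Lines whose first column does not contain exactly one _prot_ are skipped silently, which may or may not be what you want.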

Thanks a lot. It worked
