Question

bash command to process a line

0

12 months ago

saadleeshehreen ▴ 140

Hi,

I have an odd .txt file containing this line:

lcl|CU459141.1_prot_CAM87240.1_2248 -          TniQ                 PF06527.14     0.018   13.6   0.0     0.024   13.2   0.0   1.1   1   0   0   1   1   1   0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390

I need to process the line into 2 columns like the following:

CU459141.1 CAM87240.1

Can anyone help me write a bash command for this?

Thanks

linux command • 791 views

ADD COMMENT • link updated 12 months ago by Joe 21k • written 12 months ago by saadleeshehreen ▴ 140

0

First, this is not bioinformatics; it is simple pattern matching and extraction.

Second, there isn't enough information in your message. Does each line start with lcl|? Are the words that need to be extracted always separated by _prot_? You can't expect help without making some effort on your own.

Third, what have you tried? You are asking for help in writing a command. If you haven't tried anything, the translation of your request is that you want someone to solve this for you.

ADD REPLY • link 12 months ago by Mensur Dlakic ★ 27k

0

Yes, each line starts with lcl| and the words are always separated by _prot_. I am very new to pattern matching and extraction. I was trying to cut the field with the cut -f1 command, but then I realised the file is not tab-delimited. I did try the following:

sed 's/,/\t/g;s/\[//g;s/]//g' out.txt | cut -f1
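
For anyone following along, here is a small sketch of why that attempt does not isolate the first column. The sample line has no commas, so the first substitution never fires and no tabs are created; cut -f1 then passes through any line that lacks its delimiter unchanged. The variable names line_in and attempt are made up for this illustration, and the sample is the single line from the question.

```shell
# Hypothetical variable holding the one sample line from the question.
line_in='lcl|CU459141.1_prot_CAM87240.1_2248 -          TniQ                 PF06527.14     0.018   13.6   0.0     0.024   13.2   0.0   1.1   1   0   0   1   1   1   0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390'

# Same sed | cut pipeline as in the attempt above, fed from the variable.
# No commas in the input, so no tabs are inserted; cut -f1 finds no tab
# delimiter and therefore prints the whole (bracket-stripped) line.
attempt=$(printf '%s\n' "$line_in" | sed 's/,/\t/g;s/\[//g;s/]//g' | cut -f1)
printf '%s\n' "$attempt"
```

The printed line still begins with lcl|CU459141.1_prot_CAM87240.1_2248 and carries all the remaining columns, which is why a different delimiter is needed.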

ADD REPLY • link 12 months ago by saadleeshehreen ▴ 140

0

cut can use any delimiter. Change the delimiter to _ and you should be able to figure out the rest.
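
One way the cut suggestion might be spelled out is sketched below. It assumes every line has exactly the shape of the sample: a single | after lcl, and no underscores inside either accession (an ID such as NC_000913.3 would break it). The variable names sample and cut_result are made up for this illustration.

```shell
# Hypothetical variable holding the one sample line from the question.
sample='lcl|CU459141.1_prot_CAM87240.1_2248 -          TniQ                 PF06527.14     0.018   13.6   0.0     0.024   13.2   0.0   1.1   1   0   0   1   1   1   0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390'

# 1) cut -d'|' -f2   : drop the leading "lcl|"
# 2) cut -d'_' -f1,3 : keep "_"-separated fields 1 and 3 (field 2 is "prot")
# 3) tr '_' ' '      : cut re-joins the kept fields with "_"; make it a space
cut_result=$(printf '%s\n' "$sample" | cut -d'|' -f2 | cut -d'_' -f1,3 | tr '_' ' ')
printf '%s\n' "$cut_result"
```

On a real file, the same pipeline would read from the file instead of the variable, for example cut -d'|' -f2 out.txt | cut -d'_' -f1,3 | tr '_' ' '.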

ADD REPLY • link 12 months ago by Joe 21k

score 1 · Answer 1 · 2023-03-28

1

12 months ago

GenoMax 141k

With just a one-line example, I'm not sure this will work on every line:

awk -F '[|_]' '{print $2,$4}' your_file
CU459141.1 CAM87240.1
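
The awk command splits on every | and _, so it relies on the accessions sitting in fields 2 and 4. As a stricter alternative (a sketch, not part of the original answer), sed can anchor on the literal lcl| prefix and the _prot_ separator that the poster confirmed, and print only lines that match. It assumes neither accession contains an underscore; the variable names line3 and sed_result are made up for this illustration.

```shell
# Hypothetical variable holding the one sample line from the question.
line3='lcl|CU459141.1_prot_CAM87240.1_2248 -          TniQ                 PF06527.14     0.018   13.6   0.0     0.024   13.2   0.0   1.1   1   0   0   1   1   1   0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390'

# -E enables extended regular expressions; -n plus the trailing p prints
# only lines where the substitution succeeded, so malformed lines are
# skipped instead of being passed through unchanged.
#   ^lcl[|]    literal "lcl|" at the start of the line
#   ([^_]+)    first accession, up to "_prot_"
#   ([^_]+)_   second accession, up to the next "_"
sed_result=$(printf '%s\n' "$line3" | sed -n -E 's/^lcl[|]([^_]+)_prot_([^_]+)_.*/\1 \2/p')
printf '%s\n' "$sed_result"
```

Because non-matching lines are dropped silently, comparing the number of output lines with the number of input lines is a quick way to spot records that do not follow the expected pattern.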