10 weeks ago
saadleeshehreen ▴ 110
I have a weird .txt file with this line.
lcl|CU459141.1_prot_CAM87240.1_2248 - TniQ PF06527.14 0.018 13.6 0.0 0.024 13.2 0.0 1.1 1 0 0 1 1 1 0 [locus_tag=ABAYE2390] [db_xref=EnsemblGenomes-Gn:ABAYE2390
I need to process the line into 2 columns like the following:
Can anyone help me to write a bash command for this?
First, this is not bioinformatics. Simple pattern matching and extraction.
Second, there isn't enough information in your message. Does each line start with lcl|? Are the words that need to be extracted always separated by _prot_? Can't expect help without making some effort on your own.
Third, what have you tried? You are asking for help in writing a command. If you haven't tried anything, the translation of your request is that you want someone to solve this for you.
Yes, the line starts with lcl| and the words are always separated by _prot_. I am very new to pattern matching and extraction. I was trying to cut the field with the cut -f1 command, but then I realised the file is not tab-delimited. I did try the following:
cut can use any delimiter. Change the delimiter to _ and you should be able to figure out the rest.
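For what it's worth, here is a minimal sketch of that hint. The expected output did not come through in the question, so this assumes the two columns wanted are the ID before _prot_ and the ID right after it. The variable names line and result are just for illustration, and the sample line is shortened.

```shell
# Sample line from the question, shortened after the first few columns.
line='lcl|CU459141.1_prot_CAM87240.1_2248 - TniQ PF06527.14 0.018 13.6'

# Splitting on "_" gives: field 1 = "lcl|CU459141.1", field 2 = "prot",
# field 3 = "CAM87240.1", field 4 = the rest of the line.
# Keep fields 1 and 3 (cut joins them with "_"), then turn that "_"
# into a tab so the result is two tab-separated columns.
result=$(printf '%s\n' "$line" | cut -d '_' -f 1,3 | tr '_' '\t')

printf '%s\n' "$result"
```

To run it over the whole file, replace the printf with the file name as an argument to cut. If the leading lcl| is not wanted, it would need to be stripped in a separate step.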