Extracting the text inside the last [ ] on each line
5.4 years ago

I have a file with description of transcript with organism:

XP 020275132.1 nodal modulator 3 [Asparagus officinalis]
XP 008781144.1 nodal modulator 1 [Phoenix dactylifera]

I want to extract the organism name, which appears at the end of each line inside [ ] brackets.

sed grep linux terminal • 1.0k views

output:

$ grep -Po '(?<= \[).*(?=])' test.txt
Vitis vinifera
Musa acuminata subsp. malaccensis

$ sed 's/.*\s\[\(.*\)\]$/\1/g' test.txt
Vitis vinifera
Musa acuminata subsp. malaccensis

input:

$ cat test.txt
XP 003635378.1 PREDICTED: stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic [Vitis vinifera]
XP 009411852.1 PREDICTED: stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic-like isoform X2 [Musa acuminata subsp. malaccensis]
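Both commands above lean on GNU extensions (`grep -P` for Perl-style lookarounds, `\s` in sed). A minimal sketch of a variant that uses only a basic regular expression, assuming every line ends with the bracketed organism name and that the name itself contains no brackets; the function name `last_bracket` is made up for illustration:

```shell
# Hypothetical helper: print the contents of the last [...] on each line.
# Assumption: the line ends with "]" and the organism name has no brackets in it.
last_bracket() {
    # .*        greedy, consumes everything up to the final "["
    # [^][]*    captured run of characters that are neither "]" nor "["
    # -n ... p  print only the lines where the substitution matched
    sed -n 's/.*\[\([^][]*\)]$/\1/p' "$@"
}

# Reads stdin when no file is given; pass a file name such as test.txt otherwise.
printf '%s\n' 'XP 003635378.1 PREDICTED: stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic [Vitis vinifera]' | last_bracket
# prints: Vitis vinifera
```

Because the captured group excludes brackets, the earlier `[acyl-carrier-protein]` can never be part of the match.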
5.4 years ago
ATpoint 85k

rev your.file | awk -F "[" '{gsub("]", "");print $1 | "rev"}'

I highly recommend spending time to learn these file manipulation strategies yourself, as you will encounter this frequently in your career.
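To see why the one-liner above copes with the extra brackets in the protein name, it may help to run its stages separately. A sketch on one of the sample lines; the variable names `line`, `reversed`, `field`, and `organism` are only for illustration, and `rev` is assumed to be installed (it ships with util-linux):

```shell
line='XP 003635378.1 PREDICTED: stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic [Vitis vinifera]'

# Stage 1: reverse the line, so the organism now sits at the start:
# "]arefiniv sitiV[ citsalporolhc ..."
reversed=$(printf '%s\n' "$line" | rev)

# Stage 2: strip every "]", split on "[", and keep the first field, which is
# everything before the first "[" of the reversed line (the last "[" of the original).
field=$(printf '%s\n' "$reversed" | awk -F "[" '{gsub("]", ""); print $1}')

# Stage 3: reverse again to restore the reading order.
organism=$(printf '%s\n' "$field" | rev)

printf '%s\n' "$organism"
# prints: Vitis vinifera
```

The one-liner does stage 3 inside awk by piping `print` into `"rev"`, which saves a separate pipeline step.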


Thank you, ATpoint,

but my file has entries like these:

XP 003635378.1 PREDICTED: stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic [Vitis vinifera]
XP 009411852.1 PREDICTED: stearoyl-[acyl-carrier-protein] 9-desaturase, chloroplastic-like isoform X2 [Musa acuminata subsp. malaccensis]

so the protein name is extracted along with the organism.


I have edited my answer.


Thank you so much, it worked.
