Hi, I am trying to extract organisms' names from the headers in a multi-fasta file named input.fa shown below:
>KZR5864_Org_name_nam_strain.11
GHTKKLACWQRTTAAFFGYYWOPPEEDSSSSLKKDDIIPFTQWENMAATGGFDMLLAAPP
>OIA4716.3_Org_other_name_bla_bla
AHHTTIPLNCCWWETRQKLLSSNNNMTIPAHGFSSLLKANCDSM
>SMAR_08120_Other_org_name_bla
AGTHHKKLAMNCWTQEREYPPILLSSDFMNCCVTTQQLAK
What I want is to obtain the organism name from each header. I have tried the following sed command, but I am unable to check for alphanumerics, so I also get the digits after the first underscore, as in the third header.
sed -eT -e 's|_|&\n|;D' input.fa > out.txt
Org_name_nam_strain.11
Org_other_name_bla_bla
Other_org_name_bla
Please tell me how to obtain org names only. Thanks!
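One possible approach, offered as a sketch rather than a definitive answer: treat the ID as the first underscore-separated field, optionally followed by fields made up only of digits (as in SMAR_08120_...), and strip that prefix from each header line. This rule is an assumption inferred from the three example headers, and org_names is a hypothetical helper name, not an existing tool. The -E option (extended regular expressions) is available in GNU sed and BSD sed.

```shell
# Hypothetical helper: print the organism name from each FASTA header in file "$1".
# Assumption: the ID is the first underscore-separated field, optionally followed
# by one or more all-digit fields (e.g. SMAR_08120_).
org_names() {
    # -n suppresses default printing; the p flag prints only lines where the
    # substitution succeeded, so sequence lines (no leading '>') are skipped.
    sed -n -E 's/^>[^_]+_([0-9]+_)*//p' "$1"
}
```

It could then be run as `org_names input.fa > out.txt`. Headers without any underscore would be skipped, and an organism name that itself starts with an all-digit field would lose that field, so the pattern may need adjusting for other ID formats.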