I have header lines like these:
>CP001830.1_cds_AEH77465.1_1 [locus_tag=SM11_chr0180] [protein_id=AEH77465.1] [location=195246..195674]
>KI271598.1_cds_ERL64443.1_1 [locus_tag=L248_0985] [protein_id=ERL64443.1] [location=complement(53545..53919)]
>CR931997.1_cds_CAI37700.1_1 [locus_tag=jk1527] [db_xref=EnsemblGenomes-Gn:jk1527,EnsemblGenomes-Tr:CAI37700,GOA:Q4JU07,InterPro:IPR001185,UniProtKB/TrEMBL:Q4JU07] [protein_id=CAI37700.1] [location=1801511..1801945]
>HE858529.1_cds_CCI62285.1_1 [locus_tag=SDSE_0788] [db_xref=EnsemblGenomes-Gn:SDSE_0788,EnsemblGenomes-Tr:CCI62285,GOA:K4Q7R5,InterPro:IPR001185,InterPro:IPR019823,UniProtKB/TrEMBL:K4Q7R5] [protein_id=CCI62285.1] [location=complement(732360..732734)]
Some lines contain a "[db_xref=Ensemb...]" field, which I want to remove.
I cannot simply delete everything after this field (e.g. with "sed"), because I need the rest of the line. I tried both awk and sed. I also cannot "cut" or print (with awk) by column, because the field is not present in every line.
So a script using a regular expression would probably be better, I guess.
However, I cannot figure it out. Could you please help?
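
For illustration, here is a minimal sketch of the kind of regular-expression script I have in mind, written in Python. The names strip_db_xref and clean_fasta are hypothetical, and the pattern assumes that a db_xref value never contains a "]" character, which holds for the headers shown above but may not hold in general.

```python
import re

# Matches one "[db_xref=...]" field together with the space in front of it.
# Assumption: the value itself never contains a "]" character.
DB_XREF = re.compile(r" ?\[db_xref=[^\]]*\]")


def strip_db_xref(line: str) -> str:
    """Return the line with any [db_xref=...] field removed."""
    return DB_XREF.sub("", line)


def clean_fasta(in_path: str, out_path: str) -> None:
    """Copy a FASTA file, stripping db_xref fields from header lines only."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Only header lines start with ">"; sequence lines pass through.
            dst.write(strip_db_xref(line) if line.startswith(">") else line)
```

Because the pattern only matches the bracketed db_xref field itself, header lines without that field are left unchanged, and the fields after it (protein_id, location) are kept.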