I have a multi-FASTA file containing several orthologs from one bacterial phylum. They share a large central domain but differ in their N- and C-terminal extensions, which contain no recognizable motif or domain. I'd like to extract these extensions (tails) in an automated way, rather than inspecting and trimming each sequence by hand, so that I can align them. I assume I need the coordinates of the central core domain in each sequence, but I'm not sure how to retrieve them for every sequence in a multi-FASTA file.
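To make the goal concrete, here is a rough Python sketch of the trimming step I have in mind. It assumes the domain coordinates are already known as 1-based, inclusive (start, end) pairs per sequence ID, for example taken from the per-domain table of a domain search; getting those coordinates is the part I'm missing. The function names, the `coords` dictionary and the toy sequences are all made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {sequence_id: sequence}.

    The ID is the first whitespace-delimited word after '>'.
    """
    records = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def split_tails(seq, start, end):
    """Return (n_tail, c_tail) for a domain at 1-based inclusive start..end."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"domain {start}-{end} lies outside sequence of length {len(seq)}")
    return seq[:start - 1], seq[end:]


def extract_tails(records, coords):
    """Split every sequence that has coordinates; skip empty tails."""
    n_tails, c_tails = {}, {}
    for seq_id, seq in records.items():
        if seq_id not in coords:
            continue  # no domain hit known for this sequence
        n_tail, c_tail = split_tails(seq, *coords[seq_id])
        if n_tail:
            n_tails[seq_id] = n_tail
        if c_tail:
            c_tails[seq_id] = c_tail
    return n_tails, c_tails


def to_fasta(seqs, suffix):
    """Format a dict of sequences as FASTA text, tagging each ID with a suffix."""
    return "".join(f">{seq_id}_{suffix}\n{seq}\n" for seq_id, seq in seqs.items())


# Toy input: hypothetical sequences and hypothetical domain coordinates.
example_fasta = ">orthologA some description\nMKTAAAGG\nGGGGCCCDE\n>orthologB\nGGGGGGWWW\n"
example_coords = {"orthologA": (7, 12), "orthologB": (1, 6)}

example_records = parse_fasta(example_fasta)
example_n, example_c = extract_tails(example_records, example_coords)
n_term_fasta = to_fasta(example_n, "Nterm")
c_term_fasta = to_fasta(example_c, "Cterm")
```

The two resulting strings could then be written to separate files and aligned independently. A sequence with several domain hits would need a rule for choosing one pair of coordinates, which this sketch does not handle.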
Thanks in advance :)