8.3 years ago
astrozheng
I have a data set of over 3,000 FASTA sequences for one type of protein domain. Some sequences contain 2 or 3 tandem copies of the same domain type, and these copies are not identical in sequence. I am wondering whether there is a way to split every sequence that contains more than one domain into its individual domains, starting from the long FASTA sequence?
Also, how could I obtain all the distinct single-domain sequences from the long sequences while at the same time avoiding redundant single-domain sequences?
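
To make the goal concrete, here is a minimal Python sketch of the kind of result I am after. It assumes the domain boundaries are already known (for example from a domain-search tool run beforehand) and are supplied as 1-based inclusive coordinates. The names `parse_fasta`, `split_domains`, and `coords`, as well as the toy sequences, are hypothetical and only illustrate the idea. A sequence without coordinates is treated as a single domain, and only exact duplicates are dropped.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def split_domains(records, coords):
    """Cut each sequence at the given domain coordinates and drop exact duplicates.

    coords maps a sequence id to a list of (start, end) pairs, 1-based inclusive.
    This mapping is an assumption: it must come from a separate domain-detection step.
    """
    seen = set()
    domains = []
    for name, seq in records.items():
        # No coordinates given: treat the whole sequence as one domain.
        for i, (start, end) in enumerate(coords.get(name, [(1, len(seq))]), 1):
            dom = seq[start - 1:end]
            if dom in seen:
                continue  # skip a domain sequence that was already emitted
            seen.add(dom)
            domains.append((f"{name}_dom{i}", dom))
    return domains


# Toy input: seqA holds two tandem domains, seqB is a single domain
# identical to the first domain of seqA.
fasta = ">seqA\nMKTAYIAKQRGGGGMKTAYLAKQR\n>seqB\nMKTAYIAKQR\n"
coords = {"seqA": [(1, 10), (15, 24)]}  # hypothetical boundaries

domains = split_domains(parse_fasta(fasta), coords)
for dom_id, dom_seq in domains:
    print(f">{dom_id}\n{dom_seq}")
```

With this toy input, the two domains of seqA are kept and seqB is dropped as redundant. For near-identical rather than exactly identical domains, a clustering step at a chosen identity threshold would be needed instead of the exact-match check.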