I have sets of sequences from different sources: around 20 FASTA files (one file per source), each containing around 1000 sequences.
I'm interested in identifying similar sequences that appear in more than one FASTA file. In other words, I might find that sequence A appears in all 20 FASTA files, sequence B in only 10, and sequence C in just 2.
Are there any tools that could do this? If not, any ideas on how to tackle this problem efficiently?
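
To make the desired output concrete, here is a minimal sketch that handles only the exact-match case. It assumes two sequences count as "the same" when they are identical after uppercasing, which is weaker than the similarity I'm actually after. The function names (`read_fasta`, `files_per_sequence`, `shared_sequences`, `load_sources`) and the `*.fasta` file layout are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def files_per_sequence(sources):
    """Map each distinct sequence to the set of source names containing it.

    `sources` maps a source name to an iterable of FASTA lines.
    Assumption: sequences match only if identical after uppercasing.
    """
    seen = defaultdict(set)
    for name, lines in sources.items():
        for _header, seq in read_fasta(lines):
            seen[seq.upper()].add(name)
    return seen


def shared_sequences(sources, min_files=2):
    """Return {sequence: sorted source names} for sequences found in
    at least `min_files` different sources."""
    return {
        seq: sorted(names)
        for seq, names in files_per_sequence(sources).items()
        if len(names) >= min_files
    }


def load_sources(directory="."):
    """Hypothetical loader: one *.fasta file per source in `directory`."""
    return {
        path.name: path.read_text().splitlines()
        for path in Path(directory).glob("*.fasta")
    }
```

Calling `shared_sequences(load_sources("some_dir"))` would list every sequence present in two or more files together with the files it occurs in. What I'm missing is the same kind of per-file count when sequences are merely similar rather than identical.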