I have around 1,000 transcript sequences without any protein domain annotations (no hits in Pfam, SMART, PANTHER...) and I want to see whether any amino acid motifs are enriched among them.
I was thinking maybe:
1) perform an all vs all tblastx
2) gather all non-overlapping HSPs excluding alignments to self
3) Extract HSP sequence from alignment.
4) Blast each extracted HSP sequence against the transcripts
5) Each transcript that hits the HSP sequence will be counted as containing the protein motif.
This seems a bit over-complicated to me. Is there any software or package that already does this?
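
To make step 2 concrete, here is a rough Python sketch of what I have in mind. It assumes the tblastx results were written in the default 12-column tabular format (-outfmt 6), and the function names parse_hsps and non_overlapping_hsps are just placeholders I made up. "Non-overlapping" is taken here to mean a greedy pick of the best-scoring HSPs per query by bit score, which is only one possible interpretation.

```python
from collections import defaultdict

# Column order of BLAST tabular output (-outfmt 6) with the default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hsps(lines):
    """Yield one dict per HSP from tab-separated BLAST lines, skipping self hits."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        if row["qseqid"] == row["sseqid"]:
            continue  # exclude alignments to self
        # Reverse-frame hits can have qstart > qend, so normalise the range.
        a, b = int(row["qstart"]), int(row["qend"])
        yield {"query": row["qseqid"], "subject": row["sseqid"],
               "start": min(a, b), "end": max(a, b),
               "bitscore": float(row["bitscore"])}


def non_overlapping_hsps(hsps):
    """Per query, greedily keep the highest-scoring HSPs whose query ranges do not overlap."""
    by_query = defaultdict(list)
    for h in hsps:
        by_query[h["query"]].append(h)
    kept = {}
    for query, group in by_query.items():
        chosen = []
        for h in sorted(group, key=lambda h: -h["bitscore"]):
            if all(h["end"] < c["start"] or h["start"] > c["end"] for c in chosen):
                chosen.append(h)
        kept[query] = sorted(chosen, key=lambda h: h["start"])
    return kept
```

The kept query ranges could then be used to cut the HSP sequences out of the transcripts for step 3.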