Hi, does anyone know of a database that relates amino acid sequences to fold motifs?
By this I mean, for example, that the last fifteen amino acids of a protein may have been observed to fold in a fairly conserved way. I'm not looking for the fold of the entire protein, but for subsequences that might match a known, common fold motif. I have already tried several databases. The closest one is the Protein Folding database, but it doesn't work well.
Try the "Conserved Domains" database (CDD) at NCBI. You may not have noticed, but when you run a protein search with NCBI BLAST, the results page automatically shows a representation of the conserved domains found in your query. Those predictions come from this database.
Prosite, from Expasy, is probably the most popular choice. Prosite represents protein domains either as patterns or as profiles (weight matrices). For example, a pattern is essentially a regular expression such as A-T-H-[DE], meaning A, then T, then H, then either D or E.
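Because a Prosite pattern is essentially a regular expression, it maps almost directly onto one. The sketch below translates the basic pattern syntax (`-` separators, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` repeats, and `<`/`>` for the N- and C-terminus) into a Python regex, then scans a sequence for it. It is only an illustration: the helper names and the example sequence are hypothetical, and less common syntax (such as `>` inside brackets) is not handled. For real searches, Expasy's ScanProsite tool is the place to go.

```python
import re

# One Prosite pattern element: a residue class, optionally followed by a
# repeat count "(n)" or a repeat range "(n,m)".
ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic Prosite pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # Prosite patterns end with a period
    head, tail = "", ""
    if pattern.startswith("<"):  # pattern anchored at the N-terminus
        head, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # pattern anchored at the C-terminus
        tail, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unsupported Prosite element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            part = "."  # any amino acid
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"  # any amino acid except these
        else:
            part = residue  # a single letter or [..] class is already valid regex
        if low is not None:
            part += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(part)
    return head + "".join(parts) + tail


def find_motif(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each non-overlapping hit."""
    regex = prosite_to_regex(pattern)
    return [(m.start() + 1, m.group()) for m in re.finditer(regex, sequence)]


print(prosite_to_regex("A-T-H-[DE]"))              # ATH[DE]
print(prosite_to_regex("C-x(2,4)-C-x(3)-{P}-H."))  # C.{2,4}C.{3}[^P]H
print(find_motif("A-T-H-[DE]", "MKATHDLLATHEQ"))   # [(3, 'ATHD'), (9, 'ATHE')]
```

Note that `re.finditer` reports non-overlapping matches only, so a dedicated scanning tool may report hits that this sketch misses.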
Pfam uses a different method to identify domains, based on hidden Markov models (HMMs); see also HMMER, the software Pfam is built with, for a more detailed description of the method. It should be more accurate than Prosite, but it may also identify fewer matches.
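To give a feel for how an HMM scores a sequence, here is a toy forward algorithm, one standard way of computing how likely a sequence is under an HMM by summing over all possible state paths. This is a deliberately simplified, hypothetical two-state model (a "motif" state enriched in H, D and E, and a uniform "background" state), not Pfam's architecture: real Pfam entries are profile HMMs with match, insert and delete states for every column of a family alignment, and are searched with HMMER programs such as `hmmscan`. The score is a log-odds value in bits against a background-only null model, so positive means "more motif-like than random".

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_prob(seq, states, start, trans, emit):
    """log P(seq | HMM), summed over all state paths (forward algorithm)."""
    # alpha[s] = log P(residues seen so far, current state = s)
    alpha = {s: log(start[s]) + log(emit[s][seq[0]]) for s in states}
    for residue in seq[1:]:
        alpha = {
            s: log_sum_exp([alpha[p] + log(trans[p][s]) for p in states])
            + log(emit[s][residue])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))


# Hypothetical toy model: "motif" favours H, D and E; "background" is uniform.
STATES = ["bg", "motif"]
START = {"bg": 0.9, "motif": 0.1}
TRANS = {"bg": {"bg": 0.9, "motif": 0.1}, "motif": {"bg": 0.2, "motif": 0.8}}
BACKGROUND = {a: 1 / 20 for a in AMINO_ACIDS}
MOTIF = {a: (0.25 if a in "HDE" else 0.25 / 17) for a in AMINO_ACIDS}
EMIT = {"bg": BACKGROUND, "motif": MOTIF}


def log_odds(seq: str) -> float:
    """Bits: likelihood under the HMM versus a background-only null model."""
    model = forward_log_prob(seq, STATES, START, TRANS, EMIT)
    null = sum(log(BACKGROUND[a]) for a in seq)
    return (model - null) / math.log(2)


print(round(log_odds("HDEHDEHDE"), 1))  # positive: looks like the motif
print(round(log_odds("GGGGGGGGG"), 1))  # negative: looks like background
```

Unlike a Prosite pattern, which either matches or not, the HMM gives every sequence a graded score, which is what lets this kind of method tolerate substitutions that a strict pattern would reject.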