Hi all, I found a lot of programs for predicting unstructured (disordered) regions in proteins. I tried a few (DisEMBL, Disopred, MFDp), but I can't decide which one to use. Does anyone have suggestions regarding these prediction programs?
There are different programs available, and they often reach different results. The reason for this discrepancy is not only the mathematical approach but, above all, the definition of disorder. Some concepts treat short loops of about 7 residues with high side-chain mobility as disordered (e.g. because they are not properly resolved in an X-ray structure), while others consider these 'loops within a globular domain' and focus on long stretches without any secondary structure. All of these definitions have their merits, and which one is best depends on how you plan to use the data.
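To make the point about definitions concrete, here is a minimal sketch, assuming you already have one per-residue disorder score between 0 and 1 (higher means more disordered). The function name `disordered_regions`, the 0.5 cutoff and the toy score profile are all hypothetical and not taken from any particular predictor. The only thing that changes between the two calls is the minimum region length, yet the short 7-residue loop appears or disappears.

```python
from typing import List, Optional, Tuple


def disordered_regions(scores: List[float], threshold: float = 0.5,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs where score >= threshold.

    Runs shorter than min_length residues are discarded, which mimics a
    'long stretches only' definition of disorder. The threshold of 0.5 is
    an assumption, not a value prescribed by any specific tool.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run covered indices start..i-1, i.e. i - start residues.
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores)))
    return regions


# Hypothetical profile: a 7-residue loop (11-17) and a 40-residue tail (38-77).
profile = [0.2] * 10 + [0.8] * 7 + [0.2] * 20 + [0.9] * 40 + [0.1] * 5
print(disordered_regions(profile, min_length=1))   # [(11, 17), (38, 77)]
print(disordered_regions(profile, min_length=30))  # [(38, 77)]
```

With a permissive minimum length the loop counts as disordered; with a 30-residue minimum only the long tail survives, even though the underlying scores are identical.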
My own interest in protein disorder is the identification of short linear motifs, which are often found in those regions. Among the ~10 different programs I evaluated, I got the best results from IUPred and GlobPlot (using the B-factor scale, not the default). When I did this analysis, I was working for a company, so a few programs had to be excluded on licensing grounds. Again, which program works best for you depends on the underlying concept of disorder.
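Evaluating a predictor usually comes down to comparing its per-residue calls with a reference annotation for proteins whose disorder status you trust. The sketch below assumes both are already available as equal-length lists of booleans (True = disordered); the function name `per_residue_scores` and the toy data are hypothetical. It reports sensitivity, specificity and the Matthews correlation coefficient (MCC), which is less misleading than plain accuracy when disordered residues are a minority.

```python
import math
from typing import Dict, Sequence


def per_residue_scores(predicted: Sequence[bool],
                       reference: Sequence[bool]) -> Dict[str, float]:
    """Compare per-residue disorder calls with a trusted reference annotation."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # disordered, called disordered
    tn = sum(1 for p, r in pairs if not p and not r)  # ordered, called ordered
    fp = sum(1 for p, r in pairs if p and not r)      # ordered, called disordered
    fn = sum(1 for p, r in pairs if not p and r)      # disordered, missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Hypothetical 10-residue example: residues 6-10 are truly disordered.
reference = [False] * 5 + [True] * 5
predicted = [False] * 4 + [True] * 4 + [False] * 2
result = per_residue_scores(predicted, reference)
print({k: round(v, 3) for k, v in result.items()})
# {'sensitivity': 0.6, 'specificity': 0.8, 'mcc': 0.408}
```

Running the same comparison for each program (and each scale, where a program offers several) on the same reference set gives numbers that can be compared directly.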
If you are really serious about this issue, I can give two recommendations:
1. Assemble a set of positive test cases (proteins that are known to have disordered regions by your working definition) and negative test cases (proteins that don't), and test a number of different programs (and different scales for programs that give you a choice). By comparing the output, you get a good impression of what the programs really score.
2. Run multiple predictors over the sequence and then try to calculate a consensus (e.g. by accepting only regions that are predicted as disordered by at least 3 out of 5 predictors). When doing that, make sure that you exclude programs/scales that use the 'wrong' definition of