Question: Protein Secondary Structure
I wrote a forward-backward algorithm for determining protein secondary structure (three hidden states: alpha helix, beta sheet, coil; and 20 observed states: the 20 amino acids).

I would like to test my algorithm on a set of proteins for which the secondary structure is known. I have not done this before, but I assume it involves the amino acid sequence, with each amino acid aligned to one of the three hidden states.
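
To make the layout described above concrete, here is a minimal sketch of how one test case might look and how a prediction could be scored against it. The sequence and labels are made-up toy data (not a real protein), the one-letter codes H/E/C for helix/sheet/coil are an assumed convention, and `q3_accuracy` is a hypothetical helper name.

```python
def q3_accuracy(true_labels: str, predicted_labels: str) -> float:
    """Return the fraction of residues whose predicted state matches the known state."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must be non-empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Toy data: one label per residue, H = helix, E = sheet, C = coil (assumed codes).
sequence = "MKTAYIAKQR"
known = "CHHHHHEEEC"
predicted = "CHHHHCEEEC"  # e.g. the most probable state per position from forward-backward

# Each residue must line up with exactly one label.
assert len(sequence) == len(known) == len(predicted)

accuracy = q3_accuracy(known, predicted)
print(f"per-residue accuracy: {accuracy:.2f}")
```

Per-residue accuracy over three states is the simplest score to start with; averaging it over many proteins gives a single number to compare runs.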

What is the ideal method for me to obtain a set of such sequences? Thank you!

Not sure it's ideal, but I'd use an ASTRAL SCOP subset of all PDB sequences, get their secondary structure via DSSP (or maybe they're available through ASTRAL SCOP somewhere), and then hammer away with _k_-fold cross-validation.
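
A rough sketch of the last two steps, under some assumptions: DSSP assigns more than three classes, so the labels need collapsing to helix/sheet/coil first. The mapping below (H, G, I to helix; E, B to sheet; everything else to coil) is one commonly used reduction, not the only one. `reduce_dssp` and `k_fold_splits` are hypothetical helper names, and parsing the actual DSSP output is left out.

```python
import random

# Assumed reduction of DSSP codes to three states; other mappings exist.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Collapse a per-residue DSSP string to H/E/C; unmapped codes become coil."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp)


def k_fold_splits(n_items: int, k: int, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Usage: estimate HMM parameters on `train`, score predictions on `test`.
for train, test in k_fold_splits(n_items=10, k=5):
    print(len(train), "training proteins,", len(test), "held out")
```

Splitting by protein rather than by residue keeps every residue of a held-out protein unseen during training.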

