I wrote a forward-backward algorithm for predicting protein secondary structure, with three hidden states (alpha helix, beta sheet, coil) and 20 observed states (the 20 amino acids).
I would like to test my algorithm on a set of proteins whose secondary structure is known. I have not done this before, but I assume the test data consists of amino acid sequences in which each residue is aligned with one of the three hidden states.
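To make the format I have in mind concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the one-letter state labels (H, E, C), the helper names `pair_residues` and `q3_accuracy`, and the sequence and label strings, which are made up and not taken from a real protein.

```python
# Toy illustration only: the sequence and labels below are invented,
# not taken from any real protein or database.
# Assumed state labels: H = alpha helix, E = beta sheet, C = coil.
STATES = "HEC"


def pair_residues(sequence, structure):
    """Pair each amino acid with its known hidden state."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    unexpected = set(structure) - set(STATES)
    if unexpected:
        raise ValueError(f"unexpected state labels: {sorted(unexpected)}")
    return list(zip(sequence, structure))


def q3_accuracy(predicted, known):
    """Fraction of residues whose predicted state matches the known state."""
    if len(predicted) != len(known):
        raise ValueError("predicted and known must have the same length")
    if not known:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(known)


sequence = "MKTAYIAKQR"   # hypothetical amino acid sequence
known = "CCHHHHHEEC"      # hypothetical known state per residue
predicted = "CCHHHHCEEC"  # hypothetical output of my algorithm

print(pair_residues(sequence, known)[:3])
print(q3_accuracy(predicted, known))
```

With data in this shape, I could score my algorithm by the per-residue, three-state accuracy over every protein in the set.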
What is the best way to obtain a set of such annotated sequences? Thank you!