I am currently working on a project to build a neural network which takes as input an amino acid sequence (protein fragment) with the fixed length of 34. I am trying to give a prediction whether or not the input sequence belongs to a certain class of repeat (TPR). Long story short:
My problem is to encode the sequence in order to have a proper input for the network. I thought about encoding each single amino acid with a vector of 20 bits (for 20 amino acids) having a '1' at the position in the vector representing the current amino acid and '0' for the other 19 bits. Concatenating these vectors leads me to a vector of length 20 * 34 which is quite big.
So does anybody here has any experience on how to represent an amino acid sequence to be able to provide it as input for a neural network.