Question: Protein sequence representation in neural network
18 months ago, JDK_92 wrote:

Hi,

I am currently working on a project to build a neural network that takes as input an amino acid sequence (a protein fragment) with a fixed length of 34. The goal is to predict whether or not the input sequence belongs to a certain class of repeat (TPR). Long story short:

My problem is how to encode the sequence so that it is a proper input for the network. I thought about encoding each amino acid as a vector of 20 bits (one per amino acid), with a '1' at the position representing the current amino acid and '0' in the other 19 positions. Concatenating these vectors gives an input vector of length 20 * 34 = 680, which seems quite big.
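For readers following along, the encoding described above could be sketched roughly like this in plain Python. The residue ordering, the function name `one_hot_encode`, and the example fragment are all assumptions made for illustration; they are not taken from the original post, and any fixed ordering of the 20 amino acids would work equally well.

```python
# Canonical 20 amino acids in a fixed order (alphabetical one-letter codes).
# The particular order is an arbitrary choice; it only has to stay consistent.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_POSITION = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, length=34):
    """Return a flat list of length * 20 zeros and ones for one fragment."""
    if len(sequence) != length:
        raise ValueError(f"expected {length} residues, got {len(sequence)}")
    vector = []
    for residue in sequence.upper():
        bits = [0] * len(AMINO_ACIDS)
        # Raises KeyError for non-standard symbols such as 'X' or '-'.
        bits[AA_POSITION[residue]] = 1
        vector.extend(bits)
    return vector


# Hypothetical 34-residue fragment, made up purely for illustration.
example = "AEALNNLGNVYREQGDYEEAIEYYQKALELDPNN"
encoded = one_hot_encode(example)
```

The result has 680 entries, of which exactly 34 are set to 1 (one per residue), so the input is long but very sparse.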

So does anybody here have experience with how to represent an amino acid sequence so that it can be provided as input to a neural network?

Thank you!

modified 16 months ago by Biostar • written 18 months ago by JDK_92

Your one-hot encoding is commonly used, but you could also try to use physical/chemical properties (look up AAINDEX) to represent the amino acids.
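As a rough illustration of the property-based alternative, the sketch below maps each residue to a single number using the Kyte-Doolittle hydropathy scale, which is one of the indices catalogued in AAindex. The choice of this particular scale, the function name `property_encode`, and the min-max rescaling step are assumptions made for the example; the reply above does not recommend a specific index.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982), one per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_encode(sequence, scale=KYTE_DOOLITTLE, rescale=True):
    """Return one property value per residue, optionally rescaled to [0, 1]."""
    values = [scale[residue] for residue in sequence.upper()]
    if not rescale:
        return values
    # Min-max rescaling over the scale itself, so the mapping is the same
    # for every sequence (an assumed preprocessing choice, not a requirement).
    low, high = min(scale.values()), max(scale.values())
    return [(v - low) / (high - low) for v in values]


raw = property_encode("AIR", rescale=False)
scaled = property_encode("AIR")
```

With one property per residue, a 34-residue fragment becomes a vector of length 34 instead of 680. Several properties can be stacked per position in the same way, at the cost of one extra input per property and residue.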

written 18 months ago by cschu1811.9k

Thank you. I'll take some properties from AAINDEX along with the one-hot encoding and see what the results will be.

written 18 months ago by JDK_92