Eman.Ismaiel, 6.2 years ago
I am trying to convert a protein sequence into an equivalent fingerprint (binary vector), for example MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQP to 00011100000. How can I do that? I am using the PFAM DB.
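For illustration only: one generic way to turn any protein sequence into a fixed-length binary vector is a k-mer presence fingerprint, with one bit per possible k-mer of the 20 standard amino acids. This is an assumption on my part, not something derived from PFAM and not what the thread settles on; it simply shows what a sequence-only "fingerprint" of constant length can look like. The function name `kmer_fingerprint` and the choice k = 2 (400 bits) are hypothetical.

```python
from __future__ import annotations

from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_fingerprint(seq: str, k: int = 2) -> list[int]:
    """Hypothetical encoder: one bit per possible k-mer over the 20 standard
    amino acids, set to 1 if that k-mer occurs anywhere in seq."""
    seq = seq.upper()
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # k-mers containing non-standard letters (e.g. X) are not in vocab and are ignored
    return [1 if kmer in present else 0 for kmer in vocab]


fp = kmer_fingerprint("MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQP")
print(len(fp))  # 20**2 = 400 bits, whatever the sequence length
print(sum(fp))  # number of distinct dipeptides present in the sequence
```

Every protein maps to a vector of the same length, which is what most learning algorithms expect, but the bits say nothing about domains or binding sites.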
There is no obvious connection between your sequence, your binary vector, and PFAM. What do you need that "fingerprint" for?
I need it for a learning algorithm. In that case, how can I get from a protein sequence to its protein domains?
Maybe you should first clarify what you really want to do and which file formats you wish to use. What do you want the algorithm to "learn"?
I need it to predict compound-protein interactions, so I need to encode protein sequences as fingerprints. I am using a FASTA file to get protein sequences from the PFAM DB.
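Whatever the encoding, the first step is reading the sequences out of the FASTA file. A minimal standard-library reader might look like the sketch below (the function name `read_fasta` and the demo records are hypothetical); Biopython's `Bio.SeqIO.parse(handle, "fasta")` does the same job if that library is available.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA stream.
    The identifier is the first word after '>'; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # emit the last record


# Made-up records, for illustration only.
demo = io.StringIO(">protA example\nMASVKSSSSS\nSSSSFISLLL\n>protB\nMKT\n")
print(list(read_fasta(demo)))
```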
You keep circling around the same concepts, and it is not getting much clearer to me. Have you considered whether a fingerprint is the right way to go at all? The only way to resolve this is to tell us exactly what you want to achieve (which compound, which protein, which kind of interaction, etc.) and to include a real example.
I am sorry about that. There is no connection between the sequence and the binary number I wrote; it was just an example. I need to convert protein sequences to their corresponding domains. How can I do that?
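One common route from sequence to Pfam domains, assuming HMMER3 and the Pfam-A HMM library are installed locally, is to press the library once (`hmmpress Pfam-A.hmm`) and then scan the FASTA file with `hmmscan --cut_ga --domtblout hits.domtblout Pfam-A.hmm proteins.fasta`, where `--cut_ga` applies Pfam's curated gathering thresholds. The sketch below parses the resulting per-domain table into a set of Pfam accessions per protein. It assumes the standard `--domtblout` column order (target name, target accession, tlen, query name, ..., i-Evalue in column 13, free-text description last); the function name and the optional extra E-value cutoff are my own choices, and the example line is made up.

```python
from __future__ import annotations

from collections import defaultdict


def parse_domtblout(text: str, max_i_evalue: float = float("inf")) -> dict[str, set[str]]:
    """Map each query protein to the set of Pfam accessions reported for it
    in an hmmscan --domtblout table (assumes the HMMER3 column layout)."""
    domains: dict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split(maxsplit=22)  # 22 fixed columns + free-text description
        target_acc = fields[1]            # Pfam accession with version, e.g. PF00069.25
        query_name = fields[3]            # protein ID taken from the FASTA header
        i_evalue = float(fields[12])      # per-domain independent E-value
        if i_evalue <= max_i_evalue:      # optional extra filter; off by default
            domains[query_name].add(target_acc.split(".")[0])  # drop the version
    return dict(domains)


# Made-up hit line in domtblout layout, for illustration only.
EXAMPLE = (
    "Pkinase PF00069.25 264 protA - 350 1.2e-50 170.3 0.0 1 1 "
    "2.1e-54 1.5e-50 170.0 0.0 1 264 40 300 40 300 0.95 Protein kinase domain\n"
)
print(parse_domtblout(EXAMPLE))
```

For a real run, pass the file contents, e.g. `parse_domtblout(open("hits.domtblout").read())`. Proteins with no reported hit simply do not appear in the returned dict.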
In any case, you cannot encode this as binary unless you are dealing with only a single domain, or with something like "known domain" = 1 and "no known domain" = 0. I am not going to explain this kind of conversion, because any method built on it is going to work very poorly. If you want ligand-protein docking, use a docking server instead. Binding prediction from the linear sequence works only in very specialized cases, for example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3380730/
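For completeness, the binary encodings mentioned in the reply are trivial once domain annotations exist, which is part of why they carry so little information. The sketch below shows a single "known domain" bit and, extending the single-domain case, one bit per domain in a fixed vocabulary. The function names, the vocabulary, and the annotations are hypothetical, and nothing here addresses the reply's point that such features predict binding poorly.

```python
from __future__ import annotations


def known_domain_bit(protein_domains: set[str]) -> int:
    """Single bit: 1 if any known domain was annotated, 0 otherwise."""
    return 1 if protein_domains else 0


def domain_bits(protein_domains: set[str], vocabulary: list[str]) -> list[int]:
    """One bit per domain in a fixed vocabulary: 1 if the protein has that domain."""
    return [1 if acc in protein_domains else 0 for acc in vocabulary]


# Hypothetical vocabulary and annotations, for illustration only.
VOCAB = ["PF00001", "PF00069", "PF00089", "PF07714"]
annotations = {"protA": {"PF00069"}, "protB": set()}

for name, doms in annotations.items():
    print(name, domain_bits(doms, VOCAB), known_domain_bit(doms))
```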