Is There A Single Letter Protein Alphabet That Encodes Modified Residues?
10.8 years ago

Hi,

I'm interested in protein modifications and was wondering whether there is an existing standard for expressing modified residues. For example, instead of pS (phosphoserine), use a single character such as { or something similar. Ideally, it could also cope with multiple modifications of the same residue, e.g. trimethylation.

Tags: protein, sequence, annotation

Must they be letters? # could be phospho-threonine, $ = phospho-Ser and ^ = phospho-Tyr.
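The symbol table proposed in this comment can be applied mechanically. This assumes the input writes each modified residue as a two-character token such as pS, as in the question. Below is a minimal Python sketch. The function names encode and decode and the lowercase-p input convention are illustrative assumptions, not an existing standard:

```python
# Hypothetical symbol table following the comment above:
# '#' = phospho-Thr, '$' = phospho-Ser, '^' = phospho-Tyr.
# Assumes residues are uppercase, so a lowercase 'p' can only start a token.
PHOSPHO_SYMBOLS = {"pT": "#", "pS": "$", "pY": "^"}
SYMBOL_TO_PHOSPHO = {sym: tok for tok, sym in PHOSPHO_SYMBOLS.items()}


def encode(seq: str) -> str:
    """Replace two-character tokens such as 'pS' with a single symbol each."""
    out = []
    i = 0
    while i < len(seq):
        token = seq[i:i + 2]
        if token in PHOSPHO_SYMBOLS:
            out.append(PHOSPHO_SYMBOLS[token])
            i += 2  # consumed the 'p' prefix and the residue letter
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def decode(seq: str) -> str:
    """Expand each symbol back into its two-character 'pX' token."""
    return "".join(SYMBOL_TO_PHOSPHO.get(ch, ch) for ch in seq)


print(encode("MALLIVpSDFKpTDGpY"))  # MALLIV$DFK#DG^
```

Each modified residue becomes one character, so the encoded string has exactly one character per residue. That property is what a single-character alphabet buys you.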


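It's been a while since biochem, but I don't think there are enough letters for that. There are 20 standard amino acids and only 26 letters in the alphabet. The six spare letters (B, J, O, U, X, Z) already have assigned meanings for ambiguous or nonstandard residues.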


They don't have to be letters, just characters. So yes, ^, & and the like would work.


Hi Niallhaslam,

Thank you!

10.8 years ago

an already existing standard to express modified residues

There is an ontology for the modified amino acids:

10.8 years ago

It seems to me that the question could instead be "Is there a single-letter symbol that encodes amino acid modifications?" That is because you could always align a modification track to the protein's "naked" (unmodified) sequence:

MALLIVSDFKvDGSTWP
......p....s.....


Some ideas for these single-letter codes:
p = phosphorylated residue
m = methylated
c = carboxylated
a = acetylated
s = sumoylated
m = myristoylated (clashes with methylation above, so it would need a different character)

and so on. Describing data relative to a reference sequence in this way is how a lot of human genome data will be organized, stored and displayed, especially as personal genomics grows.
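The alignment-plus-codes idea above can be read back into a list of modified sites with a few lines of Python. This is a minimal sketch with the following assumptions:

- "." means "unmodified".
- The track has exactly one column per residue.
- The proposed list uses "m" twice, so this sketch keeps "m" for methylation only.

The function name modified_sites and the example track (phospho-Ser at 7, SUMO on Lys at 10) are hypothetical.

```python
# Single-letter modification codes from the list above. "m" appears twice
# in that list; this sketch maps it to methylation only, because one code
# cannot mean two things.
MOD_CODES = {
    "p": "phosphorylated",
    "m": "methylated",
    "c": "carboxylated",
    "a": "acetylated",
    "s": "sumoylated",
}


def modified_sites(sequence: str, track: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue, modification) for each annotated site."""
    if len(sequence) != len(track):
        raise ValueError("sequence and track must have the same length")
    sites = []
    for pos, (residue, code) in enumerate(zip(sequence, track), start=1):
        if code == ".":
            continue  # unmodified residue
        if code not in MOD_CODES:
            raise ValueError(f"unknown modification code {code!r} at position {pos}")
        sites.append((pos, residue, MOD_CODES[code]))
    return sites


# Hypothetical example: phospho-Ser at 7, SUMO on Lys at 10.
print(modified_sites("MALLIVSDFKVDGSTWP", "......p..s......."))
# [(7, 'S', 'phosphorylated'), (10, 'K', 'sumoylated')]
```

A single track holds at most one code per residue. Handling the degree of a modification (mono-, di- or trimethylation) would need extra codes or a second track.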


I could see myself implementing this using letter_annotations in Biopython. Thanks.
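Here is a minimal sketch of that approach with Biopython's SeqRecord.letter_annotations, which only accepts values whose length matches the sequence. The record id "example_protein", the annotation key "modifications" and the track itself are hypothetical choices for illustration:

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical protein record; the id and annotation key are illustrative.
record = SeqRecord(Seq("MALLIVSDFKVDGSTWP"), id="example_protein")

# letter_annotations rejects values whose length differs from the sequence,
# so the modification track stays aligned residue by residue.
record.letter_annotations["modifications"] = "......p..s......."

# Slicing the record slices the per-letter annotation along with the sequence.
fragment = record[5:12]
print(fragment.seq)                                  # VSDFKVD
print(fragment.letter_annotations["modifications"])  # .p..s..
```

The length check and the automatic slicing are the main reasons to prefer letter_annotations over storing the track in a separate string.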


You're welcome. Remember, BioStar works well by voting good questions and answers up and bad ones down. One thing you may need to consider with the scheme above is tissue or temporal specificity. Imagine looking at cell cycle regulators where the "p" for phosphorylation is present in, say, G1 but not in S phase. Or take a different protein whose modification is present only in liver and kidney but not in muscle.
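One way to handle that specificity is to keep one full-length track per condition instead of a single track. This is a sketch under that assumption. The helper names make_track and conditions_for_site, the sequence, and the condition labels "G1" and "S" are all hypothetical:

```python
# Hypothetical protein sequence used only for illustration.
SEQUENCE = "MALLIVSDFKVDGSTWP"


def make_track(length: int, sites: dict[int, str]) -> str:
    """Build a '.'-filled track with codes at the given 1-based positions."""
    track = ["."] * length
    for pos, code in sites.items():
        track[pos - 1] = code
    return "".join(track)


# Hypothetical condition-specific tracks: Ser7 is phosphorylated in G1 only.
# Condition labels are free-form strings, so tissues ("liver", "muscle")
# could be stored the same way.
TRACKS = {
    "G1": make_track(len(SEQUENCE), {7: "p"}),
    "S": make_track(len(SEQUENCE), {}),
}


def conditions_for_site(pos: int, code: str, tracks: dict[str, str]) -> list[str]:
    """Return the conditions in which residue `pos` (1-based) carries `code`."""
    return sorted(cond for cond, track in tracks.items() if track[pos - 1] == code)


print(conditions_for_site(7, "p", TRACKS))  # ['G1']
```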