X amino acid in Ensembl
28 days ago
Jason • 0

Hello all,

I am working on aligning protein orthologs from different species using the Ensembl API. Strangely, some protein sequences from non-human species contain a lot of X characters. What do they mean? In theory, if the genome sequence is known, the protein sequence should be known too, right? And how should I score these X positions when I calculate conservation scores? Thanks a lot. An example is shown below: ENSMEUP00000002410 from Notamacropus eugenii.

MGLSGAAGAAVLVLLAGHFSLGTALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQKNYDLSFLKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXILVGGVRFNNNPTLCNVETIQWKDIVGSAYVSNITIDNNSHPKSXXXXXXXXXXXXXXXXXXXXXXXXTKTICAQQCSGRCRGSSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVRKCPHNYVVTDHGSCVRSCNAETYEVEEDGVRKCKKCEGPCSKVCNGIGIGEFKDVLSINATNIKQFQNCTTISGDLHILPVAFKGDSFTNTPPLDPKELNILRTVKEISGFLLIQAWPENMTDLHAFEHLEIIRGRTKQHGQFSLAVVGVDITSLGLRSLKEISDGDVIISKNRQLCYANTINWSKLFGTRSQKTKITNNKDEKECRALGHVCHELCSSDGCWGPSSSHCLSCRYVSRQKKCVEKCNILEGEPREYMENLKCLQCHPECLPQLMNQTCTGPGPDKCVQCAHYIDGPHCVKTCPAGIMGEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPKIPSIATGIVGGFLLLMVLVLGIGLFIRRRRIVRKRTLRRLLQEREXXXXXXLSPPSGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYIREHKDNIGSQYLLNWCVQIAKGMSYLEERRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSVLEKGERLPQPPICTIDVYMIMVKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSATSNTSATVCIDRNGQQTCPVKEESFIQRYSSDPTTVLLEDNVDDSFQPVP

The ENSMEUP00000002410 identifier seems to pull up tammar wallaby entries.

28 days ago

If I remember correctly, X is the protein equivalent of N in nucleotide sequences; in other words, an unknown amino acid (unknown as in "it couldn't be determined", not as in "new, never seen before").

This can happen if the genome from which the gene/protein was determined still has (quite a few) Ns in its sequence. If an N falls in the 'wrong' position in a codon, you can't determine which amino acid it encodes, so the codon is 'translated' as an X.
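To make the codon argument concrete, here is a minimal Python sketch. It assumes the standard genetic code and treats N as the only ambiguity symbol; `translate_codon` and `translate` are hypothetical helper names for illustration, not part of the Ensembl API, and Ensembl's own translation code may handle ambiguity differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one codon; return 'X' if N makes the residue ambiguous."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in "ACGTN" for base in codon):
        return "X"  # anything unexpected is treated as undetermined
    # Expand each N into all four possible bases
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    # If every possible codon encodes the same residue, the N is harmless
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq):
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))
```

With this sketch, `translate("ATGNNNAAA")` gives `"MXK"`, while `translate("ATGGCN")` still gives `"MA"`, because all four GCN codons encode alanine. An N in the third position is often harmless; an N in the first or second position usually is not.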

This is correct. X means any amino acid. Most substitution matrices apply a small, roughly uniform penalty (around -1) whenever an amino acid is aligned with X, even when X aligns with another X.
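As a rough illustration of how a flat X penalty feeds into a per-column conservation score, here is a small Python sketch. The match/mismatch values are a toy placeholder for a real substitution matrix, and `pair_score` and `column_scores` are hypothetical names; only the flat -1 for any pair involving X follows the description above. Gap characters are not treated specially.

```python
from itertools import combinations

X_SCORE = -1  # flat score for any pair involving X, as described above


def pair_score(a, b, match=1, mismatch=0):
    """Toy pairwise score; a real analysis would look up a substitution matrix."""
    if a == "X" or b == "X":
        return X_SCORE
    return match if a == b else mismatch


def column_scores(aligned_seqs):
    """Average pairwise score per alignment column (needs at least two sequences)."""
    scores = []
    for column in zip(*aligned_seqs):
        pairs = list(combinations(column, 2))
        scores.append(sum(pair_score(a, b) for a, b in pairs) / len(pairs))
    return scores
```

For example, `column_scores(["MKX", "MRX", "MKA"])` scores the first column 1.0 (fully conserved), the second about 0.33, and the third -1.0, since every pair in it involves an X. Depending on the goal, it may be preferable to skip X positions as missing data instead of penalizing them; that is a design choice, not something the matrices dictate.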