X amino acid in Ensembl
28 days ago
Jason • 0

Hello all,

I am aligning protein orthologs from different species using the Ensembl API. Strangely, some protein sequences from non-human species contain a lot of X residues. What does X mean? In theory, if the genome sequence is known, the protein sequence should be known too, right? And how should I score these X residues when I calculate conservation scores? Thanks a lot. An example is shown below: ENSMEUP00000002410 from Notamacropus eugenii.

MGLSGAAGAAVLVLLAGHFSLGTALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQKNYDLSFLKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXILVGGVRFNNNPTLCNVETIQWKDIVGSAYVSNITIDNNSHPKSXXXXXXXXXXXXXXXXXXXXXXXXTKTICAQQCSGRCRGSSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVRKCPHNYVVTDHGSCVRSCNAETYEVEEDGVRKCKKCEGPCSKVCNGIGIGEFKDVLSINATNIKQFQNCTTISGDLHILPVAFKGDSFTNTPPLDPKELNILRTVKEISGFLLIQAWPENMTDLHAFEHLEIIRGRTKQHGQFSLAVVGVDITSLGLRSLKEISDGDVIISKNRQLCYANTINWSKLFGTRSQKTKITNNKDEKECRALGHVCHELCSSDGCWGPSSSHCLSCRYVSRQKKCVEKCNILEGEPREYMENLKCLQCHPECLPQLMNQTCTGPGPDKCVQCAHYIDGPHCVKTCPAGIMGEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXPKIPSIATGIVGGFLLLMVLVLGIGLFIRRRRIVRKRTLRRLLQEREXXXXXXLSPPSGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYIREHKDNIGSQYLLNWCVQIAKGMSYLEERRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSVLEKGERLPQPPICTIDVYMIMVKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSATSNTSATVCIDRNGQQTCPVKEESFIQRYSSDPTTVLLEDNVDDSFQPVP


The identifier ENSMEUP00000002410 seems to pull up tammar wallaby entries.

28 days ago

If I remember correctly, X is the protein equivalent of N in nucleotide sequences, in other words an unknown amino acid (unknown as in "it couldn't be determined", not as in "new, never seen before").

This can happen if the genome assembly on which the gene/protein was annotated still has (quite a few) Ns in its sequence. If an N falls in the 'wrong' position of a codon, you can't determine which amino acid it codes for, so the codon is 'translated' as X.
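To make the codon argument concrete, here is a minimal Python sketch (not how Ensembl itself does it, just an illustration). It uses the standard genetic code and two helper names of my own, `translate_codon` and `translate`. A codon containing N is expanded to every possible base; if all expansions give the same amino acid the residue is still known, otherwise it becomes X.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one codon; return 'X' if an N makes the residue ambiguous."""
    # Each N is replaced by all four bases; other characters stay as they are.
    options = [BASES if base == "N" else base for base in codon]
    # Unrecognised symbols fall through to 'X' via the .get default.
    residues = {CODON_TABLE.get("".join(p), "X") for p in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a DNA string in frame 0, ignoring an incomplete last codon."""
    seq = dna.upper()
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate("ATGGGNNGGTAA"))  # MGX*
```

Here GGN is still glycine because the third position does not matter for that codon, while NGG could be W, R or G and is therefore reported as X. A long stretch of Ns in the genomic sequence gives a long run of X in the same way.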


This is correct. X means any amino acid. Many substitution matrices apply a small, near-uniform penalty (about -1 in BLOSUM62, for example) when any amino acid is aligned with X, even when X aligns with another X.
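For the conservation-score part of the question, one possible convention (an assumption on my side, not something Ensembl prescribes) is to treat X like a gap, i.e. as missing data, and score each alignment column only on the residues that are actually known. The sketch below uses a simple "fraction of the most common residue" score. The function names and the `min_known` threshold are hypothetical and only there for illustration.

```python
from collections import Counter


def column_conservation(column, missing="X-", min_known=2):
    """Fraction of the most common residue among known residues in a column.

    Residues listed in `missing` (X and gaps by default) are ignored.
    Returns None when fewer than `min_known` residues remain, because a
    score based on almost no data is not meaningful.
    """
    known = [aa for aa in column.upper() if aa not in missing]
    if len(known) < min_known:
        return None
    top_count = Counter(known).most_common(1)[0][1]
    return top_count / len(known)


def alignment_conservation(aligned_seqs, **kwargs):
    """Score every column of an alignment given as equal-length strings."""
    if len({len(s) for s in aligned_seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_conservation("".join(col), **kwargs) for col in zip(*aligned_seqs)]


scores = alignment_conservation(["MKXL", "MRXL", "MKAL"])
print(scores)
```

In this toy alignment the third column has only one known residue, so it gets `None` instead of a misleading perfect score. If you use a substitution-matrix-based score instead, the alternative is to keep X in the column and accept the matrix's X penalty.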
