Question: Multiple asterisks inside amino acid fasta sequences
3.8 years ago by
Hi all,


I recently came across amino acid sequences with many asterisks inside them, for example X*X*XXX*XXX. Can anyone explain what this means? I understand that some sequences have an asterisk at the end to indicate a stop codon, but in this case the asterisks occur multiple times within a sequence. They don't seem to represent unknown amino acids, because the sequences contain Xs too. Thanks a lot!



3.8 years ago by
* means a stop codon. Simple DNA-to-protein translation methods can produce internal stop codons; it all depends on the settings you chose (the reading frame, and whether translation stops at the first stop codon or continues past it). If you see a sequence like this, it is probably not a proper protein sequence.
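
To illustrate how the frame and the "stop at stop or continue" choice change the output, here is a minimal Python sketch. It assumes the standard genetic code; the function name `translate`, its `frame` and `to_stop` parameters, and the example DNA string are made up for illustration and do not come from any particular tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0, to_stop=False):
    """Translate DNA starting at offset `frame` (0, 1 or 2).

    Stop codons are written as '*'. With to_stop=True, translation ends
    at the first stop codon; otherwise it continues past it, which is
    what produces asterisks inside the protein sequence.
    Codons with ambiguous bases (e.g. N) are written as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


example = "ATGAAATGACCCTAG"  # hypothetical sequence
print(translate(example))                # MK*P*
print(translate(example, to_stop=True))  # MK
print(translate(example, frame=1))       # *NDP
```

The same DNA gives a protein with an internal asterisk, a truncated protein, or a different protein altogether, depending only on the settings.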

Depending on the prediction software, * in the middle of an ORF could also represent e.g. a UGA codon (stop/tryptophan/selenocysteine depending on the translation system).

To expand on that -

Normally, if you see lots of stop codons, that means you are either translating a non-coding region, or are in the wrong reading frame.
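
One quick way to check for a frame problem is to count stop codons in each forward reading frame of the underlying DNA, if you have it. The sketch below assumes the standard stop codons (TAA, TAG, TGA) and only looks at the given strand; the function name `stops_per_frame` and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only


def stops_per_frame(dna):
    """Return {frame: number of stop codons} for forward frames 1 to 3."""
    dna = dna.upper()
    counts = {}
    for offset in range(3):
        # Walk the sequence codon by codon from this offset.
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        counts[offset + 1] = sum(codon in STOP_CODONS for codon in codons)
    return counts


print(stops_per_frame("ATGGCTAAGTGGACT"))  # {1: 0, 2: 0, 3: 1}
print(stops_per_frame("ATGAAATGACCCTAG"))  # {1: 2, 2: 1, 3: 0}
```

On a real coding sequence, one frame typically has far fewer stops than the others; if every frame is full of stops, the region may simply be non-coding. To cover the opposite strand, the same function could be run on the reverse complement.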

