"U" in protein sequence
5 weeks ago
Shaurya • 0

I have sampled and aligned a set of 7000 proteomes. I tried to use RAxML to build maximum likelihood trees, but I get an error saying "unknown character "U" is at position xyz". How do I remove the U's from my sampled file, and what should I replace them with? Or is there other software for building maximum likelihood trees that can recognize U?

phylogenetics alignment fasta • 159 views
5 weeks ago
Mensur Dlakic ★ 15k

Normally there is no U character in proteins, but sometimes it stands for selenocysteine. I suggest you first make sure that you are not aligning an RNA sequence instead. If it is a protein, you can either remove the whole offending sequence (it should still be a pretty good dataset with 6999 proteomes), or replace U with C, since selenocysteine is a cysteine analog in which selenium takes the place of sulfur.
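Either option can be scripted in a few lines. Below is a minimal sketch in plain Python, assuming the alignment is an ordinary FASTA file with header lines starting with ">". The function names (split_records, replace_u, drop_records_with_u) are made up for illustration and are not part of RAxML or any library. Header lines are left untouched, so a "U" in a sequence name is not altered.

```python
def split_records(fasta_text):
    """Split FASTA text into records: [header_line, seq_line, seq_line, ...]."""
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif records and line.strip():
            # Sequence line belonging to the most recent header
            records[-1].append(line.strip())
    return records


def replace_u(fasta_text, replacement="C"):
    """Replace U/u (selenocysteine) in sequence lines only; headers are kept as-is."""
    out = []
    for record in split_records(fasta_text):
        out.append(record[0])
        for seq_line in record[1:]:
            fixed = seq_line.replace("U", replacement.upper())
            fixed = fixed.replace("u", replacement.lower())
            out.append(fixed)
    return "\n".join(out) + "\n"


def drop_records_with_u(fasta_text):
    """Remove every record whose sequence contains U/u; keep the rest unchanged."""
    out = []
    for record in split_records(fasta_text):
        sequence = "".join(record[1:]).upper()
        if "U" not in sequence:
            out.extend(record)
    return "\n".join(out) + "\n"
```

To apply it, read your alignment into a string, pass it to one of the two functions, and write the result to a new file rather than overwriting the original. Note that dropping a sequence from an existing alignment can leave columns that contain only gaps, so you may want to trim those afterwards.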

Out of curiosity, why are you aligning that many proteomes? It is almost impossible to have a meaningful view of that tree. Besides, if this is for prokaryotes, it is almost a guarantee that a better and larger tree already exists here.