"U" in protein sequence
5 weeks ago
Shaurya • 0

I have sampled and aligned a set of 7000 proteomes. I tried to use RAxML to build maximum likelihood trees, but I get an error saying "unknown character "U" is at position xyz". How do I remove the U's from my sampled file, and what should I replace them with? Or is there other software for building maximum likelihood trees that recognizes U?

phylogenetics alignment fasta • 159 views
5 weeks ago
Mensur Dlakic ★ 15k

U is not one of the 20 standard amino acids, but it is the one-letter code for selenocysteine, so it does occasionally appear in protein sequences. First, I suggest you make sure that you are not aligning an RNA sequence by mistake. If it is a protein, you can either remove the whole offending sequence (it should still be a pretty good dataset with 6999 proteomes), or replace U with C, since cysteine is the closest standard residue to selenocysteine.
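If you go with replacing U by C, something along these lines should work. This is only a minimal sketch using the Python standard library: the function names `replace_selenocysteine` and `looks_like_rna`, the 0.9 threshold, and the toy sequences are all made up for illustration and have nothing to do with RAxML itself. It assumes a plain FASTA file where header lines start with `>`, and it leaves those headers untouched so that a U in a sequence name is not altered.

```python
import io


def replace_selenocysteine(lines, replacement="C"):
    """Yield FASTA lines with U/u in sequence lines swapped for `replacement`.

    Header lines (starting with '>') are passed through unchanged.
    """
    table = str.maketrans({"U": replacement.upper(), "u": replacement.lower()})
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.translate(table)


def looks_like_rna(sequence, threshold=0.9):
    """Rough heuristic (an assumption, not a standard test): True when at
    least `threshold` of the letters are A, C, G, U or N."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        return False
    nucleotide_like = sum(c in "ACGUN" for c in residues)
    return nucleotide_like / len(residues) >= threshold


# Toy input: one protein-like record with U, one RNA-like record.
example = io.StringIO(">sp|U_protein example\nMKUAL-GU\n>seq2\nACGUACGU\n")
cleaned = list(replace_selenocysteine(example))
print("".join(cleaned), end="")
```

For a real alignment you would pass an open file handle instead of the `io.StringIO` object and write the yielded lines to a new file, keeping the original untouched. Pass `replacement="X"` if you would rather mark the position as unknown than treat it as cysteine. It is worth running `looks_like_rna` on any sequence that contains U before replacing anything, since a U-rich record may simply be RNA that ended up in the wrong file.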

Out of curiosity, why are you aligning that many proteomes? It is almost impossible to get a meaningful view of a tree that size. Besides, if this is for prokaryotes, it is almost guaranteed that a better and larger tree already exists here.
