Question: How do you run codeml for two genes in one file but with 1 species tree?
5 months ago, DNAngel wrote:

I want to test selection acting on one gene (set as my foreground) compared to a duplicate of the same gene (left as background), but I'm not sure how to set up my files. I have alignments for the two genes, so I would have to combine them into one alignment file, which means all my species names would be duplicated. If I change the duplicated names, then my one species tree won't work, because half of the sequence names won't be found in it.

I tried building a tree from the combined MSA, and it ended up looking like a smaller tree (for one gene) embedded in the middle of another tree for the second gene. I have no idea how I could unroot it like this...

Concatenating the dataset also does not make sense to me since I'd want to specifically make two partitions, one for each gene. Or is there another way to test for selection strength for duplicated genes?

codeml paml • 187 views
modified 5 months ago by shelkmike • written 5 months ago by DNAngel

Could you please clarify what "my one species tree won't work" means?

written 5 months ago by shelkmike

In PAML you typically run one alignment file and one tree file, and every species name in the alignment has to exist in the tree file and match it exactly. So if I want to double my alignment by adding a gene, I'd have to attach something to the second set of species names, like "sp1-2", "sp2-2"; otherwise the duplicated names can cause a problem (at least they did when I ran different models on Datamonkey). But once I change the second set of names, they no longer "exist" in the tree file. So it's just this weird loop of problems. Simply concatenating the second gene alignment to my first gene alignment doesn't help, since I wanted to label the second gene as the foreground.

written 5 months ago by DNAngel

5 months ago, shelkmike (Russian Federation) wrote:

If I have understood correctly, you are talking about two orthogroups that descend from a pair of paralogs present in the last common ancestor of the studied species, so each species has one gene from each orthogroup. The tree you provide to PAML should then be the gene tree, not the species tree. If you have species A, B, C which form the tree (A,(B,C)) [I'm using Newick notation here], the gene tree should be ((A1,(B1,C1)),(A2,(B2,C2))). The multiple sequence alignment should include the sequences from both orthogroups, and the sequence names must be the same as in the tree file.
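
In case it helps, here is a rough Python sketch of that setup. It is only an illustration under some assumptions: the function names are made up for this example, the species tree is plain Newick with unquoted names, the "1"/"2" suffixes are just one possible naming scheme, and the toy sequences are invented. The optional label argument is there because codeml reads branch labels such as "#1" (one branch) or "$1" (a whole clade) from the tree file; which branches you actually mark as foreground depends on your hypothesis.

```python
import re

# Matches a leaf name directly after "(" or "," in a Newick string.
# Assumes unquoted names and no whitespace before a name.
LEAF = re.compile(r"(?<=[(,])[^(),:;\s]+")


def suffix_leaves(newick, suffix):
    """Append `suffix` to every leaf name; the trailing ';' is dropped."""
    body = newick.strip().rstrip(";")
    return LEAF.sub(lambda m: m.group(0) + suffix, body)


def duplicated_gene_tree(species_newick, foreground_label=""):
    """Join two suffixed copies of the species tree under one root.

    `foreground_label` (e.g. " $1") is placed after the second copy's clade.
    """
    copy1 = suffix_leaves(species_newick, "1")
    copy2 = suffix_leaves(species_newick, "2")
    return f"({copy1},{copy2}{foreground_label});"


def rename_fasta(fasta_text, suffix):
    """Append `suffix` to the first word of every FASTA header."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = ">" + line[1:].split()[0] + suffix
        out.append(line)
    return "\n".join(out) + "\n"


def name_mismatches(fasta_text, newick):
    """Return names present in only one of the alignment and the tree."""
    fasta_names = {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }
    return fasta_names ^ set(LEAF.findall(newick))


# Toy data, invented for illustration only.
species_tree = "(A,(B,C));"
gene1 = ">A\nATGAAA\n>B\nATGAAG\n>C\nATGAAC\n"
gene2 = ">A\nATGCAA\n>B\nATGCAG\n>C\nATGCAC\n"

gene_tree = duplicated_gene_tree(species_tree)
combined = rename_fasta(gene1, "1") + rename_fasta(gene2, "2")

print(gene_tree)                             # ((A1,(B1,C1)),(A2,(B2,C2)));
print(name_mismatches(combined, gene_tree))  # set()
```

Calling duplicated_gene_tree(species_tree, " $1") would put a clade label on the second copy. An empty set from name_mismatches means every sequence name has a matching leaf in the tree and vice versa.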

modified 5 months ago • written 5 months ago by shelkmike

Oh, I see! Okay, that makes sense. I will try that, thank you!

written 5 months ago by DNAngel