Question: How do you run codeml for two genes in one file but with 1 species tree?
16 months ago by
DNAngel40 wrote:

I want to test for selection acting on one gene (set as my foreground) compared to a duplicate of the same gene (left as the background), but I'm not sure how to set up my files. I have alignments for the two genes, but to do this I have to combine them into one alignment file, which means that all my species will have duplicated names. If I change the duplicated names, then my one species tree won't work because half of the sequence names won't be found in it.

I tried producing a tree from the combined MSA, and it ended up looking like a smaller tree (for one gene) embedded in the middle of another tree for the second gene. I have no idea how I could unroot it like this...

Concatenating the dataset also does not make sense to me since I'd want to specifically make two partitions, one for each gene. Or is there another way to test for selection strength for duplicated genes?

codeml paml • 438 views
modified 16 months ago by shelkmike180 • written 16 months ago by DNAngel40

Could you please clarify what "my one species tree won't work" means?

written 16 months ago by shelkmike180

In PAML you typically run one alignment file and one treefile, where the species names in the alignment have to exist in the treefile and match its names exactly. So if I want to double my alignment by adding a gene, I'd have to attach something to the second set of species names, like "sp1-2", "sp2-2", otherwise the duplicated names can cause a problem (at least they did when I ran different models on Datamonkey). But once I change the second set of names, they no longer "exist" in the treefile. So it's just this weird loop of problems. Simply concatenating the second gene alignment to my first gene alignment doesn't help, since I wanted to label the second gene as the foreground.
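The renaming step described above could be sketched roughly as follows. This is only an illustration, assuming the alignments are in FASTA format and that the species name is the first word of each header; the function name `rename_fasta` and the suffix "2" are made up for this example. It only renames sequences; whether the two gene alignments can simply be stacked or need to be realigned together is a separate question.

```python
def rename_fasta(fasta_text, suffix):
    """Append `suffix` to every sequence name in a FASTA-formatted string.

    Assumption: the sequence name is the first whitespace-delimited word
    of each header line. Anything after it (a description) is dropped.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            out.append(f">{name}{suffix}")
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical toy alignment for the second gene copy.
gene2 = ">sp1 some description\nATGAAA\n>sp2\nATGAAG\n"
print(rename_fasta(gene2, "2"), end="")
```

The same suffix then has to be used for the corresponding tips in the treefile, so that every name in the alignment has a match.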

written 16 months ago by DNAngel40
16 months ago by
Russian Federation
shelkmike180 wrote:

If I have understood correctly, you are speaking about two orthogroups that descend from a pair of paralogs in the last common ancestor of the studied species, so each species has one gene from each of the two orthogroups. The tree you provide to PAML should not be the species tree but the gene tree. If you have species A, B, C which form a tree [I'm using the Newick notation here] (A,(B,C)), the gene tree should be ((A1,(B1,C1)),(A2,(B2,C2))). The multiple sequence alignment should include the sequences from both orthogroups, and the sequences must have the same names as in the file with the tree.
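A minimal sketch of building such a gene tree from the species tree, assuming a plain Newick string without quoted names, and assuming the suffixes "1" and "2" used in the example above. The function names are hypothetical. The `$1` appended to the second copy is PAML's clade label, which marks all branches of that clade as foreground; use `#1` on individual branches instead if only some of them should be foreground.

```python
import re


def suffix_leaves(newick, suffix):
    """Append `suffix` to every leaf name in a simple Newick string.

    Leaf names are taken to be the labels that directly follow "(" or ",",
    so branch lengths (after ":") and internal labels (after ")") are
    left untouched. Quoted names are not handled.
    """
    body = re.sub(r"\s+", "", newick.strip().rstrip(";"))
    return re.sub(r"(?<=[(,])([^(),:;]+)", lambda m: m.group(1) + suffix, body)


def duplicated_gene_tree(species_newick, bg_suffix="1", fg_suffix="2", mark="$1"):
    """Join two suffixed copies of the species tree into one gene tree.

    The second copy gets `mark` appended so that codeml treats it as
    the foreground clade.
    """
    bg = suffix_leaves(species_newick, bg_suffix)
    fg = suffix_leaves(species_newick, fg_suffix)
    return f"({bg},{fg}{mark});"


print(duplicated_gene_tree("(A,(B,C));"))
# ((A1,(B1,C1)),(A2,(B2,C2))$1);
```

This assumes that each orthogroup follows the species tree exactly, as in the example above; if the real gene tree differs (for example because of losses or further duplications), the tree would have to be edited by hand.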

modified 16 months ago • written 16 months ago by shelkmike180

Oh I see! Okay, that makes sense. I will try that, thank you!

written 16 months ago by DNAngel40