Question: Question about input sequences for codeml in PAML
6.9 years ago, pingpingnona.huang wrote:

Hi, I want to test for positive selection on one of my target genes using codeml in PAML. At the moment I have 156 sequences of this gene, but they represent only 15 haplotypes. Should I run the positive-selection analysis on all 156 sequences, or only on the 15 haplotype sequences? Thanks for helping me with this question!

6.4 years ago, bagdevi.mishra wrote:

I have tried running the analysis with all sequences and with only the unique haplotypes, and the results were the same in both cases. Using fewer sequences is certainly less computationally expensive, so working with the 15 haplotypes is advisable.
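For anyone who needs to reduce an alignment to its unique haplotypes first, here is a minimal Python sketch. It assumes the sequences are already aligned and are passed in as (name, sequence) pairs; the function name `collapse_haplotypes` is hypothetical and not part of PAML. Sequences are treated as the same haplotype only if they are identical after upper-casing, so gaps and ambiguity codes count as differences.

```python
from collections import OrderedDict


def collapse_haplotypes(records):
    """Group aligned sequences by identical sequence string.

    records: iterable of (name, sequence) pairs.
    Returns a list of (representative_name, sequence, member_names),
    one entry per unique haplotype, in order of first appearance.
    """
    groups = OrderedDict()
    for name, seq in records:
        key = seq.upper()  # ignore case differences only
        groups.setdefault(key, []).append(name)
    # The first sequence seen for each haplotype acts as its representative.
    return [(names[0], seq, names) for seq, names in groups.items()]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        ("ind1", "ATGAAACCC"),
        ("ind2", "ATGAAACCC"),
        ("ind3", "ATGAAGCCC"),
        ("ind4", "atgaaaccc"),
    ]
    for rep, seq, members in collapse_haplotypes(toy):
        print(rep, seq, len(members))
```

Keeping the list of member names per haplotype makes it possible to report haplotype frequencies later, even though only one representative per haplotype goes into the codeml input.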

6.4 years ago, Brice Sarver wrote:

It sounds like you are using data from many closely related individuals if there are that many shared haplotypes. Using a dataset with currently segregating polymorphisms (as from population-level sampling) will inflate your estimates of omega. codeml works best with fixed differences among divergent groups: the 'power' to detect selection increases with the distance among sequences (i.e., longer branches = more power). To convince yourself of this, consider whether you could detect selection using a tree that has little resolution vs. a tree that has lots of structure.

If you absolutely cannot get around these restrictions and still need to use codeml, I would recommend using only the haplotypic data. The branches between individuals with similar or identical haplotypes will be very short and will not add any power to the analysis.

You can estimate a tree using a variety of approaches and then supply it to codeml, either as a starting point or with the branch lengths fixed (after converting them to units of substitutions per codon!). You could also visualize the tree at this point.
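As a rough illustration of the unit conversion, the sketch below rescales every branch length in a Newick string. It assumes the tree was estimated from a nucleotide alignment with branch lengths in substitutions per nucleotide site, so multiplying by 3 (three sites per codon) gives approximate substitutions per codon. The function name `scale_branch_lengths` is hypothetical, and the regular expression assumes taxon labels do not themselves contain a colon followed by a number.

```python
import re

# Matches ":<number>" where the number may be decimal or in scientific notation.
_BRANCH_LENGTH = re.compile(
    r":\s*((?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)"
)


def scale_branch_lengths(newick, factor=3.0):
    """Return the Newick string with every branch length multiplied by factor."""
    def _scale(match):
        scaled = float(match.group(1)) * factor
        return ":" + format(scaled, ".6f")
    return _BRANCH_LENGTH.sub(_scale, newick)


if __name__ == "__main__":
    # Toy tree, invented for illustration only.
    tree = "((hapA:0.1,hapB:0.2):0.05,hapC:1e-2);"
    print(scale_branch_lengths(tree))
```

This simple scaling is only an approximation, so it is probably safest to use the rescaled lengths as starting values unless you have a specific reason to hold them fixed.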
