Question: How Can I Compare Rates Of Evolution For Two Sets Of Genes?
6.7 years ago by
terdon410
terdon410 wrote:

I have a list of candidate genes as the result of my analysis. I am now trying to find various characteristics that they have in common. One of the things I would like to check is if my candidate genes are evolving faster or slower than the rest of the genes in my dataset.

Now, I know how to do this manually by building multi-species alignments for each of my gene products and calculating Ka/Ks ratios for each set of alignments. This, however, is not a trivial process and I really do not want to do it manually for my ~1500 genes.

Can anyone suggest a tool that will take two lists of genes (or proteins) and return an estimate of evolutionary rate for each gene?

evolution • 5.4k views
modified 6.7 years ago by Rahul Sharma600 • written 6.7 years ago by terdon410
6.7 years ago by
Josh Herr5.6k
University of Nebraska
Josh Herr5.6k wrote:

In the past I've used Tajima's D as a measure of sequence evolution. There are numerous methods, but (IMHO) this measure seems to be the best accepted as a way to measure if two (or more) sets of genes are evolving over time.

It's not difficult to compute it on your own, but there are a few scripts out there to do it for you. Check out the MANVa software package, or two helpful scripts: DENSERM_P in Perl and analyzer HKA version 6 in C.

This paper helped me measure evolutionary rates using SNP data.

modified 6.7 years ago • written 6.7 years ago by Josh Herr5.6k

Note: Tajima's D is a population genetic statistic - it measures whether a locus is under some non-random force (including population expansion/contraction) by comparing allele frequencies in a (hopefully random) population sample of sequences. From the O.P. it sounds like terdon wants to compare evolutionary rates among species, which is not what D will do.

written 6.7 years ago by David W4.7k

+1 on both answers and thanks for the correction. I have in fact used Tajima's D for sequence evolution at the level of populations to look for selection against alleles in putative clonal plants, so that would make sense. I appreciate the clarification.

You have a great blog too. Going to check out your MMOD R library now...

modified 6.7 years ago • written 6.7 years ago by Josh Herr5.6k

Yup, I don't have population data. What I have is a list of candidate proteins (easily mapped back to genes) and a list of non candidate proteins. Based on my analysis I expect my candidates to be evolving faster than my non candidates. I was hoping to use something like eggNOG or EGO to map each of my cands to an orthologous group and obtain a measure of the rate of evolutionary change for each of those groups.

written 6.7 years ago by terdon410
6.7 years ago by
David W4.7k
New Zealand
David W4.7k wrote:

Hi Terdon,

I would choose my favorite scripting language and put together a pipeline. Presuming you have your orthologous sets of sequences sorted already, the process is quite straightforward:

  • Write each sequence set to a separate file
  • Align the sequences with [MUSCLE/T-Coffee/PRANK]
  • Calculate Ka/Ks for each alignment
  • Parse the results and pump them into a CSV file so you can compare rates in candidate vs. "background"
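
A minimal sketch of the first and last steps above, using only the Python standard library. The data layout (a dict of gene ID to species and sequence), the CSV column names `group` and `ka_ks`, and the choice of a permutation test on medians are all assumptions made for illustration; the alignment and Ka/Ks steps are left to whichever external tools you pick.

```python
import csv
import random
import statistics
from pathlib import Path


def write_fasta_sets(ortholog_sets, out_dir):
    """Step 1: write each set of orthologous sequences to its own FASTA file.

    ortholog_sets maps gene ID -> {species: sequence} (hypothetical layout).
    Returns the list of paths written, ready to hand to an aligner.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for gene_id, seqs in ortholog_sets.items():
        path = out_dir / f"{gene_id}.fasta"
        with open(path, "w") as handle:
            for species, seq in seqs.items():
                handle.write(f">{species}\n{seq}\n")
        paths.append(path)
    return paths


def read_rates(csv_path):
    """Step 4: read a summary CSV into {group: [ka_ks, ...]}.

    Assumes columns named 'group' and 'ka_ks' (names chosen for this sketch).
    """
    rates = {}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            rates.setdefault(row["group"], []).append(float(row["ka_ks"]))
    return rates


def permutation_test(candidate, background, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in median Ka/Ks."""
    rng = random.Random(seed)
    observed = statistics.median(candidate) - statistics.median(background)
    pooled = list(candidate) + list(background)
    n_cand = len(candidate)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (statistics.median(pooled[:n_cand])
                - statistics.median(pooled[n_cand:]))
        if abs(diff) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)
```

A permutation test is used here only because it needs no extra packages and makes no assumption about how Ka/Ks values are distributed; a rank-based test would be an equally reasonable choice.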

I would use codeML to estimate one Ka/Ks value for each locus, but be aware that to do that you'd need (a) a tree relating your species to each other and (b) to write control files for each analysis. Biopython has a module that deals with codeML, which could potentially make that step easier.
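
As a rough illustration of point (b), control files can be generated from a template rather than written by hand. The sketch below uses plain Python instead of the Biopython module, just to stay dependency-free. The option names are standard codeml control-file settings, but the values are only one plausible setup for a single Ka/Ks per gene, and the file names (`<gene>.phy`, `<gene>.ctl`, `<gene>.out`) are hypothetical.

```python
from pathlib import Path

# One plausible set of options for a single omega (Ka/Ks) per gene.
# These values are an assumption for this sketch, not a recommendation.
ONE_RATIO_OPTIONS = {
    "noisy": 0,
    "verbose": 0,
    "runmode": 0,     # use the user-supplied tree
    "seqtype": 1,     # codon sequences
    "CodonFreq": 2,   # F3x4 codon frequencies
    "model": 0,       # one omega for all branches
    "NSsites": 0,     # one omega for all sites
    "fix_omega": 0,   # estimate omega rather than fixing it
    "omega": 0.4,     # starting value for the estimate
    "cleandata": 1,   # drop columns with gaps or ambiguity codes
}


def control_file_text(seqfile, treefile, outfile, options=None):
    """Return the text of a codeml control file for one alignment."""
    settings = {"seqfile": seqfile, "treefile": treefile, "outfile": outfile}
    settings.update(ONE_RATIO_OPTIONS if options is None else options)
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def write_control_files(gene_ids, tree_path, work_dir):
    """Write one control file per gene and return the paths.

    Assumes each alignment already exists as <work_dir>/<gene_id>.phy
    (a hypothetical naming scheme).
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for gene_id in gene_ids:
        ctl_path = work_dir / f"{gene_id}.ctl"
        ctl_path.write_text(control_file_text(
            seqfile=f"{gene_id}.phy",
            treefile=str(tree_path),
            outfile=f"{gene_id}.out",
        ))
        written.append(ctl_path)
    return written
```

Each control file would then be passed to codeml in turn; how the runs are launched and the output parsed is left out here.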

(You should also note that Ka/Ks is not really a rate of evolution, but a measure of the degree to which variants in a sequence are evolving under positive or negative selection.)
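
For reference, the usual way the ratio is written and read, with K_a the number of nonsynonymous substitutions per nonsynonymous site and K_s the number of synonymous substitutions per synonymous site:

```latex
\omega = \frac{K_a}{K_s},
\qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive selection}
\end{cases}
```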

edit: If you do put together a pipeline like this, be sure to check at least a sample of the alignments by eye. Misalignments are a major source of error in automated screens of this sort.
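
A small sketch of how one might choose which alignments to look at: flag the obviously suspicious ones, then add a random sample of the rest. This does not replace checking by eye; it only builds the list. The thresholds and the multiple-of-three check (which assumes codon alignments) are assumptions of this sketch.

```python
import random


def alignment_flags(aligned, max_gap_fraction=0.5):
    """Return a list of warnings for one alignment.

    aligned maps sequence name -> aligned sequence with '-' for gaps.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        return ["sequences differ in length"]
    length = lengths.pop()
    flags = []
    # Codon alignments should be a whole number of codons long.
    if length % 3 != 0:
        flags.append("length not a multiple of 3")
    for name, seq in aligned.items():
        if length and seq.count("-") / length > max_gap_fraction:
            flags.append(f"{name}: mostly gaps")
    return flags


def pick_for_review(gene_ids, flagged, n_random=20, seed=0):
    """All flagged genes plus a random sample of the unflagged ones."""
    rng = random.Random(seed)
    unflagged = sorted(set(gene_ids) - set(flagged))
    sample = rng.sample(unflagged, min(n_random, len(unflagged)))
    return sorted(flagged) + sample
```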

modified 6.7 years ago • written 6.7 years ago by David W4.7k

Yeah, that's kind of what I am trying to avoid doing :). This is a small part of my project and I don't want to spend that much time on it. I don't have the orthologous groups (though I do know how to get them). I was hoping for a magic bullet database that would map each of my cands to an orthologous group and return a measure of the rate of evolutionary change for each of those groups. Can't I get this from eggNOG or InParanoid or BioMart or MetaPhOrs or somewhere?

written 6.7 years ago by terdon410
6.7 years ago by
AGS230
Brooklyn, ny
AGS230 wrote:

Why not fit them to a global vs. local clock? Easy to do in PAML.

written 6.7 years ago by AGS230

Could you expand on that please?

written 6.7 years ago by terdon410

Run PAML twice: once with your genes set to a local clock, and once with all the data set to a global clock. You could then pull out the likelihoods from each run for each gene, do a chi^2 test on them (i.e. a likelihood ratio test), and apply a multiple test correction. To find out whether your gene is evolving faster or slower, you need to find the scale factor that is in the PAML output files.
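
A sketch of the statistics once the log-likelihoods have been extracted, using only the standard library. It assumes the global clock is the null model and the local clock the alternative, that the degrees of freedom (the difference in the number of free parameters between the two runs) are supplied by you, and that Benjamini-Hochberg is the multiple test correction you want. Parsing the PAML output is not shown.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution, integer df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from the closed forms for df = 1 or df = 2, then step up using
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood ratio test: statistic is 2 * (lnL_alt - lnL_null)."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return chi2_sf(statistic, df)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, with one `(lnl_global, lnl_local)` pair per gene you would call `lrt_pvalue` on each pair and pass the resulting list to `benjamini_hochberg`.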

written 6.7 years ago by AGS230