How Can I Compare Rates Of Evolution For Two Sets Of Genes?
3
3
8.9 years ago
terdon ▴ 410

I have a list of candidate genes as the result of my analysis. I am now trying to find various characteristics that they have in common. One of the things I would like to check is if my candidate genes are evolving faster or slower than the rest of the genes in my dataset.

Now, I know how to do this manually by building multi-species alignments for each of my gene products and calculating Ka/Ks ratios for each set of alignments. This, however, is not a trivial process, and I really do not want to do it manually for my ~1500 genes.

Can anyone suggest a tool that will take two lists of genes (or proteins) and return an estimate of evolutionary rate for each gene?

evolution • 9.4k views
6
8.9 years ago
Josh Herr 5.7k

In the past I've used Tajima's D as a measure of sequence evolution. There are numerous methods, but (IMHO) this measure seems to be the best accepted as a way to measure if two (or more) sets of genes are evolving over time.

It's not difficult to compute it on your own, but there are a few scripts out there to do it for you. Check out the MANVa software package, or two helpful scripts: DENSERM_P in Perl and analyzer HKA version 6 in C.

This paper helped me measure evolutionary rates using SNP data.

2

Note: Tajima's D is a population genetic statistic - it measures whether a locus is under some non-random force (including population expansion/contraction) by comparing allele frequencies in a (hopefully random) population sample of sequences. From the O.P. it sounds like terdon wants to compare evolutionary rates among species, which is not what D will do.

0

+1 on both answers and thanks for the correction. I have in fact used Tajima's D for sequence evolution at the level of populations to look for selection against alleles in putative clonal plants, so that would make sense. I appreciate the clarification.

You have a great blog too. Going to check out your MMOD R library now...

0

Yup, I don't have population data. What I have is a list of candidate proteins (easily mapped back to genes) and a list of non-candidate proteins. Based on my analysis, I expect my candidates to be evolving faster than my non-candidates. I was hoping to use something like eggNOG or EGO to map each of my candidates to an orthologous group and obtain a measure of the rate of evolutionary change for each of those groups.

6
8.9 years ago
David W 4.8k

Hi Terdon,

I would choose my favorite scripting language and put together a pipeline. Presuming you have your orthologous sets of sequences sorted already, the process is quite straightforward:

  • Write each sequence-set to a separate file
  • Align sequences with [MUSCLE/T-Coffee/PRANK]
  • Calculate Ka/Ks for each alignment
  • Parse the result and pump it into a CSV file so you can compare rates in candidate vs. "background"
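For the last step, one possible way to compare the two groups is a permutation test on the difference in median Ka/Ks. This is only a sketch: the function name `permutation_test_median` and the idea of holding per-gene values in two plain lists are assumptions made for illustration, not something from the pipeline above, and a rank-based test such as Mann-Whitney would be an equally reasonable choice.

```python
import random
import statistics


def permutation_test_median(candidate, background, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in medians.

    candidate, background: lists of per-gene Ka/Ks values (hypothetical inputs).
    Returns (observed median difference, permutation p-value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.median(candidate) - statistics.median(background)

    pooled = list(candidate) + list(background)
    k = len(candidate)
    extreme = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled values
        rng.shuffle(pooled)
        diff = statistics.median(pooled[:k]) - statistics.median(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A positive observed difference with a small p-value would suggest the candidates have higher Ka/Ks than the background, subject to the usual caveats about alignment quality.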

I would use codeml (from PAML) to estimate one Ka/Ks value for each locus, but be aware that to do that you'd need (a) a tree relating your species to each other and (b) to write control files for each analysis. Biopython has a module that deals with codeml, which could potentially make that step easier.
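Writing the control files can itself be scripted. The sketch below generates one codeml control file per gene using the one-ratio model (`model = 0`, `NSsites = 0`), which estimates a single Ka/Ks (omega) per alignment. The directory layout, the `.phy` and `.codeml.out` file names, and the helper `write_codeml_ctl` are assumptions for illustration; the option values are just one plausible starting point and should be checked against the PAML manual for your data.

```python
from pathlib import Path

# Control-file template for a one-ratio (M0) codon model.
# seqtype = 1 means codon sequences; icode = 0 is the universal genetic code.
CTL_TEMPLATE = """\
      seqfile = {seqfile}
     treefile = {treefile}
      outfile = {outfile}
        noisy = 0
      verbose = 0
      runmode = 0
      seqtype = 1
    CodonFreq = 2
        model = 0
      NSsites = 0
        icode = 0
    fix_kappa = 0
        kappa = 2
    fix_omega = 0
        omega = 0.4
"""


def write_codeml_ctl(gene_id, align_dir, tree_file, out_dir):
    """Write <out_dir>/<gene_id>.ctl and return its path.

    Assumes (hypothetically) that the codon alignment for each gene is
    stored as <align_dir>/<gene_id>.phy.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ctl_path = out_dir / f"{gene_id}.ctl"
    ctl_path.write_text(
        CTL_TEMPLATE.format(
            seqfile=Path(align_dir) / f"{gene_id}.phy",
            treefile=tree_file,
            outfile=out_dir / f"{gene_id}.codeml.out",
        )
    )
    return ctl_path
```

Each generated file would then be passed to the `codeml` executable, and the omega estimate parsed from the corresponding output file.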

(You should also note that Ka/Ks is not really a rate of evolution, but a measure of the degree to which variants in a sequence are evolving under positive or negative selection.)

edit: If you do put together a pipeline like this, be sure to check at least a sample of the alignments by eye. Misalignments are a major source of error in automated screens of this sort.

0

Yeah, that's kind of what I am trying to avoid doing :). This is a small part of my project and I don't want to spend that much time on it. I don't have the orthologous groups (though I do know how to get them). I was hoping for a magic bullet database that would map each of my candidates to an orthologous group and return a measure of the rate of evolutionary change for each of those groups. Can't I get this from eggNOG or InParanoid or BioMart or MetaPhOrs or somewhere?

0
8.9 years ago
AGS ▴ 230

Why not fit them to a global vs. local clock? Easy to do in PAML.

0

Could you expand on that please?

0

Run PAML twice: once with your genes set to a local clock, and once with all the data set to a global clock. You could then pull out the log-likelihoods from both runs for each gene, do a chi^2 (likelihood-ratio) test, and apply a multiple-testing correction. To find out whether a gene is evolving faster or slower, you need to find the scale factor in the PAML output files.
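As a rough sketch of the testing step, the code below turns a pair of log-likelihoods into a likelihood-ratio p-value and then applies a Benjamini-Hochberg correction across genes. It assumes the local-clock model has exactly one more free parameter than the global-clock model (one degree of freedom); if your local clock adds more rate parameters, the degrees of freedom and the p-value formula would need to change. The function names are made up for illustration, and the log-likelihoods are assumed to have been parsed from the PAML output already.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood-ratio test p-value, ASSUMING 1 degree of freedom.

    lnl_null: log-likelihood under the global clock (simpler model).
    lnl_alt:  log-likelihood under the local clock (richer model).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 d.f.: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below your chosen threshold reject the global clock; the direction of the rate change still has to be read from the scale factor, as noted above.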
