Question: Running PAML without a tree file?
sunnykevin97 wrote:

Hi

After exploring PAML a little, I understand that calculating dN/dS with codeml requires two input files, both named in the codeml.ctl control file:

1) An MSA in PHYLIP format 2) A tree file.

My question is this: I already know the phylogeny of my genomes, and I'm only interested in calculating the dN/dS rate among them. How do I do it?

Is it possible to write a tree file manually, based on the previously established phylogeny? Or is it possible to run codeml without a tree file?

Suggestions please!

modified 5 months ago by jrj.healey • written 5 months ago by sunnykevin97
jrj.healey wrote:

You need to calculate the tree empirically because things like branch lengths may be important. It would be possible to hand-write a tree with appropriate topology, but without distances it may give you false results.

Furthermore, in an ideal world, the tree you use to describe your data should be derived directly from the accompanying MSA.
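
A minimal sketch of hand-writing a tree with a known topology, in Python. The taxon names (sp1 to sp4) and the nested-tuple representation are placeholders invented for illustration; the names would have to match those in the alignment exactly. The output is a Newick string with no branch lengths, preceded by a header line giving the number of taxa and the number of trees. As far as I understand it, codeml estimates branch lengths itself by maximum likelihood, so a topology-only tree is accepted as input, and an unrooted tree (three groups at the base) is the usual choice when no clock is assumed.

```python
def to_newick(node):
    """Convert a nested tuple of taxon names into a Newick subtree string."""
    if isinstance(node, str):
        return node  # a leaf: just the taxon name
    # an internal node: comma-separated children wrapped in parentheses
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def paml_tree_text(topology, n_taxa):
    """Build tree-file text: a 'n_taxa n_trees' header line, then the tree."""
    return f"{n_taxa} 1\n{to_newick(topology)};\n"


# Hypothetical unrooted four-taxon topology (placeholder names).
topology = (("sp1", "sp2"), "sp3", "sp4")
tree_text = paml_tree_text(topology, n_taxa=4)
print(tree_text)
```

Writing `tree_text` to a file and pointing the treefile entry of codeml.ctl at it would be the next step.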

modified 5 months ago • written 5 months ago by jrj.healey

Thanks. In total I have 18 genome data sets for which I want to estimate dN/dS using PAML. First, I'll align all the genomes using CLUSTAL and generate an MSA file in PHYLIP format (are there any good tools that handle big data sets?). But how do I generate a tree file to run PAML? Are there any tools for that? And is the large number of data sets a problem?

modified 5 months ago • written 5 months ago by sunnykevin97

CLUSTAL absolutely will not be able to handle full genome-scale alignments. There are few tools that really can. LASTZ is one of the few tools I've seen that deals with large sequences but even then, 18 is probably too many, and I don't know how big your genomes are (in my experience its alignments are kinda crappy too).

For dN/dS, the CDSs are the only thing that matters anyway, so I think a better approach would be to retrieve all CDSs for each genome, cluster the orthologs together (e.g. via CD-HIT or similar), generate an alignment and tree for each cluster, and then calculate a dN/dS for each gene (someone with more experience can absolutely correct me).
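
For the per-gene dN/dS step, a rough sketch of generating one codeml control file per gene in Python. The function name and file names are made up for illustration, and the settings are just one plausible starting point (a single dN/dS ratio for the whole gene), not a recommendation. When no tree file is given the sketch switches to `runmode = -2`, which is codeml's pairwise mode; my understanding is that the tree file is not used in that mode, so it may be an option if only pairwise dN/dS values between genomes are wanted.

```python
def codeml_ctl(seqfile, outfile, treefile=None):
    """Return codeml control-file text for a one-ratio dN/dS estimate.

    With treefile=None the text selects pairwise mode (runmode = -2).
    """
    settings = {
        "seqfile": seqfile,    # codon alignment in PHYLIP format
        "outfile": outfile,    # main results file
        "noisy": 0,            # minimal screen output
        "verbose": 0,
        "runmode": 0 if treefile else -2,  # 0: user tree, -2: pairwise
        "seqtype": 1,          # 1 = codon sequences
        "CodonFreq": 2,        # F3x4 codon frequencies
        "model": 0,            # one dN/dS ratio for all branches
        "NSsites": 0,          # one dN/dS ratio for all sites
        "icode": 0,            # universal genetic code
        "fix_kappa": 0,        # estimate kappa
        "kappa": 2,            # initial kappa
        "fix_omega": 0,        # estimate omega (dN/dS)
        "omega": 0.4,          # initial omega
        "cleandata": 0,        # keep sites with gaps/ambiguities
    }
    if treefile:
        settings["treefile"] = treefile
    return "".join(f"{key:>12} = {value}\n" for key, value in settings.items())


# Hypothetical file names for a single gene.
pairwise_ctl = codeml_ctl("gene1.phy", "gene1.out")
tree_ctl = codeml_ctl("gene1.phy", "gene1.out", treefile="gene1.tree")
print(pairwise_ctl)
```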

Once you have that you could work out an average value across the genome, or maybe even plot the dN/dS across the sequence to see which regions are more 'evolutionarily active'.
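
Averaging across the genome could look something like the Python sketch below. The gene names and values are placeholders for illustration only, not real results, and the `max_omega` cutoff is an assumption of mine: very large dN/dS values usually arise when dS is close to zero, so they are dropped before averaging rather than allowed to dominate the mean.

```python
from statistics import mean, median


def summarise_omega(per_gene_omega, max_omega=10.0):
    """Summarise per-gene dN/dS values, skipping any above max_omega."""
    kept = [w for w in per_gene_omega.values() if w <= max_omega]
    return {
        "n_genes": len(kept),
        "mean": mean(kept),
        "median": median(kept),
    }


# Placeholder values, invented for illustration.
example = {"geneA": 0.10, "geneB": 0.30, "geneC": 0.20, "geneD": 50.0}
summary = summarise_omega(example)
print(summary)
```

Reporting the median alongside the mean is a cheap check on how much a few extreme genes are pulling the average.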

It's a few more steps, and will require some heavy-duty parallel processing of all the genes, but it's the only way I can think you'd do it.
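
The parallel part can be fairly simple. A minimal sketch in Python, assuming one directory per gene that already contains its own codeml.ctl, alignment and (if needed) tree, and assuming the `codeml` executable is on the PATH. The directory layout and function names are hypothetical. Threads are enough here because each worker just waits on an external process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_codeml(gene_dir):
    """Run codeml inside one gene directory that already holds a codeml.ctl."""
    # Raises FileNotFoundError if the codeml executable is not on the PATH.
    result = subprocess.run(
        ["codeml", "codeml.ctl"], cwd=gene_dir, capture_output=True, text=True
    )
    return Path(gene_dir).name, result.returncode


def run_all(gene_dirs, runner=run_codeml, workers=4):
    """Apply runner to every gene directory using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each runner call returns a (gene name, exit code) pair.
        return dict(pool.map(runner, gene_dirs))


# Example call (not executed here), for a hypothetical layout genes/<name>/:
# exit_codes = run_all(sorted(str(p) for p in Path("genes").iterdir()))
```

Genes with a non-zero exit code can then be rerun or inspected individually.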

modified 5 months ago • written 5 months ago by jrj.healey

Well, LASTZ is for pairwise comparison, but I'm interested in multiple genome alignment. My genomes are too big.

Suggestions please. Thanks!

modified 5 months ago • written 5 months ago by sunnykevin97

Yeah, there's essentially no such thing as a genome-scale multiple alignment tool. Your approach simply isn't possible. Under other circumstances (estimating distance, for instance) I'd suggest you could get by with multiple pairwise alignments, but for dN/dS that isn't the case.

You could select a subset of genes of interest to base your analysis on, but whatever genes you choose will lead to an under- or overestimation of the evolutionary rate, which is why I suggest doing all of them, or as many as possible.

written 5 months ago by jrj.healey

Thanks for the suggestions. I'll try.

written 5 months ago by sunnykevin97