Running PAML without Tree file?
5.3 years ago
sunnykevin97 ▴ 980

Hi

After reading a little about PAML, I understand that calculating dN/dS with codeml requires two input files, both named in the codeml.ctl control file:

1) an MSA in PHYLIP format 2) a tree file.

My question is this: I already know the phylogeny of my genomes, and I'm only interested in calculating the dN/dS ratio between them. How do I do it?

How do I write a tree file manually? Is it possible to write one by looking at the existing phylogeny? Otherwise, is it possible to run codeml without a tree file?

Suggestions please!

sequence alignment • 2.1k views
5.3 years ago
Joe 21k

You need to calculate the tree empirically because things like branch lengths may be important. It would be possible to hand-write a tree with the appropriate topology, but without distances it may give you misleading results.

Furthermore, in an ideal world, the tree you use to describe your data should be derived directly from the accompanying MSA.
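
To make the hand-written option concrete: a tree file is normally plain Newick text, so a known topology can be typed out or generated. Below is a minimal sketch, assuming four made-up taxon names and a nested-tuple representation of the topology (both are illustrative, not from the thread). It writes topology only, with no branch lengths; as far as I understand, codeml estimates branch lengths from the alignment itself under its default settings, but check that against the PAML manual for your version.

```python
def to_newick(node):
    """Render a nested tuple/str topology as Newick text (no branch lengths)."""
    if isinstance(node, str):
        # A leaf is just its taxon name.
        return node
    # An internal node is a parenthesised, comma-separated list of children.
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Hypothetical topology: (A,B) and (C,D) are sister pairs.
topology = (("speciesA", "speciesB"), ("speciesC", "speciesD"))

# A Newick tree is terminated by a semicolon.
newick = to_newick(topology) + ";"
print(newick)
```

The taxon names in the tree have to match the sequence names in the alignment exactly, otherwise the tree and the MSA cannot be paired up.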


Thanks. In total I have 18 genome datasets for estimating dN/dS with PAML. First, I'll align all the genomes using CLUSTAL and generate an MSA file in PHYLIP format (are there any good tools that handle big datasets?). But how do I generate a tree file to run PAML? Are there any tools for that? And is the large number of datasets a problem?


CLUSTAL absolutely will not be able to handle full genome-scale alignments. There are few tools that really can. LASTZ is one of the few tools I've seen that deals with large sequences but even then, 18 is probably too many, and I don't know how big your genomes are (in my experience its alignments are kinda crappy too).

For dN/dS, the CDSs are the only thing that matters anyway, so I think a better approach would be to retrieve all CDSs for each genome, cluster the orthologs together (e.g. via CD-HIT or similar), generate an alignment and tree for each cluster, and then calculate a dN/dS for each gene (someone with more experience can absolutely correct me).
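
For the last step of that per-gene workflow, each ortholog cluster needs its own codeml control file. Here is a rough sketch of generating one per gene, assuming a hypothetical naming scheme (`<gene>.phy`, `<gene>.nwk`, `<gene>.out`) and a simple one-ratio model. The option names are standard codeml control-file variables, but the values are only a starting point and should be checked against the PAML manual. My understanding is that `runmode = -2` does pairwise comparisons and does not use the tree, which is relevant to the original question.

```python
# Hypothetical template: file names and option values are illustrative only.
# In codeml control files, text after "*" is a comment.
CTL_TEMPLATE = """\
      seqfile = {gene}.phy   * codon alignment for this gene (PHYLIP)
     treefile = {gene}.nwk   * Newick tree for the same sequences
      outfile = {gene}.out
      runmode = {runmode}    * 0: use the tree in treefile; -2: pairwise
      seqtype = 1            * 1: codon sequences
    CodonFreq = 2            * 2: F3X4 codon frequencies
        model = 0            * 0: one omega for all branches
      NSsites = 0            * 0: one omega for all sites
        icode = 0            * 0: universal genetic code
    fix_kappa = 0            * estimate kappa
        kappa = 2            * starting value
    fix_omega = 0            * estimate omega (dN/dS)
        omega = 0.5          * starting value
"""


def make_ctl(gene, runmode=0):
    """Return control-file text for one gene (hypothetical helper)."""
    return CTL_TEMPLATE.format(gene=gene, runmode=runmode)


ctl_text = make_ctl("gene0001")
print(ctl_text)
```

Writing each result to its own file (for example `gene0001.ctl`) and running `codeml gene0001.ctl` once per gene makes the jobs independent, so they are easy to spread across cores or a cluster.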

Once you have that you could work out an average value across the genome, or maybe even plot the dN/dS across the sequence to see which regions are more 'evolutionarily active'.
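
A minimal sketch of that genome-wide summary, assuming the per-gene dN/dS values have already been parsed into a dictionary (the gene names and numbers below are invented). The `max_omega` cutoff is my own assumption: very large values such as 999 tend to appear when dS is close to zero and would otherwise dominate the mean.

```python
from statistics import mean, median


def summarize_omega(per_gene_omega, max_omega=10.0):
    """Summarise per-gene dN/dS, dropping values outside [0, max_omega]."""
    values = [w for w in per_gene_omega.values() if 0 <= w <= max_omega]
    if not values:
        raise ValueError("no usable dN/dS values after filtering")
    return {
        "n_genes": len(values),
        "mean": mean(values),
        "median": median(values),
    }


# Invented example values; gene4 mimics an unreliable estimate.
example = {"gene1": 0.10, "gene2": 0.30, "gene3": 0.20, "gene4": 999.0}
summary = summarize_omega(example)
print(summary)
```

Reporting the median alongside the mean is a cheap way to see whether a handful of outlier genes is driving the average.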

It's a few more steps, and will require some heavy-duty parallel processing of all the genes, but it's the only way I can think of to do it.


Well, LASTZ is for pairwise comparison, but I'm interested in multiple genome alignment. My genomes are too big.

Suggestions please. Thanks!


Yeah, there's essentially no such thing as a genome-scale multiple alignment tool. Your approach simply isn't possible. Under other circumstances (estimating distance, for instance) I'd suggest you could get by with multiple pairwise alignments, but for dN/dS that isn't the case.

You could select a subset of genes of interest to base your analysis on, but whatever genes you choose will lead to an under- or overestimation of the evolutionary rate, which is why I suggest doing all of them, or as many as possible.


Thanks for the suggestions. I'll try.
