Or indeed, any of the Bio* projects: they are all well suited to building simple analysis pipelines that run over many sequences.
Hi Guys,
I'm looking for the best way to (1) compute dN/dS ratios on thousands of sequence pairs and (2) get descriptive statistics on those same sequences (length, GC3, etc.) at the same time.
What software is available for this job?
Cheers
You could use BioPerl to do this. dN/dS can be calculated from a BioPerl script by running PAML. See this page http://www.bioperl.org/wiki/HOWTO:PAML and slide 76 of this presentation http://jason.open-bio.org/Bioperl_Tutorials/ISMB2007/. For counting GC there is a script here http://www.bioperl.org/wiki/Bioperl_scripts#SeqStats and it should be quite possible to modify it to count by codon position.
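If you just want to see what "GC by codon position" amounts to, here is a minimal sketch in plain Python rather than BioPerl. It is not the SeqStats script linked above; the function name seq_stats is made up for illustration. It assumes each sequence is a coding sequence that starts in frame, drops any trailing partial codon for the GC3 count, and counts every position (including ambiguous bases such as N) in the denominators.

```python
def seq_stats(cds):
    """Return length, overall GC fraction and GC3 for one coding sequence.

    Assumption: the sequence starts at codon position 1 (in frame).
    """
    seq = cds.upper()
    # Ignore a trailing partial codon when collecting third positions.
    usable = len(seq) - len(seq) % 3
    third = seq[2:usable:3]
    gc = sum(base in "GC" for base in seq)
    gc3 = sum(base in "GC" for base in third)
    return {
        "length": len(seq),
        "gc": gc / len(seq) if seq else 0.0,
        "gc3": gc3 / len(third) if third else 0.0,
    }


if __name__ == "__main__":
    # Toy sequences, purely illustrative.
    for name, cds in {"seq1": "ATGGCCAAA", "seq2": "atgc"}.items():
        stats = seq_stats(cds)
        print(name, stats["length"], round(stats["gc"], 3), round(stats["gc3"], 3))
```

Looping this over all your sequences gives you the descriptive statistics in one pass; the dN/dS part would still need PAML or a similar tool.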
Have a look at HyPhy (Hypothesis testing using Phylogenies): http://www.datam0nk3y.org/hyphy/doku.php. It has a specialized module for detecting positive and negative selection, which could be relevant to what you want to do with a large set of sequences. It might help you out.
To clarify: you are referring to the ratio of non-synonymous to synonymous substitutions (http://en.wikipedia.org/wiki/Ka/Ks_ratio), correct?