Composite Of Multiple Signals; Where Have You Gone?
12.3 years ago

This composite method for detecting natural selection seems to have made a big splash.

http://www.sciencemag.org/content/327/5967/883.abstract

However, is there an implementation available for download? I have read the paper, visited the lab's website, and searched around the internet, but had no luck.

Anyone know of a newer paper or tool that I should consider using in place of CMS?

fst selection • 4.1k views
12.3 years ago
Neilfws 49k

According to this press release from 2010: "The software tools for CMS analysis are all homegrown, and should soon be available, perhaps wrapped into a program called Sweep for the long haplotype analysis."

I suggest you email the authors and ask how things are coming along :)

Of course, if journals required that software used for analyses be made available, we would not have this kind of problem. It astonishes me that so-called top-tier journals will accept the results of an analysis with no concern regarding the tools used to do it.


Even more disturbing is when, as a reviewer, you ask that the code used be made publicly available as a condition of publication, and the journal's editor doesn't consider this a valid or relevant request...


I figured it wasn't available, just thought I would ask. Seems pretty shady not to even release the code.


Thanks for the update. It was probably released with: "Identifying Recent Adaptations in Large-Scale Genomic Data"?

11.0 years ago

I tried to implement the CMS some time ago, but in the end, due to lack of time, I could not complete it.

The problem with the CMS is that you have to create a set of simulations, using the 90 sets of parameters that can be found in the Supplementary Materials of the paper, plus one scenario for neutral evolution. The simulations have never been made available, but can be generated easily using cosi (you will have to adjust the allele frequency spectrum, and remove the simulations that are too different from the rest of the genome). So, even if you can find the CMS script, you also need the set of simulations; I suppose that because these are large files, the CMS has never been made available online.

Once you have the simulations, you have to calculate Fst, dDAF, iHS, and the other tests on them. From that, you calculate the distribution of values for each test in each of the two sets of simulations (neutral and selection). Then, for a given SNP in the genome, you calculate the same tests (Fst, dDAF, iHS, ...) and, for each one, the probability of observing that value in the two sets of simulations; these are the p-values. The CMS is just the product, over all tests, of the ratios of the p-values calculated in this way. If you have another method to calculate the p-values, you can also consider implementing your own custom CMS, just by multiplying the p-values.
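The calculation described above can be sketched roughly as follows. This is only an illustration of the description in this post, not the published CMS code: the function names, the histogram binning, and the pseudocount are all assumptions, and the paper's exact formula may differ in its details. The simulated scores for each test are binned into a histogram, and the score of a SNP is the product of the selection-to-neutral probability ratios.

```python
import bisect
import math


def bin_index(value, edges):
    """Index of the bin containing `value`; out-of-range values are clamped."""
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def binned_density(values, edges, pseudocount=1.0):
    """Fraction of `values` falling in each bin defined by sorted `edges`.

    The pseudocount (an assumption of this sketch) avoids zero probabilities
    in bins that no simulation happened to reach.
    """
    counts = [pseudocount] * (len(edges) - 1)
    for v in values:
        counts[bin_index(v, edges)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def composite_score(snp_scores, sel_sims, neu_sims, edges):
    """Product over tests of P(score | selection) / P(score | neutral).

    snp_scores: {test name: observed value for one SNP}
    sel_sims, neu_sims: {test name: list of values from the simulations}
    edges: {test name: sorted list of bin edges}
    """
    log_score = 0.0  # sum logs rather than multiply, to avoid underflow
    for test, value in snp_scores.items():
        p_sel = binned_density(sel_sims[test], edges[test])
        p_neu = binned_density(neu_sims[test], edges[test])
        i = bin_index(value, edges[test])
        log_score += math.log(p_sel[i] / p_neu[i])
    return math.exp(log_score)


# Toy example with made-up numbers for a single test.
edges = {"fst": [0.0, 0.5, 1.0]}
sel_sims = {"fst": [0.6, 0.7, 0.8, 0.2]}
neu_sims = {"fst": [0.1, 0.2, 0.3, 0.9]}
score = composite_score({"fst": 0.7}, sel_sims, neu_sims, edges)
```

With real data, each dictionary would hold one entry per test (Fst, dDAF, iHS, ...), and the bin edges would need to be chosen per test.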

One big problem with the CMS is that it assumes that all the tests have the same ability to detect a selective sweep; for example, that Fst and iHS are equally able to detect selection. I personally don't think this assumption holds, especially considering that these tests detect different types of selection. It would be better to use a method that assigns a weight to each of the tests used for selection; for example, you can have a look at this paper (Lin et al. 2011, Distinguishing Positive Selection From Neutral Evolution: Boosting the Performance of Summary Statistics), where the authors used a technique called boosting.
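To make the idea of weighting concrete, here is a minimal sketch of a weighted composite. It is not the boosting method of Lin et al.; the function name and the weights are hypothetical, and in practice the weights would have to be learned from the simulations. Each test contributes the log of its ratio multiplied by a weight, so setting every weight to 1 gives back the log of the plain product.

```python
import math


def weighted_composite(ratios, weights):
    """Weighted sum of log ratios, one term per test.

    ratios: {test name: selection/neutral ratio for one SNP}
    weights: {test name: weight}; these values are hypothetical here.
    """
    return sum(weights[test] * math.log(r) for test, r in ratios.items())


# Made-up ratios for two tests.
ratios = {"fst": 2.0, "ihs": 4.0}
unweighted = weighted_composite(ratios, {"fst": 1.0, "ihs": 1.0})
downweighted_fst = weighted_composite(ratios, {"fst": 0.5, "ihs": 1.0})
```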


Thanks, this is the most detailed explanation I've found for this approach! How long did it take you? How far did you get?


The step that takes the most time is generating all the simulations. Moreover, you also have to calculate iHS on each simulation, which is not easy, given that the iHS script provided by the Sabeti lab is very slow and sometimes fails with a segmentation fault without any explanation.


As a graduate student, sometimes I just have to do what I am asked... I agree with you, especially since I am mostly working on non-model genomes in which population parameters are not known.

