I have calculated XP-EHH and iHS scores for a set of SNPs using selscan. XP-EHH ranges from -0.75 to 0.9. What do extreme values show? In the original publication they plot log(P-value). I think that the P-values were calculated to show differences between XP-EHH and iHS. Does anyone have experience with selection scores?
XP-EHH is a cross-population test for positive selection. This means that it detects SNPs that are under selection in one population but not in another. For example, a SNP associated with resistance to malaria may be under selection in populations where malaria has been endemic, but evolve neutrally in other populations. The test was developed because it is generally very difficult to identify signals of selection within a single population, and comparing two populations may make it possible to identify weaker signals that become evident only after comparison with a closely related population.
The sign of the XP-EHH score indicates which of the two alleles is under selection, i.e. whether it is the ancestral or the derived allele. For practical reasons, people usually ignore the sign and use the absolute XP-EHH score, because you may not always be sure which allele is ancestral in which population. Moreover, taking the absolute score makes it easier to calculate means over sliding windows.
In the publication they used -log(p-value), probably as a way to simplify the interpretation of the data in the plot. The p-value is usually an empirical one, generated by sorting the scores and taking the rank of each - e.g. see how I answered Zev in this discussion: C: A Database Of Signatures Of Selection In The 1000 Genomes Dataset. The p-value is then converted to -log10(p-value) to facilitate interpretation. For example, a p-value of 0.01 becomes -log10(0.01) = 2, so you can say that all SNPs with a -log10(p-value) higher than 2 are significantly selected in one population.
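The rank-based conversion described above can be sketched roughly as follows. This is only one possible reading of "sorting the scores and taking the rank", not necessarily what the publication did: the function name `empirical_neg_log10_p` is made up for illustration, the ranking is done on the absolute score, and the input is assumed to be a plain list of (ideally normalized) scores.

```python
import math

def empirical_neg_log10_p(scores):
    """Rank-based empirical p-values on |score|, returned as -log10(p).

    Assumption: a SNP's p-value is its rank among all SNPs (1 = most
    extreme absolute score) divided by the total number of SNPs.
    """
    n = len(scores)
    abs_scores = [abs(s) for s in scores]
    # Indices ordered from the most extreme to the least extreme score;
    # ties keep their input order and receive consecutive ranks.
    order = sorted(range(n), key=lambda i: abs_scores[i], reverse=True)
    neg_log_p = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        p = rank / n
        neg_log_p[idx] = -math.log10(p)
    return neg_log_p

# Made-up scores, for illustration only
example_scores = [0.10, -0.75, 0.90, 0.20, -0.05]
print(empirical_neg_log10_p(example_scores))
```

With this definition, a threshold of 2 simply selects the top 1% of SNPs by absolute score, so it marks outliers of the genome-wide distribution and is not a formal significance test.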
Thanks. I will try to calculate p-values this way.
Isn't the sign of the XP-EHH score indicative of whether the SNP is under selection in your tested or your reference population? I thought (depending on how you computed XP-EHH) you would only get either positive or negative values.
From the supp material:
Yes, it is directional, but in many cases you don't really know which allele is ancestral. In that case it is safer to take the absolute score and determine whether there is selection between the two populations, without knowing which allele is selected. Moreover, people tend to calculate the average XP-EHH score for a region, averaging or weighting the scores of multiple SNPs. In that case, if you don't use the absolute score, the average for the region may be close to 0, because scores with different signs will cancel each other out.
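A small sketch of the cancellation effect mentioned above, using made-up scores and non-overlapping windows defined by SNP count (real analyses often use windows in bp or cM, and may weight SNPs). The helper `window_means` is hypothetical and not part of selscan.

```python
def window_means(scores, size):
    """Return (signed mean, absolute mean) for consecutive windows of `size` SNPs."""
    out = []
    for start in range(0, len(scores), size):
        chunk = scores[start:start + size]
        signed = sum(chunk) / len(chunk)
        absolute = sum(abs(s) for s in chunk) / len(chunk)
        out.append((signed, absolute))
    return out

# Made-up region: strong scores of both signs within one window
region = [0.8, -0.8, 0.5, -0.5]
for signed, absolute in window_means(region, 4):
    # The signed mean collapses towards 0, the absolute mean does not
    print(f"signed mean = {signed:.3f}, absolute mean = {absolute:.3f}")
```

The signed mean hides the region even though every SNP in it has a large score, which is the reason given above for averaging absolute values.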
This is what I understand as well. I think it is the sign (+ve or -ve) of iHS that tells you whether the ancestral or the derived allele is selected.
So, regarding Giovanni's answer, is his comment that a -log(p-value) higher than 2 is significant correct?