P-Value calculation from iHS and XP-EHH scores
4.2 years ago, by SOHAIL

Hi Everyone,

I am new to this, so forgive me if anything I ask is naive. I calculated genome-wide standardized iHS and XP-EHH scores with the 'Selscan' software. Both scores take negative and positive values. I have read in some papers that |iHS| > 2 is used as the cut-off for significant regions, but I have not found any cut-off for XP-EHH.

For the XP-EHH values, I calculated p-values with 'pnorm' in R, roughly like this:

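# Column 4 is assumed to hold the standardized XP-EHH score in this reformatted file
data <- read.table("CHR.xpehh-norm.reformat1.txt", header = FALSE, sep = " ")
z <- data[[4]]
# One-tailed p-value in the direction of each score's sign:
# upper tail for z > 0, lower tail for z <= 0 (same result as the original if/else loop)
p <- pnorm(-abs(z))
write.table(p, file = "xpehh.p.chr.txt", row.names = FALSE, col.names = "p", quote = FALSE)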


I later read about p-value/Z-score calculation in a Biostars comment on the post "A Database Of Signatures Of Selection In The 1000 Genomes Dataset".

Questions:

1. Can anyone guide me on how to correctly calculate p-values for XP-EHH and iHS scores? Is the approach above correct?

2. For iHS we consider only absolute values. How do I calculate p-values for those?

3. Is there a generally used cut-off on the standardized scores of these two tests (like |iHS| > 2) for identifying selection outliers?

Tags: selection, selscan, ngs, statistics, R
4.2 years ago

The paper that introduced iHS (Voight et al., 2006) recommended that threshold, but a threshold alone still won't give you a p-value. If you check the paper, you will see that they computed empirical p-values using an outlier approach: sort all the scores genome-wide from most to least extreme, then divide each score's rank by the total number of values in the distribution.
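A minimal R sketch of that rank-based outlier approach, assuming all standardized scores for the genome are already in one numeric vector. The function name `empirical_p` is hypothetical, and giving tied scores the larger (more conservative) p-value is a choice made here, not something stated in the thread.

```r
# Empirical p-values (outlier approach): rank all genome-wide scores so the
# largest gets rank 1, then divide by the number of non-missing scores.
# This equals the fraction of scores that are >= each score.
empirical_p <- function(scores) {
  # ties.method = "max" gives tied scores the larger (more conservative) rank;
  # na.last = "keep" leaves missing scores as NA
  r <- rank(-scores, ties.method = "max", na.last = "keep")
  r / sum(!is.na(scores))
}

# Toy scores for illustration only (not real data)
scores <- c(0.5, 2.7, -1.2, 3.1, 1.0)
empirical_p(scores)
# [1] 0.8 0.4 1.0 0.2 0.6
```

The smallest possible value is 1/n, so with a few million SNPs the empirical p-values are bounded by the number of scores in your own data set.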

For iHS, use the absolute values of the standardized scores. For XP-EHH, which is a directional test, you should use only the positive values.
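A sketch of how the two statistics could be fed into the same rank-based p-value in R, assuming standardized scores in numeric vectors. The helper names (`empirical_p`, `ihs_p`, `xpehh_p`) are hypothetical; non-positive XP-EHH scores are set to NA here rather than ranked, which is one way to follow the "positive values only" advice.

```r
# Hypothetical helper: empirical p-value = fraction of scores >= each score
empirical_p <- function(scores) {
  rank(-scores, ties.method = "max", na.last = "keep") / sum(!is.na(scores))
}

# iHS: extreme values of either sign are of interest, so rank |iHS|
ihs_p <- function(ihs) empirical_p(abs(ihs))

# XP-EHH: directional, so rank only the positive scores;
# zero, negative and missing scores get NA
xpehh_p <- function(xpehh) {
  p <- rep(NA_real_, length(xpehh))
  pos <- !is.na(xpehh) & xpehh > 0
  p[pos] <- empirical_p(xpehh[pos])
  p
}

# Toy scores for illustration only
ihs   <- c(-2.5, 0.3, 2.1, -0.8)
xpehh <- c(1.5, -2.0, 3.0, 0.2)
ihs_p(ihs)      # 0.25, 1, 0.5, 0.75
xpehh_p(xpehh)  # 2/3, NA, 1/3, 1
```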

Another option is to compute approximate p-values by simulating the distribution of your selection statistic under a neutral demographic model. For this you need a fairly good understanding of your population's history, so that you can reproduce its demographic history accurately.
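A minimal R sketch of the simulation route, assuming you have already computed the same statistic, on the same scale (for example |iHS| or positive XP-EHH, standardized the same way), on data simulated under your neutral model. The function `simulated_p` is hypothetical, and the add-one formula (1 + number of null scores >= observed) / (1 + number of null scores) is a common Monte Carlo convention chosen here, not something stated in the thread. The `rnorm` draws are only a stand-in for real simulation output.

```r
# p-value of each observed score against a neutral null distribution
simulated_p <- function(observed, null_scores) {
  null_scores <- sort(null_scores[!is.na(null_scores)])
  n <- length(null_scores)
  # findInterval(..., left.open = TRUE) counts null scores strictly below
  # each observed value, so n minus that count is the number >= observed
  n_ge <- n - findInterval(observed, null_scores, left.open = TRUE)
  # add-one correction keeps p above zero for scores beyond all simulations
  (1 + n_ge) / (1 + n)
}

# Stand-in null distribution (NOT a real neutral simulation)
set.seed(1)
null_scores <- abs(rnorm(10000))
simulated_p(c(0.5, 2.0, 4.5), null_scores)
```

Sorting once and using `findInterval` keeps this fast when both the observed and simulated vectors have millions of entries.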

Hope this helps!


Hi @JM88,

"For iHS you can use the absolute standardised iHS scores. For XP-EHH -because this test is directional-, you should only use positive values." "Another option which is to compute approximate P-values by simulating the distribution of your selection statistics under a neutral demographic model"

Questions:
1. Can you explain why we should use only positive values for XP-EHH?
2. Can you suggest any established software or methods with which I can accurately reproduce the demographic history and integrate the results with a genome-wide selection scan?

Thanks!


Check the paper in which Sabeti et al. introduced the XP-EHH statistic; if I remember correctly, it explains in detail how XP-EHH is computed. Basically, it is the log of the ratio of the iHH of population A to the iHH of population B, and is therefore directional. Also, if I remember correctly, the selscan manual summarizes all the statistics included in the software, including XP-EHH.
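The directionality can be seen in a short R sketch. In Sabeti et al. (2007) the unstandardized XP-EHH at a site is ln(iHH_A / iHH_B), which is then standardized genome-wide. The helper names and iHH values below are hypothetical, and the plain mean/SD standardization is a simplification; check the selscan manual for exactly how its normalization step works and which input it treats as population A.

```r
# Unstandardized XP-EHH: natural log of iHH in population A over iHH in B
xpehh_raw <- function(ihh_a, ihh_b) log(ihh_a / ihh_b)

# Simplified genome-wide standardization to mean 0 and SD 1
standardize <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Toy iHH values for four sites (illustration only)
ihh_a <- c(0.10, 0.40, 0.05, 0.20)
ihh_b <- c(0.10, 0.05, 0.20, 0.20)

xp_ab <- standardize(xpehh_raw(ihh_a, ihh_b))
xp_ba <- standardize(xpehh_raw(ihh_b, ihh_a))

# Swapping the populations flips the sign of every score
all.equal(xp_ab, -xp_ba)
```

Large positive scores point to longer haplotypes, and possibly selection, in population A; large negative scores point to population B. Keeping only positive values therefore restricts the scan to selection in population A.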

The simulator ms (Hudson, 2002) might be a good place to start. Also check papers that used this simulation approach instead of the outlier approach; I think the nSL paper used simulations.