P-Value calculation from iHS and XP-EHH scores
4.2 years ago
SOHAIL ▴ 330

Hi Everyone,

I am new to this, so forgive me if I am asking anything stupid! I calculated genome-wide standardized iHS and XP-EHH scores with the 'selscan' software. Both take negative and positive values. I have read in some papers that people use |iHS| > 2 as a cutoff for significant regions, but I have not come across a cutoff for XP-EHH.

For the XP-EHH values, I calculated p-values using 'pnorm' in R, something like:

# Column 4 holds the normalized XP-EHH score
data <- read.table("CHR.xpehh-norm.reformat1.txt", header = FALSE, sep = " ")
z <- data[, 4]
# One-sided tail probability, taken in the direction of each score's sign
p <- ifelse(z > 0,
            pnorm(z, lower.tail = FALSE),
            pnorm(z, lower.tail = TRUE))
write.table(p, file = "xpehh.p.chr.txt")

I later read about p-value/Z-score calculation in a comment on the post "A Database Of Signatures Of Selection In The 1000 Genomes Dataset".

Problems:

  1. Can anyone please guide me on how to correctly calculate p-values for XP-EHH and iHS scores? (Am I doing it right in the scenario above?)

  2. In the iHS case we only consider absolute values. How do I calculate p-values for those?

  3. Is there any general cutoff on the standardized scores (like |iHS| > 2) for these two tests, so that selective outliers can be identified?

Thanks a lot in advance!

selection selscan ngs statistics R • 3.2k views
4.2 years ago

In the paper introducing iHS, the authors recommended using that threshold, but that still won't give you a P-value. If you check the paper you will see that they decided to compute empirical P-values using an outlier approach. To do that, you simply rank all the scores genome-wide from most to least extreme and then divide each rank by the total number of values in your distribution.

For iHS you can use the absolute standardised iHS scores. For XP-EHH, because this test is directional, you should only use positive values.
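A minimal R sketch of the rank-based idea, in case it helps. The function name `empirical_p` and the toy scores are made up for illustration, and ranking the positive XP-EHH scores among themselves is just one reading of the advice above; ties are given the largest rank to stay conservative.

```r
# Empirical (outlier-based) P-value: rank of each score divided by the
# number of scores. Rank 1 is the most extreme (largest) score; tied
# scores all receive the largest of their ranks.
empirical_p <- function(scores) {
  rank(-scores, ties.method = "max") / length(scores)
}

# Toy standardized scores, for illustration only (not real data)
ihs   <- c(-2.5, 0.3, 1.1, -0.4, 3.0)
xpehh <- c(-1.2, 0.5, 2.8, -0.1, 1.9)

# iHS: rank the absolute values
p_ihs <- empirical_p(abs(ihs))

# XP-EHH: keep the positive scores only, then rank them
xp_pos  <- xpehh[xpehh > 0]
p_xpehh <- empirical_p(xp_pos)
```

With real data, `ihs` and `xpehh` would be the normalized score columns read from the selscan output for all chromosomes together, so that the ranking is genome-wide.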

Then you have another option, which is to compute approximate P-values by simulating the distribution of your selection statistics under a neutral demographic model. In this case you should have a fairly good understanding of the history of your population, to be able to accurately reproduce its demography.
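As a rough sketch of the simulation route in R: once you have scores computed on neutrally simulated data, the P-value of an observed score can be approximated by the fraction of neutral scores that are at least as extreme. The function name `simulation_p` and all numbers here are placeholders; the neutral vector stands in for scores you would obtain by running the same pipeline on simulated haplotypes. The +1 correction is a common convention so that no P-value is exactly zero, not something taken from a specific paper.

```r
# P-value of each observed score against a neutral null distribution:
# (number of neutral scores >= observed + 1) / (number of neutral scores + 1)
simulation_p <- function(observed, neutral) {
  n_ge <- vapply(observed, function(x) sum(neutral >= x), integer(1))
  (n_ge + 1) / (length(neutral) + 1)
}

# Placeholder null distribution: in practice these would be |iHS| values
# computed on data simulated under a neutral demographic model
neutral_abs_ihs  <- c(0.1, 0.4, 0.7, 0.9, 1.2, 1.5, 1.8, 2.1, 2.6)

# Placeholder observed |iHS| values
observed_abs_ihs <- c(0.5, 2.0, 3.0)

p_sim <- simulation_p(observed_abs_ihs, neutral_abs_ihs)
```

The resolution of these P-values is limited by the number of simulated scores, so a large number of neutral replicates is needed to resolve small values.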

Hope this helps!


Hi @JM88,

Thanks for your response. However, in your answer you mentioned:

"For iHS you can use the absolute standardised iHS scores. For XP-EHH, because this test is directional, you should only use positive values." "Another option [...] is to compute approximate P-values by simulating the distribution of your selection statistics under a neutral demographic model"

Questions:

  1. Can you please explain why we need to use only positive values for XP-EHH?

  2. Can you please suggest any previously accepted software/methods with which I can accurately reproduce the demographic history and integrate the results with a selection model genome-wide?

Thanks!


Check the paper where Sabeti et al. introduced the XP-EHH statistic. If I remember correctly, you will find a detailed explanation of how XP-EHH is computed there. Basically it is the (log) ratio of the iHH of popA to the iHH of popB, and is therefore directional. Also, if I remember correctly (again), the selscan manual has a summary of all the statistics included in the software, including XP-EHH.
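To make the directionality concrete, here is a tiny R illustration. It assumes the unstandardized statistic is the natural log of iHH in popA over iHH in popB; the iHH values and the function name `xpehh_unstd` are hypothetical, and the genome-wide normalization step is left out.

```r
# Unstandardized XP-EHH as the log ratio of integrated haplotype
# homozygosity (iHH) in two populations
xpehh_unstd <- function(ihh_a, ihh_b) log(ihh_a / ihh_b)

# Hypothetical iHH values at a single site
ihh_pop_a <- 0.08
ihh_pop_b <- 0.02

ab <- xpehh_unstd(ihh_pop_a, ihh_pop_b)  # positive: longer haplotypes in popA
ba <- xpehh_unstd(ihh_pop_b, ihh_pop_a)  # same magnitude, opposite sign
```

Swapping the two populations only flips the sign, so positive scores point to extended haplotypes in the population placed in the numerator and negative scores to the other one.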

Probably the simulator ms (Hudson, 2002) would be a good way to start? Also check papers where they used this method instead of the outlier approach. I think the nSL paper used a simulation approach.
