Hi Everyone,
I am new to this, so forgive me if I ask anything stupid! I calculated genome-wide standardized iHS and XP-EHH scores with the 'Selscan' software. Both take negative and positive values. I have read in some papers that people use |iHS| > 2 as the cutoff for significant regions, but I have not found any cutoff for XP-EHH.
For the XP-EHH values, I calculated p-values using 'pnorm' in R, something like:
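# Read the normalised XP-EHH output; column 4 holds the standardized score
data <- read.table("CHR.xpehh-norm.reformat1.txt", header = FALSE, sep = " ")
# One-tailed p-value in the direction of each score's sign:
# upper tail for positive scores, lower tail otherwise
p <- ifelse(data[, 4] > 0,
            pnorm(data[, 4], lower.tail = FALSE),
            pnorm(data[, 4], lower.tail = TRUE))
write.table(p, file = "xpehh.p.chr.txt")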
I later read about p-value/Z-score calculation at C: A Database Of Signatures Of Selection In The 1000 Genomes Dataset
Problems:
Can anyone please guide me on how to correctly calculate p-values for XP-EHH and iHS scores (am I doing it right in the scenario above)?
In the iHS case we consider only absolute values, so how should p-values be calculated for that?
Is there any general cutoff on the standardized scores (like |iHS| > 2) for these two tests, so that selective outliers can be identified?
Thanks a lot in advance!
Hi @JM88,
Thanks for your response. However, here in the comments you mentioned:
"For iHS you can use the absolute standardised iHS scores. For XP-EHH -because this test is directional-, you should only use positive values." "Another option which is to compute approximate P-values by simulating the distribution of your selection statistics under a neutral demographic model"
Questions:
1. Can you please explain why we need to use only positive values for XP-EHH?
2. Can you please suggest any previously accepted software/methods with which I can accurately reproduce the demographic history and integrate the results with a selection model genome-wide?
Thanks!
Check the paper where Sabeti et al. introduced the XP-EHH statistic. If I remember correctly, you will find a detailed explanation of how XP-EHH is computed. Basically it is the log of the ratio of the iHH of popA to the iHH of popB, and is therefore directional: positive scores point to selection in popA, negative scores to selection in popB. Also, if I remember correctly (again), the selscan manual has a summary of all the statistics included in the software, including XP-EHH.
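To make the difference between the two tests concrete, here is a minimal sketch in R of how the directionality could translate into p-values. It assumes the standardized scores are roughly standard normal under neutrality, which is an assumption and not something selscan guarantees; the function names and the toy scores are made up for illustration.

```r
# One-sided p-value for XP-EHH: only large positive scores count as
# evidence of selection in popA (the numerator population).
xpehh_pvalue <- function(z) pnorm(z, lower.tail = FALSE)

# Two-sided p-value for iHS: extreme scores of either sign count,
# which is what working with |iHS| amounts to.
ihs_pvalue <- function(z) 2 * pnorm(abs(z), lower.tail = FALSE)

scores <- c(-2.5, 0, 2.5)  # toy standardized scores, not real data
xpehh_pvalue(scores)
ihs_pvalue(scores)
```

Under that normality assumption, |iHS| > 2 corresponds to a two-sided p-value of about 0.046, and XP-EHH > 2 to a one-sided p-value of about 0.023.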
Probably the simulator ms (Hudson, 2002) would be a good way to start? Also check papers where they used this method instead of the outlier approach. I think the nSL paper used a simulation approach.
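Once you have scores from neutral simulations (for example ms output converted to selscan's input format and scored the same way as the real data), an empirical p-value is just the fraction of simulated scores at least as extreme as the observed one. A rough sketch in R: `neutral` is filled with random normal draws purely as a placeholder for real simulated scores, and `empirical_p` is a hypothetical helper name.

```r
set.seed(1)
neutral  <- rnorm(1e5)        # placeholder for scores from neutral simulations
observed <- c(0.3, 2.1, 3.4)  # toy observed XP-EHH scores, not real data

# Fraction of neutral scores >= each observed score (upper tail, for XP-EHH).
# The +1 in numerator and denominator keeps p-values away from exactly 0.
empirical_p <- function(obs, null) {
  sapply(obs, function(x) (sum(null >= x) + 1) / (length(null) + 1))
}

p <- empirical_p(observed, neutral)
p
```

For iHS the same helper could be applied to absolute values, i.e. `empirical_p(abs(observed), abs(neutral))`, since extreme scores of either sign are of interest there.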