4.4 years ago by
USA/Philadelphia/iGEM
Hi eyb
selscan itself ships with a built-in standardizing binary you can use ("norm"). However, in the past I have done it manually in the following way:
1) read in all derived (or ancestral; it doesn't matter as long as you are consistent across all sites) allele frequencies and iHS scores into a table you can manipulate (pandas in Python works well)
2) bin your sites by allele frequency in 1% steps, or wider steps depending on your total number of sites. For fewer sites, use larger bins; if you have iHS scores for entire chromosomes or a large number of sites (e.g., 100,000+), use 1%
3) use the zscore method you have been using, but on each bin separately. Alternatively, calculate the mean and std. dev. of the iHS scores within a given AF bin manually, and get the z-score for each site in that bin as (iHS - bin mean) / bin std. dev. I am not familiar with the zscore method in scipy, but I imagine it does exactly this.
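As a rough sketch of steps 1 to 3 in pandas: the column names `freq` and `ihs`, the function name, and the output column `std_ihs` are placeholders I made up for illustration, not anything selscan produces under those names, so adapt them to however you load your table. This is just one way to do it, not what "norm" does internally.

```python
import numpy as np
import pandas as pd

def standardize_ihs_by_af_bin(df, freq_col="freq", ihs_col="ihs", bin_width=0.01):
    """Return a copy of df with iHS z-scored within allele-frequency bins.

    freq_col / ihs_col are hypothetical column names; change them to match your table.
    """
    out = df.copy()

    # Step 2: assign each site to an allele-frequency bin (1% wide by default).
    n_bins = int(round(1.0 / bin_width))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    out["af_bin"] = pd.cut(out[freq_col], bins=edges, include_lowest=True, labels=False)

    # Step 3: z-score within each bin using that bin's own mean and std. dev.
    # Note: pandas uses the sample std. dev. (ddof=1) here, whereas
    # scipy.stats.zscore defaults to ddof=0; with many sites per bin the
    # difference is negligible. Bins holding a single site come out as NaN.
    grouped = out.groupby("af_bin")[ihs_col]
    out["std_ihs"] = (out[ihs_col] - grouped.transform("mean")) / grouped.transform("std")
    return out
```

You would call it as `standardize_ihs_by_af_bin(table)` for 1% bins, or pass something like `bin_width=0.05` if you only have a few thousand sites and the 1% bins end up too sparse.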
Voight et al. mentioned in the iHS paper that the score should be standardized among sites that have a similar AF, which is why you bin the sites by AF before calculating a z-score. What you are doing currently is standardizing across the whole spectrum of AF at once, so scores from sites at very different frequencies get compared against the same mean and std. dev.