Hi everyone,
I’m working with a dataset of samples from three populations, sequenced with targeted amplicon panels covering 48 genes. iHS and XP-EHH are usually computed on whole-genome data, but I’m exploring whether these haplotype-based selection scans are still meaningful on targeted sequencing data.
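Because each amplicon is short and the targeted genes are far apart, it can help to check how much contiguous sequence each region actually spans before running a haplotype scan. A minimal pure-Python sketch, assuming that SNPs separated by more than a hypothetical `MAX_GAP` of 10 kb belong to different regions (tune this to the panel design). The function names and positions are illustrative, not taken from this dataset. In scikit-allel, large inter-region gaps also interact with the `max_gap` and `gap_scale` arguments of `allel.xpehh` and `allel.ihs`.

```python
"""Summarise the contiguous span of each targeted region.

Assumption (hypothetical): SNPs separated by more than MAX_GAP bp belong to
different amplicon/gene regions. Positions below are illustrative only.
"""

MAX_GAP = 10_000  # hypothetical threshold in bp; adjust to the panel design


def split_into_regions(positions, max_gap=MAX_GAP):
    """Group SNP positions into runs with no internal gap larger than max_gap."""
    positions = sorted(positions)
    if not positions:
        return []
    regions = [[positions[0]]]
    for pos in positions[1:]:
        if pos - regions[-1][-1] > max_gap:
            regions.append([pos])  # gap too large: start a new region
        else:
            regions[-1].append(pos)
    return regions


def summarise(regions):
    """Return (start, end, span_bp, n_snps) for each region."""
    return [(r[0], r[-1], r[-1] - r[0], len(r)) for r in regions]


if __name__ == "__main__":
    # Toy positions on one chromosome, not real data.
    pos = [1_000, 1_250, 1_900, 2_400, 250_000, 250_300, 900_000]
    for start, end, span, n in summarise(split_into_regions(pos)):
        print(f"{start}-{end}: {span} bp, {n} SNPs")
```

Regions that span only a few hundred base pairs and carry a handful of SNPs leave little room for haplotype homozygosity to decay, which is one plausible source of inconclusive scores.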
So far, I’ve phased the VCF with Beagle and used scikit-allel in Python to calculate XP-EHH between two of the populations. Some positions show extreme values (>2 or <−2), but many are inconclusive or missing, likely because SNP density is low and each targeted region is short. I also tried iHS within each population, but the sparse data made this difficult. In addition, I’ve run Fst, Tajima’s D, and PCA to explore population structure and signatures of selection.
My main question is: can iHS and XP-EHH still offer valid insights when applied to datasets that cover only small portions of the genome? I’m aware of the caveats, but I’d like to hear from anyone with experience or recommendations for using these statistics on non-WGS data.
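The missing values are consistent with how these scores are defined. iHS and XP-EHH integrate extended haplotype homozygosity (EHH) outward from each core SNP until it falls below a threshold; if the edge of the data is reached first, no score is reported. The sketch below is a simplified, illustrative re-implementation in pure Python, not scikit-allel’s code. The function names, the toy haplotypes and the trapezoid integration are assumptions. It mimics scikit-allel’s `min_ehh` threshold (default 0.05) with `include_edges=False`, and computes unstandardized XP-EHH as ln(iHH_A / iHH_B) and iHS as ln(iHH_0 / iHH_1).

```python
"""Simplified EHH / iHH / XP-EHH / iHS on a tiny phased region.

Illustrative re-implementation, NOT scikit-allel's code. It mimics the idea
behind min_ehh with include_edges=False: a score is left missing (None) if
EHH never decays below min_ehh before the end of the data.
"""
import math
from collections import Counter

MIN_EHH = 0.05  # same default value as scikit-allel's min_ehh argument


def ehh(haps, lo, hi):
    """Probability that two random haplotypes are identical over SNPs lo..hi."""
    n = len(haps)
    if n < 2:
        return float("nan")
    counts = Counter(tuple(h[lo:hi + 1]) for h in haps)
    same = sum(c * (c - 1) for c in counts.values())
    return same / (n * (n - 1))


def ihh_one_side(haps, pos, core, step, min_ehh=MIN_EHH):
    """Trapezoid-integrate EHH moving away from `core` (step = -1 or +1).

    Returns None if EHH is still >= min_ehh when the region edge is reached.
    """
    area, prev, i = 0.0, ehh(haps, core, core), core
    while True:
        j = i + step
        if j < 0 or j >= len(pos):
            return None  # ran off the edge before EHH decayed
        lo, hi = (core, j) if step > 0 else (j, core)
        cur = ehh(haps, lo, hi)
        area += abs(pos[j] - pos[i]) * (prev + cur) / 2
        if cur < min_ehh:
            return area
        prev, i = cur, j


def ihh(haps, pos, core):
    """Integrated EHH on both sides of the core SNP, or None if truncated."""
    left = ihh_one_side(haps, pos, core, -1)
    right = ihh_one_side(haps, pos, core, +1)
    if left is None or right is None:
        return None
    return left + right


def xpehh(haps_a, haps_b, pos, core):
    """Unstandardized XP-EHH = ln(iHH_A / iHH_B); None if either is missing."""
    ia, ib = ihh(haps_a, pos, core), ihh(haps_b, pos, core)
    if not ia or not ib:
        return None
    return math.log(ia / ib)


def ihs(haps, pos, core):
    """Unstandardized iHS = ln(iHH_allele0 / iHH_allele1) in one population."""
    i0 = ihh([h for h in haps if h[core] == 0], pos, core)
    i1 = ihh([h for h in haps if h[core] == 1], pos, core)
    if not i0 or not i1:
        return None
    return math.log(i0 / i1)


if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500]  # toy positions (bp)
    # Toy 0/1 haplotypes: rows = haplotypes, columns = SNPs (not real data).
    pop_a = [[0, 0, 0, 0, 0], [1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [1, 1, 0, 1, 1]]
    pop_b = [[0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 1, 1, 1, 0]]
    pop_c = [[0, 0, 0, 0, 0] for _ in range(4)]  # identical: EHH never decays
    print("XP-EHH A vs B:", xpehh(pop_a, pop_b, pos, core=2))  # ln(5)
    print("XP-EHH C vs B:", xpehh(pop_c, pop_b, pos, core=2))  # None
    print("iHS in B:", ihs(pop_b, pos, core=2))
```

In the toy data, population C’s haplotypes are identical, so EHH never decays and its score is `None`, the analogue of a NaN in scikit-allel’s output. Setting `include_edges=True` in scikit-allel reports such scores anyway, but the integral is then cut off at the region edge, so on short amplicons those values are a lower bound on the full iHH.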
Appreciate any thoughts or references!