We are running quality control on some genotype data in PLINK 1.9. As part of this QC, we have limited the data to subjects of self-reported and genomic European ancestry, and have run basic QC steps: removing individuals and variants with call rates below 95%, and removing variants that deviate excessively from Hardy-Weinberg equilibrium (p < 10^-10).
After pruning for LD and limiting to the autosomes, we calculated estimated IBD between all pairs of participants, and then the average pairwise IBD for each participant. We found that an unusual number of our participants (233/13049) show a high mean pairwise IBD with others (more than 3 SD above the cohort mean; > 0.028 in this instance, maximum 0.075). Examining individual examples shows that this is driven by low-level relatedness (~0.03-0.1) with many other individuals, rather than by a small subset of relatives in the data. In addition, the relatedness in each case stems entirely from IBD1 (one allele shared identical by descent) and not at all from IBD2 (two alleles shared). Individual average IBD correlates negatively (r = -0.69) with the genome-wide inbreeding coefficient.
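For anyone wanting to reproduce the per-participant averaging, here is a minimal sketch in Python. It assumes the input is a PLINK `.genome` file with the usual `FID1`, `IID1`, `FID2`, `IID2` and `PI_HAT` columns; the function names and the file name `plink.genome` are my own placeholders, not part of PLINK.

```python
import statistics
from collections import defaultdict


def mean_pairwise_pihat(genome_lines):
    """Return {(FID, IID): mean PI_HAT over every pair the sample appears in}.

    genome_lines: iterable of whitespace-delimited lines, header first,
    laid out like a PLINK .genome file (assumed format).
    """
    rows = iter(genome_lines)
    header = next(rows).split()
    col = {name: i for i, name in enumerate(header)}
    total = defaultdict(float)
    count = defaultdict(int)
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pi_hat = float(fields[col["PI_HAT"]])
        # Each pair contributes to the running mean of both of its members.
        for fid, iid in (("FID1", "IID1"), ("FID2", "IID2")):
            key = (fields[col[fid]], fields[col[iid]])
            total[key] += pi_hat
            count[key] += 1
    return {key: total[key] / count[key] for key in total}


def flag_outliers(means, n_sd=3.0):
    """Return (cutoff, samples whose mean exceeds cohort mean + n_sd * SD)."""
    values = list(means.values())
    cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
    flagged = sorted(key for key, value in means.items() if value > cutoff)
    return cutoff, flagged
```

Usage would be something like `with open("plink.genome") as fh: means = mean_pairwise_pihat(fh)`, followed by `flag_outliers(means)`. Note that the means are only comparable across participants if the file contains every pair; if `--genome` was run with `--min`, low-IBD pairs are dropped and the averages will be biased upwards.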
I have one idea about what could be causing this (partial cross-contamination of the samples), but I would be interested to know whether others have other ideas about what the cause could be.