Hi all, I have a dataset of around 20,000 exomes and I need a quick and efficient tool to check for relatedness across all of my samples. I have tried "relatedness2" from vcftools, but it is very slow. It is also pairwise, so it does not scale well. We use multi-sample vcf.gz files. Does anyone know of a tool (ideally C++ based) that can do this? According to 23andMe, only around 1,000 SNVs are needed to find related samples efficiently. We are looking for a tool that lets us find, in one run if possible, any samples that are related up to the second degree as well as unknown duplicate samples. I have done a pretty thorough search but have not found anything. Can anyone help? Many thanks in advance.
Select only tag SNPs and then run relatedness2 on them; the fewer SNPs you use, the faster it runs.
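plink --vcf input.vcf --indep-pairwise 50 10 0.8 --out pruned
plink --vcf input.vcf --extract pruned.prune.in --recode vcf --out output
Note that --indep-pairwise on its own only writes the lists of retained and removed variants (pruned.prune.in and pruned.prune.out), so a second run with --extract is needed to write the pruned VCF. --extract matches on variant ID, so the variants need unique IDs; if the ID column of your VCF is ".", PLINK 1.9's --set-missing-var-ids can assign them (use the same setting in both runs).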
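The three values after --indep-pairwise are the window size in variants (50), the step size in variants (10) and the r² threshold (0.8). You can try a lower threshold such as 0.6 or 0.5 to prune more aggressively and reduce the number of SNPs.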
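To make the three pruning parameters concrete, here is a minimal Python sketch of windowed r² pruning. It is only an illustration of the idea, not PLINK's actual implementation: the function names `r_squared` and `prune` are made up for this example, genotypes are assumed to be coded as 0/1/2 alternate-allele counts with no missing calls, and when two variants are correlated it simply drops the later one.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treat as unlinked
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune(genotypes, window=50, step=10, r2_max=0.8):
    """Greedy windowed pruning (hypothetical helper, simplified).

    genotypes: one list of per-sample genotype counts per variant, in
    genomic order. Returns the indices of the variants that are kept.
    """
    keep = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        end = min(start + window, len(genotypes))
        idx = [i for i in range(start, end) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue  # already removed earlier in this window
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # simplification: always drop the later variant
        start += step  # slide the window forward by `step` variants
    return [i for i, kept in enumerate(keep) if kept]


# Toy data: v1 is a perfect copy of v0, v2 is uncorrelated with v0.
v0 = [0, 1, 2, 0, 1, 2]
v1 = [0, 1, 2, 0, 1, 2]
v2 = [0, 0, 1, 2, 2, 1]
kept = prune([v0, v1, v2], window=50, step=10, r2_max=0.8)
print(kept)
```

Lowering `r2_max` makes more pairs count as redundant, so fewer variants survive, which is the effect described for the 0.8 threshold above.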
PLINK is a tool you cannot ignore in population genetic analysis.
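Once relatedness2 has been run on the pruned VCF, the remaining job is to pick out duplicates and relatives up to second degree from its output. The sketch below is one way to do that in Python. It assumes the output is tab-separated with columns named INDV1, INDV2 and RELATEDNESS_PHI, and it uses the kinship cutoffs commonly quoted in the KING documentation (above 0.354 for duplicates or MZ twins, 0.177 to 0.354 for first degree, 0.0884 to 0.177 for second degree). The function names are hypothetical, and with a small SNP set the estimates are noisy, so treat borderline values with care.

```python
import csv
import io


def classify(phi):
    """Map a kinship coefficient to a relationship label (KING-style cutoffs)."""
    if phi > 0.354:
        return "duplicate or MZ twin"
    if phi > 0.177:
        return "1st degree"
    if phi > 0.0884:
        return "2nd degree"
    return "unrelated or more distant"


def related_pairs(handle):
    """Yield (sample1, sample2, phi, label) for pairs related up to 2nd degree.

    handle: an open text file in the assumed relatedness2 layout.
    Self-comparisons are skipped; if the file lists each pair in both
    orders, both rows are reported.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["INDV1"] == row["INDV2"]:
            continue
        phi = float(row["RELATEDNESS_PHI"])
        label = classify(phi)
        if label != "unrelated or more distant":
            yield row["INDV1"], row["INDV2"], phi, label


# Toy input in the assumed layout (values are invented for illustration).
sample = (
    "INDV1\tINDV2\tN_AaAa\tN_AAaa\tN1_Aa\tN2_Aa\tRELATEDNESS_PHI\n"
    "A\tA\t10\t0\t10\t10\t0.5\n"
    "A\tB\t10\t0\t10\t10\t0.49\n"
    "A\tC\t5\t1\t10\t10\t0.12\n"
    "A\tD\t2\t3\t10\t10\t0.01\n"
)
hits = list(related_pairs(io.StringIO(sample)))
for hit in hits:
    print(*hit, sep="\t")
```

For a real run you would pass an open file handle for the relatedness2 output instead of the `io.StringIO` toy input.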