Question: Relatedness software for 20,000 exome sequencing datasets.
rjobmc wrote (19 months ago):

Hi all, I have a dataset of around 20,000 exomes, and I need a quick, efficient tool to check for relatedness across all of my samples. I have tried --relatedness2 from vcftools, but it is very slow. It also works pair-wise, so it is not very efficient. We use multi-sample .vcf.gz files. Does anyone know of (preferably) C++-based software that does this? According to 23andMe, only around 1,000 SNVs are needed to find related samples reliably. We are looking for a tool that finds any samples related up to the second degree, and also unknown duplicate samples, in one run if possible. I have done a pretty thorough search but have not found anything. Can anyone help? Many thanks in advance.
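To show what the per-pair computation involves, here is a minimal sketch. It assumes the vcftools documentation is right that --relatedness2 is based on the method of Manichaikul et al. (2010), i.e. KING. The code uses the symmetric KING-robust form phi = (N_Aa,Aa - 2*N_AA,aa) / (N_Aa(i) + N_Aa(j)) and KING's published cut-offs (about 0.354, 0.177, 0.0884, 0.0442) for duplicates and 1st/2nd/3rd degree relatives. The exact variant vcftools implements may differ. The function names and the dict-of-lists genotype layout are hypothetical. A pure-Python loop over the roughly 2 x 10^8 pairs of 20,000 samples would be far too slow; this only illustrates the statistic.

```python
from itertools import combinations

# Hypothetical layout: genotypes[sample] = list of alt-allele counts per SNP,
# 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing call.


def king_robust_kinship(g1, g2):
    """Symmetric KING-robust kinship between two samples (sketch).

    Only sites called in both samples are counted.
    """
    n_het_het = 0  # N_Aa,Aa: both samples heterozygous
    n_opp_hom = 0  # N_AA,aa: opposite homozygotes (IBS0 sites)
    n_het1 = 0     # N_Aa(i): hets in sample 1
    n_het2 = 0     # N_Aa(j): hets in sample 2
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        if a == 1:
            n_het1 += 1
        if b == 1:
            n_het2 += 1
        if a == 1 and b == 1:
            n_het_het += 1
        elif {a, b} == {0, 2}:
            n_opp_hom += 1
    denom = n_het1 + n_het2
    if denom == 0:
        return float("nan")  # no informative sites; treated as unrelated below
    return (n_het_het - 2 * n_opp_hom) / denom


def classify(phi):
    """Map kinship to a degree using KING's cut-offs (NaN falls through)."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi > 0.177:
        return "1st degree"
    if phi > 0.0884:
        return "2nd degree"
    if phi > 0.0442:
        return "3rd degree"
    return "unrelated"


DEGREE = {"duplicate/MZ twin": 0, "1st degree": 1, "2nd degree": 2, "3rd degree": 3}


def related_pairs(genotypes, max_degree=2):
    """Yield (sample_a, sample_b, phi, label) for pairs up to max_degree."""
    for a, b in combinations(sorted(genotypes), 2):
        phi = king_robust_kinship(genotypes[a], genotypes[b])
        label = classify(phi)
        if label in DEGREE and DEGREE[label] <= max_degree:
            yield a, b, phi, label
```

With max_degree=2, one pass reports unknown duplicates (phi near 0.5) together with 1st- and 2nd-degree pairs, which is the combination asked for above.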

Shicheng Guo wrote (19 months ago):

Select only tag SNPs (an LD-pruned subset) and run --relatedness2 on those. The fewer SNPs you use, the faster it runs.

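Use PLINK to prune first, then extract the retained SNPs:

plink --vcf input.vcf --indep-pairwise 50 10 0.8 --out output
plink --vcf input.vcf --extract output.prune.in --recode vcf --out pruned

The three --indep-pairwise parameters are the window size (50 variants), the step size (10 variants) and the r² threshold (0.8). Lowering the threshold (e.g. 0.6 or 0.5) removes more SNPs, so you can try different values to control how many are kept. The first command only writes the IDs of the retained and removed variants (output.prune.in and output.prune.out); the second command applies that list. PLINK is software you cannot ignore in population-genetic analysis.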
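To make the window/step/r² parameters concrete, here is a simplified sketch of sliding-window pairwise pruning. It is an illustration under stated assumptions, not PLINK's exact algorithm. The assumptions are: r² is the squared Pearson correlation of genotype dosages, and when a pair exceeds the threshold the later variant is dropped. PLINK applies its own tie-breaking rules. The names r_squared and prune_pairwise are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic in the shared samples: no LD signal
    return sxy * sxy / (sxx * syy)


def prune_pairwise(variants, window=50, step=10, r2_max=0.8):
    """Return indices of variants kept after greedy sliding-window pruning.

    variants: per-variant dosage vectors, in genomic order.
    window/step are counted in variants, mirroring '50 10 0.8'.
    """
    removed = set()
    n = len(variants)
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if i in removed:
                continue
            for j in range(i + 1, end):
                if j not in removed and r_squared(variants[i], variants[j]) > r2_max:
                    removed.add(j)  # simplification: keep the earlier variant
        if end == n:
            break
        start += step
    return [i for i in range(n) if i not in removed]
```

A lower r2_max drops more variants. A window that is too small misses correlated pairs that lie farther apart, which is why the window and threshold together control the final SNP count.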


rjobmc replied (19 months ago):

Hi Shicheng Guo, where can I get these tag SNPs for both WES and WGS? I presume they would be selected with a pairwise LD threshold (e.g. r² < 0.2) or something like that? Thanks for your help.
Powered by Biostar version 2.3.0