Question: Relatedness software for 20,000 exome sequencing datasets.
8 months ago by
rjobmc wrote:

Hi all, I have a dataset of around 20,000 exomes and I need a fast, efficient tool to check for relatedness across all of my samples. I have tried "relatedness2" from vcftools, but it is very slow; it is also pairwise, so not very efficient. We use multi-sample vcf.gz files. Does anyone know of software (possibly C++ based) to do this? According to 23andMe, only around 1,000 SNVs are needed to find related samples efficiently. We are looking for a tool that finds any samples related up to second degree, as well as unknown duplicate samples, in one run (if possible). I have done a pretty thorough search but not found anything. Can anyone help? Many thanks in advance.

8 months ago by
Shicheng Guo wrote:

Select only tag SNPs and then run relatedness2; the fewer SNPs you use, the faster it runs.

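plink --vcf input.vcf --indep-pairwise 50 10 0.8 --out output
plink --vcf input.vcf --extract output.prune.in --recode vcf --out output_pruned

The first command only writes the lists output.prune.in and output.prune.out; the second keeps the variants listed in output.prune.in and writes the pruned VCF. Both steps rely on variant IDs, so the ID column of the VCF should be populated. The third parameter, 0.8, is the r² threshold; you can try 0.6, 0.5, or other values to control the number of SNPs. plink is software you cannot ignore in population genetic analysis.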
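Once relatedness2 has run on the pruned VCF, the pairs of interest (duplicates and relatives up to second degree) still have to be pulled out of its output. Here is a minimal Python sketch of that step, not a definitive implementation. It assumes the output is a tab-separated table with the columns INDV1, INDV2 and RELATEDNESS_PHI, and it uses the kinship cut-offs commonly quoted for the KING estimator (above 0.354 for duplicates or MZ twins, above 0.177 for first degree, above 0.0884 for second degree). The function names are made up for illustration.

```python
import csv

# Cut-offs commonly quoted for the KING kinship coefficient.
# Assumption: RELATEDNESS_PHI from relatedness2 is on this same scale.
DUPLICATE_MIN = 0.354
FIRST_DEGREE_MIN = 0.177
SECOND_DEGREE_MIN = 0.0884

UNRELATED = "unrelated or more distant"


def classify_kinship(phi):
    """Map a kinship coefficient to a coarse relationship label."""
    if phi > DUPLICATE_MIN:
        return "duplicate or MZ twin"
    if phi > FIRST_DEGREE_MIN:
        return "1st degree"
    if phi > SECOND_DEGREE_MIN:
        return "2nd degree"
    return UNRELATED


def related_pairs(handle):
    """Return (sample1, sample2, phi, label) for pairs up to 2nd degree.

    `handle` is an open text file (or any iterable of lines) holding a
    tab-separated table with a header row. Assumed column names:
    INDV1, INDV2, RELATEDNESS_PHI.
    """
    seen = set()
    pairs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row["INDV1"], row["INDV2"]
        if a == b:
            continue  # skip a sample compared with itself
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # skip the mirrored (b, a) entry if present
        seen.add(key)
        phi = float(row["RELATEDNESS_PHI"])
        label = classify_kinship(phi)
        if label != UNRELATED:
            pairs.append((key[0], key[1], phi, label))
    return pairs
```

To use it, open the relatedness2 output file (named after your --out prefix) in text mode, pass the handle to related_pairs, and print or save the returned tuples. Values close to a cut-off are worth checking by hand, since exome data with few markers can give noisy estimates.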


Hi Shicheng Guo, Where can I get these tag-SNPs for both WES and WGS? I presume it would be based on LD scores < 0.2 or something like that? Thanks for your help.

written 8 months ago by rjobmc