Question: Relatedness software for 20,000 exome sequencing datasets.
rjobmc0 wrote:

Hi all, I have a dataset of around 20,000 exomes and I need a quick and efficient tool to look for relatedness across all of my samples. I have tried --relatedness2 from vcftools, but it is very slow; it is also pairwise and so not very efficient. We use multi-sample vcf.gz files. Does anyone know of a (possibly C++-based) tool to do this? According to 23andMe, only around 1,000 SNVs are needed to find related samples efficiently. We are looking for a tool that lets us find any samples related up to second degree, and also unknown duplicate samples, in one run (if possible). I have done a pretty thorough search but not found anything. Can anyone help? Many thanks in advance.

vcf.gz relatedness vcf
written 3 months ago by rjobmc • modified 3 months ago by Shicheng Guo
Shicheng Guo7.4k wrote:

Select only tag SNPs and run relatedness2 on them; the fewer SNPs you use, the faster it runs.
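Once relatedness2 has run, its pairwise table still has to be turned into "duplicate", "first degree" and "second degree" calls. A minimal sketch of that step, assuming the output is tab-separated with columns named INDV1, INDV2 and RELATEDNESS_PHI (check this against the header of your own .relatedness2 file) and using the commonly quoted KING kinship cut-offs of 0.354, 0.177 and 0.0884. The function names are just for illustration.

```python
import csv
import io

# Commonly quoted KING kinship cut-offs (assumed here, tune for your data).
DUPLICATE_MIN = 0.354      # above this: duplicate sample or MZ twin
FIRST_DEGREE_MIN = 0.177   # above this: parent-child or full siblings
SECOND_DEGREE_MIN = 0.0884 # above this: second-degree relatives

UNRELATED = "unrelated or more distant"


def classify(phi):
    """Map a kinship coefficient to a coarse relationship label."""
    if phi > DUPLICATE_MIN:
        return "duplicate/MZ twin"
    if phi > FIRST_DEGREE_MIN:
        return "1st degree"
    if phi > SECOND_DEGREE_MIN:
        return "2nd degree"
    return UNRELATED


def related_pairs(handle):
    """Yield (sample1, sample2, phi, label) for pairs up to second degree.

    Self-pairs are skipped and each unordered pair is reported once,
    whichever orderings the input file happens to contain.
    """
    seen = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row["INDV1"], row["INDV2"]
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        phi = float(row["RELATEDNESS_PHI"])
        label = classify(phi)
        if label != UNRELATED:
            yield a, b, phi, label


if __name__ == "__main__":
    # Tiny made-up table, only to show the call pattern.
    demo = "INDV1\tINDV2\tRELATEDNESS_PHI\nS1\tS1\t0.5\nS1\tS2\t0.49\nS1\tS3\t0.12\n"
    for hit in related_pairs(io.StringIO(demo)):
        print(*hit, sep="\t")
```

For a real run you would pass an open file handle for the .relatedness2 output instead of the in-memory demo table.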

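You can pick the tag SNPs by LD pruning with plink:

plink --vcf input.vcf --indep-pairwise 50 10 0.8 --out pruned

The third parameter, 0.8, is the r² threshold; you can try 0.6, 0.5 or other values to control the number of SNPs kept. Note that --indep-pairwise only writes the lists pruned.prune.in and pruned.prune.out, so a second run is needed to write the pruned VCF:

plink --vcf input.vcf --extract pruned.prune.in --recode vcf --out output

plink is a tool you cannot ignore in population genetic analysis.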
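To make the r² threshold more concrete, here is a toy sketch of what pruning on pairwise r² means. It is not PLINK's algorithm: there is no sliding window or step, and it simply keeps the first variant of any correlated pair. Genotypes are assumed to be coded as 0/1/2 alternate-allele counts, and the names r_squared and prune are made up for this example.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no LD information.
        return 0.0
    return sxy * sxy / (sxx * syy)


def prune(genotypes, threshold):
    """Greedily keep variants whose r² with every kept variant is <= threshold.

    genotypes: dict of variant id -> list of 0/1/2 dosages, in genomic order.
    """
    kept = []
    for variant_id, dosages in genotypes.items():
        if all(r_squared(dosages, genotypes[k]) <= threshold for k in kept):
            kept.append(variant_id)
    return kept


if __name__ == "__main__":
    toy = {
        "A": [0, 1, 2, 0, 1, 2],
        "B": [0, 1, 2, 0, 1, 2],  # identical to A, so r² = 1
        "C": [2, 0, 1, 1, 2, 0],  # only weakly correlated with A
    }
    print(prune(toy, 0.8))
    print(prune(toy, 0.2))
```

Lowering the threshold removes more variants, which is why a stricter r² cut-off leaves fewer SNPs for the relatedness step.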

written 3 months ago by Shicheng Guo • modified 12 weeks ago

Hi Shicheng Guo, Where can I get these tag-SNPs for both WES and WGS? I presume it would be based on LD scores < 0.2 or something like that? Thanks for your help.

written 3 months ago by rjobmc