Question

Relatedness software for 20,000 exome sequencing datasets.

0


5.4 years ago

rjobmc • 0

Hi all,

I have a dataset of around 20,000 exomes and I need a quick and efficient tool to check for relatedness across all of my samples. I have tried the --relatedness2 option of vcftools, but it is very slow. It is also pairwise, and so not very efficient at this sample size.

We use multi-sample vcf.gz files. Does anyone know of software (ideally C++ based) to do this? According to 23andMe, only around 1,000 SNVs are needed to find related samples efficiently. We are looking for a tool that finds, in one run if possible, any samples related up to the second degree as well as unknown duplicate samples. I have done a pretty thorough search but have not found anything.

Can anyone help?

Many thanks in advance

vcf relatedness • 1.2k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 5.4 years ago by rjobmc • 0

score 3 · Answer 1 · 2018-11-19

3


5.4 years ago

Shicheng Guo ★ 9.4k

Select only tag SNPs and then run relatedness2; the fewer SNPs you use, the faster it runs.

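plink --vcf input.vcf --indep-pairwise 50 10 0.8 --out output
plink --vcf input.vcf --extract output.prune.in --recode vcf --out output_pruned

The third parameter, 0.8, is the r² threshold; you can try 0.6, 0.5, or other values to control the number of SNPs. Note that --indep-pairwise only writes the lists output.prune.in and output.prune.out, so a second run with --extract is needed to actually produce the pruned VCF. PLINK is software you cannot ignore in population genetic analysis.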
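For anyone who wants the whole thing in one go, here is a minimal sketch of the prune-then-relatedness2 workflow as a shell script. It is not a tested pipeline. The function names, the output prefix layout, and the use of --double-id, --set-missing-var-ids and vcf-iid are my assumptions for a typical exome VCF, where variant IDs are often "." and sample IDs may contain underscores. The kinship cutoffs (0.354 and 0.0884) are the conventional KING boundaries for duplicate/MZ and second-degree pairs, and the column position of RELATEDNESS_PHI assumes the usual seven-column .relatedness2 layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: LD-prune a VCF with PLINK 1.9, then run vcftools --relatedness2.
# Usage: prune_and_relate input.vcf out_prefix [r2_threshold]
prune_and_relate() {
  local vcf="$1" prefix="$2" r2="${3:-0.8}"

  # Pass 1: --indep-pairwise only writes $prefix.prune.in / $prefix.prune.out.
  plink --vcf "$vcf" --double-id --set-missing-var-ids '@:#' \
        --indep-pairwise 50 10 "$r2" --out "$prefix"

  # Pass 2: keep only the retained variants and write them back out as VCF.
  plink --vcf "$vcf" --double-id --set-missing-var-ids '@:#' \
        --extract "$prefix.prune.in" --recode vcf-iid --out "$prefix.pruned"

  # Pairwise kinship on the reduced marker set; writes $prefix.relatedness2.
  vcftools --vcf "$prefix.pruned.vcf" --relatedness2 --out "$prefix"
}

# Hypothetical helper: list pairs at or above second degree from a .relatedness2 file.
# Assumes column 7 is RELATEDNESS_PHI; self-comparisons are skipped.
classify_pairs() {
  awk -v dup=0.354 -v second=0.0884 '
    NR > 1 && $1 != $2 && ($7 + 0) > second {
      label = (($7 + 0) > dup) ? "duplicate_or_MZ" : "first_or_second_degree"
      print $1, $2, $7, label
    }' "$1"
}

# Only run the pipeline when an input VCF and an output prefix are given.
if [ -n "${1:-}" ] && [ -n "${2:-}" ]; then
  prune_and_relate "$@"
  classify_pairs "$2.relatedness2"
fi
```

Saved as, say, relate.sh, it would be run as `bash relate.sh input.vcf output 0.5`. A pair may be listed in both orders if the relatedness2 output contains both. Lowering the r² threshold shrinks the marker set and speeds up the pairwise step, but it does not change the fact that the comparison is still all-against-all.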


0


Hi Shicheng Guo, where can I get these tag SNPs for both WES and WGS? I presume the selection would be based on LD scores < 0.2 or something like that? Thanks for your help.

ADD REPLY • link 5.4 years ago by rjobmc • 0