Question: Checking kinship coefficients and relationships and comparing genotyping data to exomes
3.5 years ago by
Dan Gaston (6.9k) wrote:

I am wondering what everyone's favourite tools are for checking kinship coefficients between individuals in family studies. I am doing this for QC on some projects where the results we were getting were not what we expected.


In this case I was using KING with PLINK-formatted files that either came from genotyping data exported from GenomeStudio or were generated from exome sequencing VCF files using vcftools (to convert from VCF to PLINK). In some cases the calculated kinship coefficients from the two sources didn't match up, although some of that may be down to how variants were filtered in the VCF case.

What does everyone like to use for these tasks? Also, do you have favourite tools for comparing SNPs from a genotyping experiment with your VCF files for quality control purposes?




snp qc kinship pedigree vcf • 6.1k views
modified 2.1 years ago by SteveL (30) • written 3.5 years ago by Dan Gaston (6.9k)
3.5 years ago by
Jeremy Leipzig (17k), Philadelphia, PA, wrote:

Are you talking about that VCF kinship tutorial from Aaron Quinlan? Yeah, that doesn't work. I don't really know what he was thinking there, since VCFs are usually variant-only, so the number of reference calls seen is affected by the size of the cohort (the width of the VCF).
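To make the cohort-width point concrete, here is a toy sketch in Python. The sample names and genotypes are made up, and it assumes a variant-only VCF emits a site only when at least one sample in the cohort carries a non-reference allele. It counts how many emitted sites are homozygous reference in both members of one pair as more samples are added to the file.

```python
# Toy genotypes (alt-allele counts per site); all values are invented.
GENOTYPES = {
    "A": [1, 0, 0, 2, 0],
    "B": [0, 0, 0, 1, 0],
    "C": [0, 1, 0, 0, 0],
    "D": [0, 0, 1, 0, 0],
}


def variant_only_sites(cohort):
    """Indices of sites a variant-only VCF for this cohort would contain."""
    n_sites = len(GENOTYPES[cohort[0]])
    return [i for i in range(n_sites)
            if any(GENOTYPES[s][i] > 0 for s in cohort)]


def shared_hom_ref(pair, cohort):
    """Number of emitted sites where both members of the pair are 0/0."""
    a, b = pair
    return sum(1 for i in variant_only_sites(cohort)
               if GENOTYPES[a][i] == 0 and GENOTYPES[b][i] == 0)


# The pair (A, B) never changes, but its shared reference calls grow
# as the VCF gets wider.
for cohort in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    print(cohort, shared_hom_ref(("A", "B"), cohort))
```

Any statistic that counts those shared reference calls will therefore shift with cohort size, even though the two samples themselves are unchanged.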

I like the relatedness function from vcftools:

vcftools --relatedness --vcf myVCF.vcf
modified 3.5 years ago • written 3.5 years ago by Jeremy Leipzig (17k)

Well, even the KING tutorials use the same basic premise. But yes, there is an issue with the number of reference-only positions depending on the size of your cohort. However, I would expect there to be enough data to get a close approximation if you have, say, trios and sibs. I believe, though, that if you have a small family, the A(jk) statistic computed by --relatedness in vcftools isn't appropriate:

written 3.5 years ago by Dan Gaston (6.9k)

The problem is less the kinship method itself than the fact that the input samples are related. Most kinship methods assume a larger sample containing many unrelated individuals. Thanks for your comments about the tutorial I wrote, but I don't follow your argument: you say that VCF is the issue (only variant sites), but the alternative you suggest itself uses VCF as input. As far as I understand the method implemented in VCFtools, it will suffer from the same issue.

modified 3.5 years ago • written 3.5 years ago by Aaronquinlan (10k)

And I believe the --relatedness argument to vcftools is only really appropriate for populations, not for specifically checking for issues within a family. KING basically suggests the same approach as your tutorial. KING is supposed to work on small pedigrees, and it seems to do that reasonably well with the genotyping files that I have (although there seems to be an issue there as well). I may have to do something a little more complex to adjust for pedigrees.
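A rough illustration of why a population-style statistic struggles on a single family: the sketch below computes the unadjusted A_jk of Yang et al. (2010), which is what I understand --relatedness to be based on, with allele frequencies estimated from the samples in the file. The function name, the data layout, and the trio genotypes are all made up for illustration, and this is not the vcftools code.

```python
def ajk(sites, j, k):
    """Unadjusted A_jk for samples j and k (j != k).

    sites: list of per-site lists of alt-allele counts (0/1/2), one per sample.
    Allele frequencies are estimated from the same samples, which is the
    assumption that becomes a problem when the file holds only one family.
    """
    total = 0.0
    used = 0
    for site in sites:
        p = sum(site) / (2 * len(site))
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic in this sample set, uninformative
        total += (site[j] - 2 * p) * (site[k] - 2 * p) / (2 * p * (1 - p))
        used += 1
    return total / used if used else float("nan")


# Hypothetical trio: columns are father, mother, child.
TRIO = [
    [1, 0, 1], [0, 1, 1], [1, 0, 1],
    [0, 1, 1], [1, 0, 0], [0, 1, 0],
]

print("father-mother:", ajk(TRIO, 0, 1))
print("father-child: ", ajk(TRIO, 0, 2))
```

In this toy trio the unrelated parents come out clearly negative and the parent-child pair is also pushed below zero, because the frequencies were estimated from three closely related people rather than from a population.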

written 3.5 years ago by Dan Gaston (6.9k)

I suppose the problem is that, in my experience, the kinship coefficients derived from your recipe did not resemble the expected strata (0.5/0.25/0.125/0.0625) at all: I got much smaller coefficients, some of them negative. So you might as well use something that works directly on a VCF, even if the resulting coefficients do not fall on a scale that lets you directly call a parental, sibling, cousin, or unrelated relationship.

written 3.5 years ago by Jeremy Leipzig (17k)
2.1 years ago by
SteveL (30) wrote:

Just to update: the --relatedness2 option to VCFtools works nicely for pedigrees with WES data. It implements exactly the same algorithm used in KING.

vcftools --gzvcf YourZipped.vcf.gz --relatedness2
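For anyone curious what is being computed, here is a minimal Python sketch of the KING-robust kinship estimator in its simplest form (Manichaikul et al., 2010), which is what I understand --relatedness2 to report. The function name and input layout are my own, and this is an illustration rather than the vcftools or KING source.

```python
def king_robust_kinship(g1, g2):
    """Kinship estimate for two samples from paired genotypes.

    g1, g2: alt-allele counts (0/1/2) at the same sites, None for missing.
    phi = (N_het_het - 2 * N_opposite_hom) / (N_het_1 + N_het_2)
    """
    n_het_het = n_opp_hom = n_het_1 = n_het_2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either sample
        n_het_1 += (a == 1)
        n_het_2 += (b == 1)
        n_het_het += (a == 1 and b == 1)
        n_opp_hom += (abs(a - b) == 2)  # 0/0 versus 1/1
    denom = n_het_1 + n_het_2
    if denom == 0:
        return float("nan")
    return (n_het_het - 2 * n_opp_hom) / denom


same = [1, 1, 0, 2]
other = [1, 0, 1, 0]
print(king_robust_kinship(same, same))   # identical genotypes
print(king_robust_kinship(same, other))  # one opposite-homozygote site
```

Identical genotypes give the maximum of 0.5, and opposite homozygotes pull the estimate down, which is how negative values can show up for some pairs.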

First-degree relatives are ~0.25, 2nd-degree ~0.125, and 3rd-degree ~0.0625.

"Unrelated" parents can reach values as high as ~0.04 in my experience.
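To turn those numbers into calls, something like the following Python sketch could be used. It assumes the inference cut-offs given in the KING documentation as I recall them (0.354, 0.177, 0.0884, 0.0442) and assumes the .relatedness2 output is tab-delimited with INDV1, INDV2 and RELATEDNESS_PHI columns, so check both against your own files. The sample IDs and values in the example are invented.

```python
import csv
import io

# Lower bounds on kinship for each class (assumed KING cut-offs).
CUTOFFS = [
    (0.354, "duplicate or MZ twin"),
    (0.177, "1st-degree"),
    (0.0884, "2nd-degree"),
    (0.0442, "3rd-degree"),
]


def classify_kinship(phi):
    """Map a kinship coefficient to a relationship class."""
    for lower, label in CUTOFFS:
        if phi > lower:
            return label
    return "unrelated"


def classify_pairs(handle):
    """Read a relatedness2-style table and label each non-self pair."""
    calls = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["INDV1"] == row["INDV2"]:
            continue  # skip a sample compared with itself
        phi = float(row["RELATEDNESS_PHI"])
        calls.append((row["INDV1"], row["INDV2"], phi, classify_kinship(phi)))
    return calls


# Made-up example table; with real data, pass open("out.relatedness2").
EXAMPLE = (
    "INDV1\tINDV2\tRELATEDNESS_PHI\n"
    "father\tfather\t0.5\n"
    "father\tchild\t0.24\n"
    "father\tmother\t0.04\n"
)
for call in classify_pairs(io.StringIO(EXAMPLE)):
    print(*call, sep="\t")
```

With these cut-offs, a value of ~0.04 between parents still lands just under the 3rd-degree boundary, so it would be labelled unrelated.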

Note that if you have large numbers of samples from multiple pedigrees, identification of 3rd-degree relatives becomes fuzzy because of a slight overall inflation of relatedness. Again, this is likely due to the number of shared homozygous-reference positions.

written 2.1 years ago by SteveL (30)