Checking kinship coefficients and relationships and comparing genotyping data to exomes
9.6 years ago
DG 7.3k

I am wondering what everyone's favourite tools are for checking kinship coefficients between individuals in family studies. I am doing this as QC on some projects where the results we were getting were not what we expected.

In this case I was using KING with PLINK-formatted files that either came from genotyping data exported from GenomeStudio or were generated from exome-sequencing VCF files using vcftools (to convert from VCF to PLINK format). In some cases the kinship coefficients calculated from the two sources didn't match, although some of that may be down to how the variants were filtered in the VCF case.

What does everyone like to use for these tasks? Also, do you have favourite tools for comparing SNPs from a genotyping experiment with your VCF files for quality-control purposes?

pedigree SNP QC kinship vcf • 14k views

Hi Dan, I'm trying to figure out what panel of SNPs KING interrogates to calculate kinship, but this doesn't seem to be explicitly stated anywhere. Do you know what this panel is?


I would assume it uses all of the SNPs in the genotype files you provide to it and not a selected subset or panel.

8.2 years ago
SteveL ▴ 90

Just to update, the --relatedness2 option to VCFTOOLS works nicely for pedigrees with WES data. This implements exactly the same algorithm used in KING.

vcftools --gzvcf YourZipped.vcf.gz --relatedness2
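For orientation, here is a minimal Python sketch of the pairwise "robust" estimator as published in the KING paper (Manichaikul et al., 2010). It is written from the published formula, not from the source code of KING or vcftools, so treat it as an illustration only. The function name and the 0/1/2 genotype coding (alt-allele counts, None for a no-call) are assumptions made for this example.

```python
def king_robust_kinship(g1, g2):
    """Estimate a kinship coefficient for one pair of samples.

    g1, g2: equal-length sequences of genotypes coded 0/1/2
    (alt-allele count); None marks a no-call and the site is skipped.
    """
    n_het_het = 0  # sites where both samples are heterozygous
    n_opp_hom = 0  # sites with opposite homozygotes (0 vs 2)
    n_het_1 = 0    # heterozygous sites in sample 1
    n_het_2 = 0    # heterozygous sites in sample 2

    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        if a == 1:
            n_het_1 += 1
        if b == 1:
            n_het_2 += 1
        if a == 1 and b == 1:
            n_het_het += 1
        if {a, b} == {0, 2}:
            n_opp_hom += 1
        # Sites where both samples carry the same homozygous genotype
        # add to none of the four counts.

    n_min = min(n_het_1, n_het_2)
    if n_min == 0:
        raise ValueError("at least one sample has no heterozygous sites")

    return ((n_het_het - 2 * n_opp_hom) / (2 * n_min)
            + 0.5
            - (n_het_1 + n_het_2) / (4 * n_min))
```

Passing the same genotype vector twice, for example king_robust_kinship([0, 1, 2, 1], [0, 1, 2, 1]), gives 0.5, the value expected for duplicate samples.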

The expected values are ~0.25 for first-degree relatives, ~0.125 for second-degree and ~0.0625 for third-degree.

"Unrelated" parents can reach values as high as ~0.04 in my experience.

Note that if you have large numbers of samples from multiple pedigrees, identification of third-degree relatives begins to become fuzzy because of a slight overall inflation of relatedness. This is likely due to the number of shared homozygous-reference positions.
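To turn the values above into relationship calls, a small Python sketch like the following can help. It uses the inference cut-offs given in the KING documentation (powers of two halfway between the expected values: about 0.354, 0.177, 0.0884 and 0.0442). The function names are made up for this example, and the column names INDV1, INDV2 and RELATEDNESS_PHI are an assumption about the header of the vcftools .relatedness2 output, so check them against your own file.

```python
import csv
import math

# Lower bounds of each class, following the KING documentation.
KING_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # > ~0.354
    (2 ** -2.5, "1st-degree"),         # > ~0.177
    (2 ** -3.5, "2nd-degree"),         # > ~0.0884
    (2 ** -4.5, "3rd-degree"),         # > ~0.0442
]


def classify_kinship(phi):
    """Map a kinship coefficient to a relationship class."""
    if math.isnan(phi):
        return "undetermined"
    for lower, label in KING_CUTOFFS:
        if phi > lower:
            return label
    return "unrelated or more distant"


def classify_pairs(handle):
    """Yield (sample1, sample2, phi, label) for each pair in a
    tab-delimited table read from an open file handle.
    Self-comparisons are skipped."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["INDV1"] == row["INDV2"]:
            continue
        phi = float(row["RELATEDNESS_PHI"])
        yield row["INDV1"], row["INDV2"], phi, classify_kinship(phi)
```

With these cut-offs a value of 0.04 falls just below the third-degree boundary, which fits the observation that third-degree calls become fuzzy once background relatedness is slightly inflated.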


Hi Steve, I'm trying to figure out what panel of SNPs KING interrogates to calculate kinship, but this doesn't seem to be explicitly stated anywhere. Do you know what this panel is?


Hi Gael,

Sorry for the slow response, but I only just saw this. As far as I can tell it uses all the SNPs you pass it. If you have a look at the paper, you can see where some of the limitations might be (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025716/#!po=61.3636). I generally use multi-sample VCFs as input, and it works robustly down to third-degree relationships (after removing any lines with no-calls and applying a minimum depth-of-coverage filter).

It can also spot samples with an excess of heterozygosity, which tend to be cases where there is data from more than one sample in the VCF - i.e. most likely the DNA supplied to the lab was contaminated, or, more rarely, that there was an issue with the sequencing run.
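The pre-filtering mentioned above (dropping lines with no-calls and applying a minimum depth) could look roughly like this in Python. This is only a sketch under several assumptions: the function name is hypothetical, the default threshold of 10 is an arbitrary placeholder (no value was given in this thread), depth is read from the per-sample DP field, and a site is dropped as soon as any one sample fails either check.

```python
def filter_vcf_lines(lines, min_dp=10):
    """Yield header lines plus the data lines that pass both checks.

    A data line is dropped if any sample has a no-call genotype, or if
    any sample has a missing DP or a DP below min_dp (assumed policy).
    Lines without a GT field in FORMAT are dropped as well.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue

        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        dp_index = fmt.index("DP") if "DP" in fmt else None

        keep = True
        for sample in fields[9:]:
            parts = sample.split(":")
            if "." in parts[gt_index]:
                keep = False  # no-call such as ./. or .|.
                break
            if dp_index is not None:
                # Trailing FORMAT fields may be omitted in a sample column.
                dp = parts[dp_index] if dp_index < len(parts) else "."
                if dp == "." or int(dp) < min_dp:
                    keep = False
                    break
        if keep:
            yield line
```

For real data the equivalent vcftools or bcftools filters are likely to be faster; the sketch is mainly meant to make the filtering rule explicit.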

9.6 years ago

Are you talking about that vcf kinship tutorial from Aaron Quinlan? Yeah, that doesn't work. I don't really know what he was thinking there since VCFs are usually variant-only so the number of reference calls seen is affected by the size of the cohort (the width of the VCF).

I like the relatedness function from vcftools:

vcftools --relatedness --vcf myVCF.vcf

Well, even the KING tutorials use the same basic premise. But yes, there is an issue with the number of reference-only positions depending on the size of your cohort. However, I would expect there to be enough data to get a close approximation if you have, say, trios and sibs. I believe, though, that if you have a small family the A(jk) statistic computed by --relatedness in vcftools isn't appropriate:

http://sourceforge.net/p/vcftools/mailman/message/28988198/
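To make the concern about small families concrete, here is a rough Python sketch of the unadjusted Ajk statistic of Yang et al. (2010) for a pair of different individuals, which is the statistic the vcftools manual names for --relatedness. It is written from the published formula and is not vcftools code. The function name and the input layout (a list of sites, each a list of 0/1/2 alt-allele counts per individual) are assumptions for illustration. Note that the allele frequency at each site is estimated from the very samples being compared.

```python
def ajk_offdiag(genotypes, j, k):
    """Unadjusted A_jk for two different individuals j and k.

    genotypes: list of sites; each site is a list of alt-allele
    counts (0, 1 or 2), one per individual, with no missing data.
    """
    if j == k:
        raise ValueError("this sketch covers only pairs with j != k")

    total = 0.0
    n_sites = 0
    for site in genotypes:
        # Allele frequency estimated from the cohort itself; in a
        # single small family this is based on related samples only.
        p = sum(site) / (2 * len(site))
        if p == 0.0 or p == 1.0:
            continue  # monomorphic in this cohort, uninformative
        total += (site[j] - 2 * p) * (site[k] - 2 * p) / (2 * p * (1 - p))
        n_sites += 1

    if n_sites == 0:
        raise ValueError("no polymorphic sites")
    return total / n_sites
```

Because p comes from the cohort, running this on only a handful of relatives centres the statistic on the family's own allele frequencies instead of on population frequencies, which is one way to see why the resulting values need not line up with the usual relationship strata.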


The problem is less the method for kinship but rather the fact that the input samples are related. Most kinship methods assume a larger sample containing many unrelated individuals. Thanks for your comments about the tutorial I wrote, but I don't follow your argument: you say that VCF is the issue (only variant sites), but the example you provide as an alternative itself uses VCF as input. As far as I understand the method implemented in VCFTools, it will suffer from the same issue.


And I believe the --relatedness argument to vcftools is only really appropriate for populations, not for specifically checking for issues within a family. KING basically suggests the same as your tutorial. KING is supposed to work on small pedigrees, and it seems to do that reasonably well with the genotyping files that I have (although there seems to be an issue there as well). I may have to do something a little more complex to adjust for pedigrees.


I suppose the problem is that, in my experience, the kinship coefficients derived from your recipe did not resemble the (.5/.25/.125/.0625) strata at all (I got much smaller coefficients, some of them negative). So you might as well use something that works directly on a VCF, even if the coefficients it produces do not follow a guideline that lets you directly declare a parental, sibling, cousin or unrelated relationship.
