Question: Checking kinship coefficients and relationships and comparing genotyping data to exomes
7
gravatar for Dan Gaston
4.3 years ago by
Dan Gaston7.1k
Canada
Dan Gaston7.1k wrote:

I am wondering what everyone's favourite tools are for checking kinship coefficients between individuals in family studies. I am doing this for QC on some projects where the results we were getting were not what we expected.

 

In this case I was using KING with PLINK-formatted files that either came from genotyping data exported from GenomeStudio or were generated from exome-sequencing VCF files using vcftools (to convert from VCF to PLINK format). In some cases the kinship coefficients calculated from the two sources didn't match, although some of that may be down to how the variants were filtered in the VCF case.

What does everyone like to use for these tasks? Also, do you have favourite tools for comparing SNPs from a genotyping experiment with your VCF files for quality-control purposes?

snp qc kinship pedigree vcf • 7.4k views
ADD COMMENTlink modified 2.9 years ago by SteveL50 • written 4.3 years ago by Dan Gaston7.1k

Hi Dan, I'm trying to figure out what panel of SNPs KING interrogates to calculate kinship, but this doesn't seem to be explicitly stated anywhere. Do you know what this panel is?

ADD REPLYlink written 7 months ago by gaelgarcia100

I would assume it uses all of the SNPs in the genotype files you provide to it and not a selected subset or panel.

ADD REPLYlink written 7 months ago by Dan Gaston7.1k
4.3 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

Are you talking about that VCF kinship tutorial from Aaron Quinlan? Yeah, that doesn't work. I don't really know what he was thinking there, since VCFs are usually variant-only, so the number of reference calls seen is affected by the size of the cohort (the width of the VCF).

I like the relatedness function from vcftools:

vcftools --relatedness --vcf myVCF.vcf
ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Jeremy Leipzig18k

Well, even KING uses the same basic premise in its tutorials. But yes, there is an issue there with the number of reference-only positions depending on the size of your cohort. However, I would expect there to be enough data to get a close approximation if you have, say, trios and sibs. I believe, though, that if you have a small family the A(jk) statistic computed by --relatedness in vcftools isn't appropriate:

http://sourceforge.net/p/vcftools/mailman/message/28988198/

ADD REPLYlink written 4.3 years ago by Dan Gaston7.1k

The problem is less the method for kinship than the fact that the input samples are related. Most kinship methods assume a larger sample containing many unrelated individuals. Thanks for your comments about the tutorial I wrote, but I don't follow your argument: you say that VCF is the issue (only variant sites), but the example you provide as an alternative itself uses VCF as input. As far as I understand the method implemented in vcftools, it will suffer from the same issue.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Aaronquinlan10k

And I believe the --relatedness argument to vcftools is only really appropriate for populations, not for specifically checking for issues within a family. KING basically suggests the same as your tutorial. KING is supposed to work on small pedigrees, and it seems to do that reasonably well with the genotyping files that I have (but there seems to be an issue there as well). I may have to do something a little more complex to adjust for pedigrees.

ADD REPLYlink written 4.3 years ago by Dan Gaston7.1k

I suppose the problem is that, in my experience, the kinship coefficients derived from your recipe did not resemble the (.5/.25/.125/.0625) strata at all (I got much smaller coefficients, some of them negative). So you might as well use something that works directly on a VCF, even if the coefficients it produces do not follow a guideline from which you can directly declare a parental, sibling, cousin or unrelated relationship.

ADD REPLYlink written 4.3 years ago by Jeremy Leipzig18k
2.9 years ago by
SteveL50
BCN
SteveL50 wrote:

Just to update, the --relatedness2 option to vcftools works nicely for pedigrees with WES data. This implements exactly the same algorithm used in KING.

vcftools --gzvcf YourZipped.vcf.gz --relatedness2

First-degree relatives are ~0.25, 2nd-degree ~0.125, and 3rd-degree ~0.0625.

"Unrelated" parents can reach values as high as ~0.04 in my experience.

Note that if you have large numbers of samples from multiple pedigrees, identification of 3rd-degree relatives becomes fuzzy, due to an overall slight inflation of relatedness - again, this is likely due to the number of shared homozygous-reference positions.
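
As a rough illustration of how such values could be binned, here is a minimal Python sketch. The cut-offs (2^-1.5, 2^-2.5, 2^-3.5, 2^-4.5) are the inference ranges published for KING, not something stated in this thread, and the function names are hypothetical. The expected values follow from kinship halving with each degree.

```python
# Lower bounds for each relationship class, as published for KING.
# These are an assumption brought in for illustration, not vcftools output.
KING_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # phi > ~0.354
    (2 ** -2.5, "1st-degree"),         # ~0.177 to ~0.354
    (2 ** -3.5, "2nd-degree"),         # ~0.0884 to ~0.177
    (2 ** -4.5, "3rd-degree"),         # ~0.0442 to ~0.0884
]


def classify_kinship(phi):
    """Return a relationship label for a kinship coefficient phi."""
    for lower_bound, label in KING_CUTOFFS:
        if phi > lower_bound:
            return label
    # Includes negative estimates, which can occur for unrelated pairs.
    return "unrelated or more distant"


def expected_kinship(degree):
    """Expected kinship for a given degree (0 = duplicate, 1 = first-degree, ...)."""
    return 0.5 ** (degree + 1)


if __name__ == "__main__":
    for phi in (0.25, 0.125, 0.0625, 0.04, -0.02):
        print(phi, classify_kinship(phi))
```

With these cut-offs, a value of ~0.04 for "unrelated" parents sits just under the 3rd-degree lower bound (~0.0442), which is consistent with 3rd-degree calls becoming fuzzy when relatedness is slightly inflated.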

ADD COMMENTlink written 2.9 years ago by SteveL50

Hi Steve, I'm trying to figure out what panel of SNPs KING interrogates to calculate kinship, but this doesn't seem to be explicitly stated anywhere. Do you know what this panel is?

ADD REPLYlink written 7 months ago by gaelgarcia100

Hi Gael,

Sorry for the slow response, but I only just saw this. As far as I can tell, it uses all SNPs you pass it. If you have a look at the paper, you can see where some of the limitations might be (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025716/#!po=61.3636). I generally use multi-sample VCFs as input, and it works robustly down to third-degree relationships (after removing any lines with no-calls and applying a minimum depth-of-coverage filter).
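
A minimal sketch of that kind of pre-filtering, assuming an uncompressed multi-sample VCF with GT and DP in the FORMAT column. The function names and the min_depth default of 10 are hypothetical, since no threshold is given here.

```python
def passes_filters(vcf_line, min_depth=10):
    """True if every sample on a VCF data line is called with DP >= min_depth."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "GT" not in format_keys or "DP" not in format_keys:
        return False
    gt_idx = format_keys.index("GT")
    dp_idx = format_keys.index("DP")
    for sample in fields[9:]:
        values = sample.split(":")
        # A '.' anywhere in the genotype (./., .|., 0/.) is treated as a no-call.
        if "." in values[gt_idx]:
            return False
        # Trailing FORMAT fields may be dropped, and DP may be '.'.
        if dp_idx >= len(values) or not values[dp_idx].isdigit():
            return False
        if int(values[dp_idx]) < min_depth:
            return False
    return True


def filter_vcf(lines, min_depth=10):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, min_depth):
            yield line


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python filter_vcf.py < in.vcf > out.vcf
    sys.stdout.writelines(filter_vcf(sys.stdin))
```

The filtered file can then be passed to vcftools with --vcf and --relatedness2 as above.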

It can also spot samples with an excess of heterozygosity, which tend to be cases where there is data from more than one sample in the VCF - i.e. most likely the DNA supplied to the lab was contaminated, or, more rarely, that there was an issue with the sequencing run.
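
One hedged way to look at heterozygosity directly is to count, per sample, the fraction of called genotypes that are heterozygous. The sketch below assumes an uncompressed VCF with diploid calls and GT as the first FORMAT field. The function name is hypothetical, and what counts as an "excess" is left to comparison across the cohort rather than a fixed cut-off.

```python
from collections import Counter


def het_fraction_per_sample(lines):
    """Map each sample name to (heterozygous calls) / (all diploid calls)."""
    names = []
    het = Counter()
    called = Counter()
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            names = fields[9:]  # sample names follow the FORMAT column
            continue
        for name, sample in zip(names, fields[9:]):
            genotype = sample.split(":")[0]  # assumes GT is the first key
            alleles = genotype.replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # skip no-calls and non-diploid genotypes
            called[name] += 1
            if alleles[0] != alleles[1]:
                het[name] += 1
    return {name: het[name] / called[name] for name in names if called[name]}


if __name__ == "__main__":
    import sys
    for sample_name, fraction in het_fraction_per_sample(sys.stdin).items():
        print(f"{sample_name}\t{fraction:.4f}")
```

A sample whose fraction stands well above the rest of the cohort would be a candidate for a closer contamination check. vcftools also has a --het option that reports per-individual heterozygosity.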

ADD REPLYlink written 5 months ago by SteveL50