kinship coefficient negative in multi-sample VCF from exome sequencing
12 months ago
AMARU • 0

Hi All,

I hope you can lend me a hand here.

I have a multi-sample SNP VCF file (n=259 people) from targeted exome sequencing at ~20x coverage. The file was processed with GATK by a large center, so I largely trust their work. It contains ~70 close relatives (mostly siblings, first cousins, and parents), so I expect KING to estimate relatedness accurately. These people also come from a fairly homogeneous (and isolated) population, so relatedness should be high.

I have further processed this vcf file using the following commands:

1: normalize (split multiallelic sites):

bcftools norm -m-any x.vcf -Ov > Norm.vcf

2: left align:

bcftools norm -f genome.fa -o Norm.Aligned.vcf Norm.vcf

Then in plink/1.9:

plink --vcf Norm.Aligned.vcf --make-bed --out binary --allow-no-sex

Then in king:

./king -b binary.bed --kinship

Output:

Between-family kinship data saved in file king.kin0

Note --kinship --degree <n> can filter & speed up the kinship computing.

X-chromosome analysis...
X-chromosome genotypes stored in 777 64-bit words for each of 259 individuals.
Within-family kinship data saved in file kingX.kin
Relationship inference across families starts at Thu Apr 13 18:08:43 2023
ends at Thu Apr 13 18:08:43 2023
Between-family kinship data saved in file kingX.kin0
KING ends at Thu Apr 13 18:08:43 2023

[image not preserved]

This is what I obtained using --related:

[image not preserved]

I have also repeated the same processing without left-aligning (just normalizing), and with PLINK 2 instead of PLINK 1.9. I always obtained the same result.

Any thoughts on what I am missing?

Edited

The file contains 2.9 million SNPs, and I have run quantitative trait association analyses with these data that have been replicated by other people. So I may be converting VCF to PLINK incorrectly, or missing something else.

SNPs KING sequencing relatedness exome • 807 views

One thing is that the relationships should be apparent on an MDS plot; have you taken a look? That should tell you if vcf -> plink is broken.

Typically this kind of analysis is applied not to WGS data but to microarray data, where sites are known to be polymorphic, so rare/private variants are for the most part excluded. I don't know whether KING has a filter on allele frequency, but it is possible that private variants are driving this.
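To make the title's "negative kinship" concrete, here is a minimal Python sketch of one published form of the KING-robust estimator: shared heterozygous sites minus twice the opposite-homozygote sites, divided by the sum of the two individuals' heterozygote counts. This is an assumption about the general shape of the estimator, not a reproduction of what the KING binary computes internally. The function name and the 0/1/2/None genotype coding are hypothetical.

```python
def king_robust_kinship(g_i, g_j):
    """Toy KING-robust-style kinship for one pair of individuals.

    g_i, g_j: equal-length lists of genotypes coded as alt-allele counts
    (0, 1, 2), with None for a missing call. Illustrative only.
    """
    shared_het = opposite_hom = het_i = het_j = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # use only sites called in both individuals
        if a == 1:
            het_i += 1
        if b == 1:
            het_j += 1
        if a == 1 and b == 1:
            shared_het += 1
        if {a, b} == {0, 2}:
            opposite_hom += 1  # IBS0 site: no allele shared
    denom = het_i + het_j
    if denom == 0:
        return None  # estimator undefined without heterozygous sites
    return (shared_het - 2 * opposite_hom) / denom


# A sample compared with itself gives 0.5 (duplicate / identical twin).
same = king_robust_kinship([0, 1, 2, 1], [0, 1, 2, 1])

# Many opposite homozygotes relative to shared hets push the value below 0.
negative = king_robust_kinship([0, 2, 1, 0], [2, 0, 1, 0])
```

Under this form, the estimate goes negative whenever opposite-homozygote sites outnumber half the shared heterozygous sites, so the set of variants fed in matters a great deal.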

What happens if you subset to those variants with a MAF of say 10% or higher (0.1 < freq < 0.9) in this cohort?
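As a rough illustration of that subsetting step, the Python sketch below keeps only variants whose alternate-allele frequency lies strictly between 0.1 and 0.9 in a toy genotype matrix. The function names and the 0/1/2/None coding are hypothetical. On the real data, PLINK's `--maf 0.1` option applies a comparable minor-allele-frequency threshold.

```python
def alt_allele_freq(variant):
    """Alt-allele frequency over called genotypes (0/1/2; None = missing)."""
    called = [g for g in variant if g is not None]
    if not called:
        return None  # no calls at this site
    return sum(called) / (2 * len(called))


def filter_by_maf(variants, min_maf=0.1):
    """Return indices of variants with min_maf < freq < 1 - min_maf."""
    kept = []
    for idx, variant in enumerate(variants):
        freq = alt_allele_freq(variant)
        if freq is not None and min_maf < freq < 1 - min_maf:
            kept.append(idx)
    return kept


# Rows are variants, columns are samples (toy data, not from the post).
toy_variants = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # private variant, freq 0.05: dropped
    [1, 1, 0, 2, 0, 1, 1, 0, 2, 0],  # common variant, freq 0.4: kept
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],  # monomorphic alt, freq 1.0: dropped
    [None] * 10,                     # never called: dropped
]
kept_indices = filter_by_maf(toy_variants)
```

Re-running the kinship estimate on the kept variants only would show whether rare sites are responsible for the negative values.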


Thanks for answering. The VCF file seems ok. It seems KING is not adequate for WES studies where SNPs are not called across most of the samples.
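One way to check the "not called across most of the samples" point is to compute a per-variant call rate and keep only well-covered sites. The Python sketch below does this on a toy matrix. The function names, the 0/1/2/None coding, and the 0.95 threshold are assumptions for illustration. In PLINK the analogous filter is `--geno`, which takes a maximum per-variant missing rate.

```python
def call_rate(variant):
    """Fraction of samples with a non-missing genotype at this variant."""
    if not variant:
        return 0.0
    return sum(g is not None for g in variant) / len(variant)


def filter_by_call_rate(variants, min_call_rate=0.95):
    """Return indices of variants called in at least min_call_rate of samples."""
    return [
        idx for idx, variant in enumerate(variants)
        if call_rate(variant) >= min_call_rate
    ]


# Rows are variants, columns are samples (toy data, not from the post).
toy_variants = [
    [0, 1, 2, 1],           # fully called: kept
    [0, None, None, None],  # called in 1 of 4 samples: dropped
    [1, 1, None, 0],        # called in 3 of 4 samples: dropped at 0.95
]
well_called = filter_by_call_rate(toy_variants)
```

If very few variants survive such a filter, that would be consistent with the missingness explanation given here.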

I tried KING on a GWAS dataset from the same samples and it worked fine.

Thanks
