Question: Calculate per-variant heterozygosity from a VCF file
6 months ago, SOHAIL (Beijing Institute of Genomics, CAS) wrote:

Hi everybody,

Is there a way to calculate per-variant heterozygosity from a VCF file, i.e. the number of heterozygous genotypes (0/1 or 1/0) observed at a given variant site across the set of individuals in the file?

I know that per-individual heterozygosity can be calculated with the --het option of VCFtools.

Thanks!

-sohail

Tags: ngs, vcf
6 months ago, Kevin Blighe (Republic of Ireland) wrote:

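Assuming that you're confident that your VCF is normalised (in particular, that multi-allelic sites are split into biallelic records, since genotypes such as 0/2 or 1/2 are not matched by the pattern below) and that you genuinely just want to count occurrences of heterozygous calls per line, then the following will work for either phased (0|1 or 1|0) or unphased (0/1 or 1/0) genotypes, or a mixture of these.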

Here, I'm actually reading a BCF: chr1 from the 1000 Genomes Phase III data, so the genotypes are phased. For unphased genotypes only, use gsub(/0\/1|1\/0/,""); for phased genotypes only, use gsub(/0\|1|1\|0/,""). The gsub function in AWK conveniently returns the number of substitutions it makes, i.e. the number of matched genotypes on the line.
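As a quick illustration of that return value, here is a minimal sketch on a made-up line of four genotype columns. The function name count_hets and the toy genotypes are assumptions for the example, not part of the original command.

```shell
# count_hets is a hypothetical helper name used only for this illustration.
count_hets() {
  # gsub() returns how many replacements it made on the current line,
  # so printing its return value gives the number of heterozygous matches.
  awk '{ print gsub(/0\|1|1\|0|0\/1|1\/0/, "") }'
}

# Toy input (not real data): two heterozygous calls (0|1 and 1/0),
# one homozygous alternate call (1|1) and one homozygous reference call (0|0).
printf '0|1\t1|1\t1/0\t0|0\n' | count_hets
```

The full command on the 1000 Genomes BCF applies the same idea to every non-header record: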

bcftools view chr1.1kg.phase3.v5.bcf | awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tHetCount"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" gsub(/0\|1|1\|0|0\/1|1\/0/,"")}'
CHR POS     ID          REF                     ALT HetCount
1   10177   .           A                       AC  1490
1   10235   .           T                       TA  6
1   10352   rs145072688 T                       TA  2025
1   10616   rs376342519 CCGCCGTTGCAAAGGCGCGCCG  C   35
1   10642   .           G                       A   21
1   11008   .           C                       G   403
1   11012   .           C                       G   403
1   11063   .           T                       G   15
1   13110   .           G                       A   134
1   13116   rs201725126 T                       G   414
1   13118   rs200579949 A                       G   414
1   13273   .           G                       C   444
1   13284   .           G                       A   7
1   13380   .           C                       G   41
1   13483   .           G                       C   10
1   13494   .           A                       G   7
1   13550   .           G                       A   17
1   14464   .           A                       T   428
1   14599   .           T                       A   711
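If the VCF might not be fully normalised, a somewhat stricter variant is to look only at the GT subfield of each sample column instead of pattern-matching the whole line. The sketch below is one possible way to do that, not the method used above: the function name het_per_variant and the extra Called column (the number of non-missing diploid genotypes) are assumptions added for illustration. It also counts any genotype with two different alleles as heterozygous, so on multi-allelic records (e.g. 1/2) it can report more than the 0/1-only pattern does.

```shell
# het_per_variant is a hypothetical helper: it reads VCF text on stdin and
# prints one line per record with the heterozygous and called genotype counts.
het_per_variant() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { print "CHR", "POS", "ID", "REF", "ALT", "HetCount", "Called" }
    /^#/  { next }                                   # skip header lines
    {
      het = 0; called = 0
      for (i = 10; i <= NF; i++) {                   # sample columns start at 10
        split($i, parts, ":")                        # GT is assumed to be the first subfield
        n = split(parts[1], allele, "[/|]")          # phased (|) or unphased (/)
        if (n != 2) continue                         # skip haploid or malformed calls
        if (allele[1] == "." || allele[2] == ".") continue   # missing genotype
        called++
        if (allele[1] != allele[2]) het++            # two different alleles
      }
      print $1, $2, $3, $4, $5, het, called
    }'
}
```

With the same input as above, it would be called as bcftools view chr1.1kg.phase3.v5.bcf | het_per_variant. Dividing HetCount by Called gives the proportion of heterozygous calls among non-missing genotypes, if a rate is wanted rather than a raw count.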

6 months ago, SOHAIL replied:

It worked perfectly, thanks for the help, @Kevin.

6 months ago, Kevin Blighe replied:

You're welcome, Sohail.