Question: Calculate per-variant heterozygosity from a VCF file
3 months ago by
SOHAIL170
Beijing Institute of Genomics, CAS.
SOHAIL170 wrote:

Hi everybody,

Is there any way to calculate the per-variant heterozygosity (i.e. the number of 0/1 or 1/0 genotypes observed at a given variant site across the set of individuals in a VCF file)?

I know that per-individual heterozygosity can be calculated with the --het option in VCFtools.

Thanks!

-sohail

ngs vcf • 606 views
3 months ago by
Kevin Blighe17k
University College London Cancer Institute
Kevin Blighe17k wrote:

Assuming that you're confident that your VCF is normalised and that you genuinely just want to count occurrences of heterozygous calls per line, then the following will work for either phased (0|1 or 1|0) or unphased (0/1 or 1/0) genotypes, or a mixture of these.

Here, I'm actually reading a BCF that holds chr1 from the 1000 Genomes Phase III data, so the genotypes are phased. For unphased genotypes only, use gsub(/0\/1|1\/0/,""); for phased genotypes only, use gsub(/0\|1|1\|0/,""). The gsub function in AWK conveniently returns the number of substitutions it made, which here equals the number of heterozygous genotypes on the line.

bcftools view chr1.1kg.phase3.v5.bcf | awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tHetCount"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" gsub(/0\|1|1\|0|0\/1|1\/0/,"")}'
CHR POS     ID          REF                     ALT HetCount
1   10177   .           A                       AC  1490
1   10235   .           T                       TA  6
1   10352   rs145072688 T                       TA  2025
1   10616   rs376342519 CCGCCGTTGCAAAGGCGCGCCG  C   35
1   10642   .           G                       A   21
1   11008   .           C                       G   403
1   11012   .           C                       G   403
1   11063   .           T                       G   15
1   13110   .           G                       A   134
1   13116   rs201725126 T                       G   414
1   13118   rs200579949 A                       G   414
1   13273   .           G                       C   444
1   13284   .           G                       A   7
1   13380   .           C                       G   41
1   13483   .           G                       C   10
1   13494   .           A                       G   7
1   13550   .           G                       A   17
1   14464   .           A                       T   428
1   14599   .           T                       A   711
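
If the VCF is not normalised, or you would rather not pattern-match the whole line, a field-aware variant is one option. The following is only a sketch under a few assumptions: the function name het_count is made up for illustration, the toy VCF records are invented, and any diploid genotype with two different alleles (including 1/2) is treated as heterozygous. Missing and non-diploid genotypes are skipped, and the count of called genotypes is printed alongside so a proportion can be derived if needed.

```shell
# Hypothetical helper (not part of the original answer): reads VCF text on
# stdin and prints, per variant, the heterozygous and called genotype counts.
het_count() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; print "CHR", "POS", "ID", "REF", "ALT", "HetCount", "Called" }
    /^#/ { next }                        # skip header lines
    {
      # Locate the GT key in the FORMAT column (field 9).
      nfmt = split($9, fmt, ":"); gt = 0
      for (k = 1; k <= nfmt; k++) if (fmt[k] == "GT") gt = k
      het = 0; called = 0
      if (gt > 0) {
        for (i = 10; i <= NF; i++) {     # sample columns start at field 10
          split($i, parts, ":")
          n = split(parts[gt], a, "[/|]")  # phased (|) or unphased (/) separator
          if (n != 2 || a[1] == "." || a[2] == ".") continue
          called++
          if (a[1] != a[2]) het++
        }
      }
      print $1, $2, $3, $4, $5, het, called
    }'
}

# Toy input (invented for this example): three samples, two variants.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0|1:12\t1/0:9\t1|1:7\n1\t200\trs1\tC\tT,G\t.\tPASS\t.\tGT\t1/2\t./.\t0/0\n' | het_count
```

With a real file, the same function could be fed from bcftools, e.g. bcftools view chr1.1kg.phase3.v5.bcf | het_count. On a normalised biallelic file it should agree with the gsub count above.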

It worked perfectly. Thanks for the help, @Kevin.

written 3 months ago by SOHAIL

You're welcome, Sohail

written 3 months ago by Kevin Blighe