Calculate mean_GQ value from individual GQ values in a multi-sample VCF file
7.1 years ago
aham ▴ 40
  1. I have a multi-sample VCF file (say, comprising 5 samples) created by GATK HaplotypeCaller. The 'FORMAT' field of each sample contains GT:AD:DP:GQ:PL values. I want to calculate the mean GQ value across all five samples so that I can filter the VCF file based on the average or cumulative GQ value.
    FORMAT field of the VCF file, followed by two sample columns: GT:AD:DP:GQ:PL 1/1:0,21:21:63:736,63,0 0/0:3,0:3:9:0,9,84

  2. Related to the first question: which is more suitable for filtering a VCF, the average GQ or the cumulative GQ?
    Thanks.

sequencing next-gen GATK VCF Variant-Calling • 3.5k views
7.1 years ago
  1. I am a bit confused here: the mean is simply the average of the per-sample GQ values. You can calculate it and filter by it with, say, vcftools and awk:

    vcftools --vcf input.vcf --extract-FORMAT-info GQ --out input.vcf

This creates the file input.vcf.GQ.FORMAT, which contains chromosome, position, and one GQ column per sample. Chromosome plus position is not guaranteed to be unique within a VCF file, so double check this and add special IDs if needed. Once each row is unique, you can use awk to filter by mean GQ, or you can add the mean as a custom annotation to your VCF file using that table (say, with vcftools' vcf-annotate) and filter later with more options.
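The awk step mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the table is tab-separated with a header line, columns 3 onward hold one GQ value per sample, and a missing value is written as "." (check this against your own output). The function name `mean_gq` and the `MEAN_GQ` column name are made up for this example.

```shell
# mean_gq FILE [MIN_MEAN]
# Print CHROM, POS and the mean GQ for every site in a .GQ.FORMAT table
# whose mean GQ is at least MIN_MEAN (default 0, i.e. keep every site).
# Assumption: columns 3..NF are per-sample GQ values and "." means missing;
# missing values are left out of the mean rather than counted as 0.
mean_gq() {
    awk -F'\t' -v min="${2:-0}" '
        NR == 1 { printf "%s\t%s\tMEAN_GQ\n", $1, $2; next }  # header line
        {
            sum = 0; n = 0
            for (i = 3; i <= NF; i++)
                if ($i != "." && $i != "") { sum += $i; n++ }
            # Sites where no sample has a GQ value are skipped entirely.
            if (n > 0 && sum / n >= min)
                printf "%s\t%s\t%.2f\n", $1, $2, sum / n
        }
    ' "$1"
}
```

For example, `mean_gq input.vcf.GQ.FORMAT 20 > mean_gq.tsv` would keep only sites with a mean GQ of at least 20, and calling it without a threshold gives a per-site table that can be read into R for plotting. For the two sample columns shown in the question (GQ 63 and 9), the mean is 36.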

(Unfortunately, I do not know whether vcftools/bcftools or other common tools support aggregate functions for filtering on the genotype info of multi-sample VCF files; this is why we use a NoSQL database that we developed at ALAPY.com for storage and access.)

  2. It depends on what these 5 samples are and why you are looking for a certain mutation. Were they all processed in the same library prep and sequencing run? For example, 5 technical replicates are a very different case from, say, a trio analysis with tumor/normal tissues for the proband...

This is what I have been looking for. I applied it to my work and it went well. I now have chrom, pos, and each sample with its respective GQ value. How do I get the average across all the samples for each site (i.e. each chrom and pos)? I need the average so that I can plot it in R.
