Question: Calculate mean_GQ value from individual GQ values in a multi-sample VCF file
aham wrote, 23 months ago:
  1. I have a multi-sample VCF file (five samples) created by GATK HaplotypeCaller. The FORMAT field for each sample contains GT:AD:DP:GQ:PL values. I want to calculate the mean GQ across all five samples so that I can filter the VCF file on the average or cumulative GQ value.
    FORMAT field of the VCF file, followed by two example sample columns: GT:AD:DP:GQ:PL 1/1:0,21:21:63:736,63,0 0/0:3,0:3:9:0,9,84

  2. Related to the first question: which is more suitable for filtering a VCF file, average GQ or cumulative GQ?
    Thanks.

Petr Ponomarenko (United States / Los Angeles / ALAPY.com) wrote, 23 months ago:
  1. I am a bit confused by the question: the mean GQ is simply the arithmetic average of the per-sample GQ values. You can compute it and filter on it with, say, vcftools and awk:

    vcftools --vcf input.vcf --extract-FORMAT-info GQ --out input.vcf

This creates a file named input.vcf.GQ.FORMAT with chromosome and position columns plus one GQ column per sample. Assuming that chromosome plus position is unique within the file (the VCF format does not require this, so double-check and add your own IDs if needed), you can then use awk to filter by mean GQ. Alternatively, you can add a custom annotation to your VCF file from that table, say with vcftools' vcf-annotate, and filter later with more options.
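The awk step mentioned above could look something like the following sketch. It assumes the GQ.FORMAT table is tab-separated, has a single header line (CHROM, POS, then one column per sample), and marks a missing GQ as "."; check these assumptions against your own file. The function name mean_gq and the MEAN_GQ column name are made up for illustration.

```shell
# Hypothetical helper: per-site mean GQ from a vcftools *.GQ.FORMAT table.
# Assumes tab-separated input, one header line, and "." for a missing GQ.
mean_gq() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 { print $1, $2, "MEAN_GQ"; next }   # replace the header line
         {
             sum = 0; n = 0
             for (i = 3; i <= NF; i++) {
                 # skip samples without a GQ value at this site
                 if ($i != "." && $i != "") { sum += $i; n++ }
             }
             # print NA when no sample has a GQ at this site
             print $1, $2, (n > 0 ? sprintf("%.2f", sum / n) : "NA")
         }' "$1"
}
```

For example, `mean_gq input.vcf.GQ.FORMAT > input.vcf.meanGQ.tsv` would write one mean per site, and `awk -F'\t' 'NR == 1 || ($3 != "NA" && $3 >= 20)' input.vcf.meanGQ.tsv` would keep only sites with a mean GQ of at least 20 (the cutoff of 20 is an arbitrary placeholder, not a recommendation). For the two sample columns in the question (GQ 63 and 9), the mean would be 36.00. The resulting three-column table can also be read into R for plotting.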

(Unfortunately, I do not know whether vcftools/bcftools or other common tools support aggregate functions for filtering on the genotype fields of multi-sample VCF files. This is why we use a NoSQL database that we developed at ALAPY.com for storage and access.)

  2. It depends on what these 5 samples are and why you are looking for a certain mutation. Also, were they all processed in the same library prep and sequencing run? For example, 5 technical replicates are a very different case from, say, a trio analysis with tumor/normal tissues for the proband.

This is what I have been looking for. I applied it to my work and it went well. I now have chrom, pos, and each sample with its respective GQ value. How do I get the average across all the samples for each site (i.e. each chrom and pos)? I need the average so that I can plot it in R.

Written 5 weeks ago by mab65820