Calculate the mean DP4 across samples in a multisample VCF
avelarbio46, 3 months ago

Hello everyone!

I'm trying to shrink the per-sample FORMAT data in my VCF file by replacing it with a few summary statistics per variant. To do this, I'm using:

MYVCF=my_multisample_vcf_path
# 1st input: the 9 fixed VCF columns (with a header); 2nd input: per-variant genotype counts
paste <(bcftools view "$MYVCF" |
          awk -F"\t" 'BEGIN {print "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9}') \
      <(bcftools query -f '[\t%SAMPLE=%GT]\n' "$MYVCF" |
          awk 'BEGIN {OFS="\t"; print "nHomAlt\tnHet\tnHomRef"} {nHet=gsub(/0\|1|1\|0|0\/1|1\/0/, ""); nHomAlt=gsub(/1\|1|1\/1/, ""); nHomRef=gsub(/0\|0|0\/0/, ""); print nHomAlt,nHet,nHomRef}') |
  sed 's/,\t/\t/g' | sed 's/,$//g' >> out_put.vcf

This generates three columns (nHomAlt, nHet, nHomRef) summarizing, for each variant, the samples that are homozygous alternate, heterozygous and homozygous reference.
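A quick way to sanity-check the genotype-counting awk is to feed it a toy line by hand. The sketch below wraps the same regexes in a hypothetical `count_gt` shell function and uses `gsub` for all three counts: in awk, `sub` replaces only the first match, so `nHomAlt=sub(...)` can never exceed 1, whereas `gsub` returns the number of replacements it made. The sample names and genotypes in the toy input are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines produced by
#   bcftools query -f '[\t%SAMPLE=%GT]\n' file.vcf
# and prints nHomAlt, nHet, nHomRef for each line (tab-separated).
count_gt() {
  awk 'BEGIN {OFS="\t"}
       {
         # het first, so its genotypes are removed before the other patterns run
         nHet    = gsub(/0\|1|1\|0|0\/1|1\/0/, "")
         nHomAlt = gsub(/1\|1|1\/1/, "")   # gsub counts every match, sub only the first
         nHomRef = gsub(/0\|0|0\/0/, "")
         print nHomAlt, nHet, nHomRef
       }'
}

# Toy input: one variant, four samples
printf '\tS1=0|1\tS2=1|1\tS3=1/1\tS4=0/0\n' | count_gt
# prints: 2	1	1
```

With `sub` instead of `gsub` for nHomAlt, the same toy line would report only 1 HomAlt sample, even though S2 and S3 are both homozygous alternate.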

I want to do the same for DP4, but instead of summarizing genotypes, print the mean across all samples of each DP4 value for each variant. DP4 is defined in the header as:

##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="ref forward, ref reverse, alt forward, alt reverse">

DP4 is a bit more complex than GT, though: it holds four comma-separated integers per sample rather than a single genotype.

Is there any way to do this with awk or another tool?

In short, I want to add four columns to the VCF:

DP4_ref_forward_mean    DP4_ref_reverse_mean    DP4_alt_forward_mean    DP4_alt_reverse_mean
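The same paste pattern should extend to DP4. Below is a minimal awk sketch, wrapped in a hypothetical `dp4_mean` function, that reads the output of `bcftools query -f '[\t%DP4]\n'` (one line per variant, one comma-separated DP4 per sample), splits each DP4 on commas and prints the four per-variant means under the column names above. Two assumptions worth checking against your data: a sample whose DP4 contains `.` (missing) is skipped, so each mean is taken over the samples that have a value rather than over all samples, and the means are rounded to two decimals.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcftools): reads lines produced by
#   bcftools query -f '[\t%DP4]\n' file.vcf
# i.e. one line per variant, each sample's DP4 preceded by a tab,
# and prints the per-variant mean of each of the four DP4 components.
# Assumption: a sample whose DP4 contains "." (missing) is skipped.
dp4_mean() {
  awk -F'\t' '
    BEGIN {
      OFS = "\t"
      print "DP4_ref_forward_mean", "DP4_ref_reverse_mean", "DP4_alt_forward_mean", "DP4_alt_reverse_mean"
    }
    {
      s1 = s2 = s3 = s4 = 0; n = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "" || $i ~ /\./) continue    # leading empty field or missing DP4
        if (split($i, d, ",") != 4) continue   # not a 4-value DP4, ignore it
        s1 += d[1]; s2 += d[2]; s3 += d[3]; s4 += d[4]; n++
      }
      if (n == 0) { print ".", ".", ".", "."; next }   # no sample had a DP4 value
      printf "%.2f\t%.2f\t%.2f\t%.2f\n", s1 / n, s2 / n, s3 / n, s4 / n
    }'
}

# Toy input: two variants, two samples; sample 2 of variant 2 has a missing DP4
printf '\t3,5,2,4\t1,1,6,8\n\t10,0,0,0\t.\n' | dp4_mean
# prints the header line, then (tab-separated):
# 2.00 3.00 4.00 6.00
# 10.00 0.00 0.00 0.00
```

To add these columns to the output, the function could be used as one more process substitution in the existing `paste` call, e.g. `<(bcftools query -f '[\t%DP4]\n' "$MYVCF" | dp4_mean)`. Because `bcftools query` walks the records in the same order as `bcftools view`, the rows should line up with the first nine columns, and the header line printed in `BEGIN` lines up with the `#CHROM` line.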
Tags: dp4, vcf, bcftools