I'm wondering if anyone has written code, or knows of a BCFtools/VCFtools/other published tool, to trim gnomAD v3.1.2 down from its current monstrosity (~3 TB for all chromosomes) into a smaller resource for variant filtering in rare disease variant annotation.
Essentially I want to only keep a handful of annotations to slim this down, focusing on:
n_hom_alt_global
AF_global
AC_global
The ideal solution would let me specify which annotations to keep (maybe as a list), and which to remove.
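What about a simple bcftools query? Unfortunately, you'll need a workaround to manage the fields, because -f takes a format string rather than a file. For INFO fields, something like -f "$(paste -s -d '\t' fields.list)\n" might work, assuming fields.list holds one format expression per line (%CHROM, %POS, %INFO/AF, ...). This could be another option, although I've never tried it: https://vcf-kit.readthedocs.io/en/latest/vcf2tsv/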
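A rough sketch of that field-list workaround: build the bcftools query format string from a plain list of INFO tags. The helper name make_query_format and the file fields.list (one bare tag per line, e.g. AC, AF, nhomalt_popmax) are made up for illustration and are not part of bcftools.

```shell
#!/bin/sh
# make_query_format: hypothetical helper (not a bcftools feature).
# Reads INFO tag names, one per line, from the file given as $1 and prints
# a format string suitable for `bcftools query -f`.
make_query_format() {
    fmt='%CHROM\t%POS\t%REF\t%ALT'
    while IFS= read -r tag || [ -n "$tag" ]; do
        [ -n "$tag" ] || continue            # skip blank lines
        fmt="${fmt}\\t%INFO/${tag}"          # append a literal \t plus the tag
    done < "$1"
    # Emit the string followed by a literal \n, which bcftools expands itself.
    printf '%s\\n' "$fmt"
}

# Example call (needs bcftools and a local or streamed VCF; not run here):
#   bcftools query -f "$(make_query_format fields.list)" input.vcf.bgz > slim.tsv
```

The result is a tab-separated table rather than a VCF, so this only fits if downstream tools don't need VCF input.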
wget -O - "https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr22.vcf.bgz" |\
bcftools annotate -x '^INFO/nhomalt_popmax,INFO/AF,INFO/AC,FILTER,ID'
(...)
chr22 10518128 . G A . . AC=0;AF=0
chr22 10518128 . G T . . AC=0;AF=0
chr22 10518144 . A G . . AC=0;AF=0
chr22 10518147 . G T . . AC=0;AF=0
chr22 10518152 . G T . . AC=1;AF=4.8216e-05;nhomalt_popmax=0
chr22 10518153 . G C . . AC=1;AF=4.85484e-05;nhomalt_popmax=0
chr22 10518174 . G A . . AC=0;AF=0
chr22 10518183 . G A . . AC=3;AF=0.000185117;nhomalt_popmax=1
chr22 10518211 . G A . . AC=1;AF=6.71231e-05;nhomalt_popmax=0
chr22 10518213 . G A . . AC=0;AF=0
chr22 10518213 . G GA . . AC=37;AF=0.00246831;nhomalt_popmax=12
chr22 10518213 . GA G . . AC=4;AF=0.00026688;nhomalt_popmax=0
chr22 10518250 . G A . . AC=2;AF=0.000121551;nhomalt_popmax=1
chr22 10518251 . C T . . AC=11;AF=0.000672125;nhomalt_popmax=5
chr22 10518275 . T A . . AC=0;AF=0
chr22 10518278 . G A . . AC=1;AF=5.97514e-05;nhomalt_popmax=0
chr22 10518282 . A G . . AC=2;AF=0.000118301;nhomalt_popmax=1
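To cover the whole genome, the chr22 command above could be wrapped in a loop that writes one compressed, indexed file per chromosome. This is only a sketch: the function names and output file names are invented, and it assumes the other chromosomes follow the same URL pattern as chr22 (chr1 to chr22, chrX, chrY).

```shell
#!/bin/sh
# Base URL copied from the chr22 example; the per-chromosome file naming is
# assumed to follow the same pattern for the other chromosomes.
GNOMAD_BASE="https://storage.googleapis.com/gcp-public-data--gnomad/release/3.1.2/vcf/genomes"
# Same -x expression as in the chr22 example: keep three INFO tags, drop FILTER and ID.
REMOVE_EXPR='^INFO/nhomalt_popmax,INFO/AF,INFO/AC,FILTER,ID'

# Print the download URL for one chromosome (hypothetical helper).
gnomad_url() {
    printf '%s/gnomad.genomes.v3.1.2.sites.chr%s.vcf.bgz\n' "$GNOMAD_BASE" "$1"
}

# Stream one chromosome through bcftools, write bgzipped VCF, then tabix-index it.
slim_chrom() {
    out="gnomad.v3.1.2.slim.chr$1.vcf.gz"    # output name is arbitrary
    wget -q -O - "$(gnomad_url "$1")" |
        bcftools annotate -x "$REMOVE_EXPR" -O z -o "$out" - &&
        bcftools index -t "$out"
}

# Process all chromosomes in turn; call this explicitly to start the downloads.
slim_all() {
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        slim_chrom "$c" || return 1
    done
}
```

Calling slim_all starts the run; the per-chromosome files can be merged afterwards with bcftools concat if a single file is preferred.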