Question: Updating allele frequency (AF) and minor allele frequency (MAF) INFO fields in .vcf
JSEM wrote (9 months ago):

Hi everyone,

I'm processing .vcf files for the first time and am hoping for some advice. I recently filtered the VCFs for a dataset to exclude around half of the subjects, and now want to update the INFO columns in my output files to reflect the statistics of the new, filtered dataset.

I've tried fill-an-ac from vcftools, which successfully updated the AN/AC fields, but it doesn't update the allele frequency (AF) or minor allele frequency (MAF) fields. I know I could derive AF/MAF from the AN/AC values, but I was wondering whether one of the existing tools has an option to update these fields automatically, similar to fill-an-ac.
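To make the derivation from AN/AC concrete, here is a minimal shell/awk sketch. `af_maf` is a hypothetical helper name, and it assumes AC is comma-separated with one count per ALT allele (as in the VCF specification). AF for each ALT allele is AC / AN; the REF frequency is what remains. MAF is defined here as the frequency of the second most common allele (REF included), which equals min(AF, 1 - AF) at a biallelic site; other tools may use a different convention at multiallelic sites.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive AF (one value per ALT allele) and MAF from AN and AC.
# Usage: af_maf AN AC_LIST   for example: af_maf 10 3,1
af_maf() {
  awk -v an="$1" -v ac="$2" 'BEGIN {
    if (an <= 0) { print "AF=.;MAF=."; exit }   # no called alleles: leave values missing
    n = split(ac, counts, ",")
    ref = an                                     # REF count = AN minus all ALT counts
    af = ""
    for (i = 1; i <= n; i++) {
      f[i] = counts[i] / an                      # AF_i = AC_i / AN
      af = af (i > 1 ? "," : "") sprintf("%.4g", f[i])
      ref -= counts[i]
    }
    f[0] = ref / an                              # REF allele frequency
    # MAF = frequency of the second most common allele (REF included),
    # which is min(AF, 1 - AF) at a biallelic site
    best = -1; second = -1
    for (i = 0; i <= n; i++) {
      if (f[i] > best) { second = best; best = f[i] }
      else if (f[i] > second) { second = f[i] }
    }
    printf("AF=%s;MAF=%.4g\n", af, second)
  }'
}
```

For example, `af_maf 10 3,1` prints `AF=0.3,0.1;MAF=0.3`, and `af_maf 10 8` prints `AF=0.8;MAF=0.2`.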

I'd really appreciate any suggestions you might have.

Thanks in advance!

Tags: snp, vcftools, qc, imputation, vcf • 541 views
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (9 months ago):

Using vcffilterjdk:

Use awk to insert the INFO header lines for AF and MAF if they're missing.

In vcffilterjdk: read AN and the AC of each ALT allele, map each AC to an AF by dividing it by AN, set the "AF" attribute, then take the minimum AF and set it as the "MAF" attribute.

$ gunzip -c src/test/resources/rotavirus_rf.vcf.gz |\
awk '/^##INFO=<ID=AF,/ {af=1} /^##INFO=<ID=MAF,/ {maf=1} /^#CHROM/ {if(!maf) print "##INFO=<ID=MAF,Number=1,Type=Float,Description=\"Min Allele Frequency\">"; if(!af) print "##INFO=<ID=AF,Number=A,Type=Float,Description=\"Allele Frequency\">";} {print}' |\
java -jar dist/vcffilterjdk.jar -e '
  VariantContextBuilder vcb = new VariantContextBuilder(variant);
  // AN = total number of called alleles at this site
  float an = variant.getAttributeAsInt("AN", 0);
  if (an > 0) {
    // one AF per ALT allele: AC / AN
    List<Float> af = variant.getAttributeAsIntList("AC", 0).stream()
        .map(N -> N / an)
        .collect(Collectors.toList());
    vcb.attribute("AF", af);
    // MAF as described above: the smallest ALT allele frequency
    vcb.attribute("MAF", af.stream().mapToDouble(X -> X.floatValue()).min().orElse(-1.0));
  }
  return vcb.make();'
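Where Java is not an option, the same recomputation can be sketched in awk alone (awk is already used in the pipeline above). This is a minimal sketch under stated assumptions: the input is an uncompressed VCF on stdin whose AN and AC are already up to date (for example after fill-an-ac), `update_af_maf` is a hypothetical function name, existing AF/MAF entries are replaced, and values are printed with 4 significant digits. MAF is taken as the frequency of the second most common allele (REF included), i.e. min(AF, 1 - AF) at biallelic sites, which differs from taking the minimum ALT AF as the vcffilterjdk expression does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute INFO/AF and INFO/MAF from INFO/AN and INFO/AC
# on an uncompressed VCF read from stdin; the result goes to stdout.
update_af_maf() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^##INFO=<ID=AF,/  { has_af = 1 }
  /^##INFO=<ID=MAF,/ { has_maf = 1 }
  /^##/ { print; next }
  /^#CHROM/ {
    # declare the two tags just before the column header if they are missing
    if (!has_af)  print "##INFO=<ID=AF,Number=A,Type=Float,Description=\"Allele Frequency\">"
    if (!has_maf) print "##INFO=<ID=MAF,Number=1,Type=Float,Description=\"Minor Allele Frequency\">"
    print; next
  }
  {
    an = 0; ac = ""; kept = ""
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++) {
      if (kv[i] ~ /^AN=/) an = substr(kv[i], 4) + 0
      else if (kv[i] ~ /^AC=/) ac = substr(kv[i], 4)
      # drop stale AF/MAF entries and the empty marker; keep everything else
      if (kv[i] !~ /^(AF|MAF)=/ && kv[i] != ".") kept = kept (kept == "" ? "" : ";") kv[i]
    }
    if (an > 0 && ac != "" && ac != ".") {
      m = split(ac, counts, ",")
      ref = an; af = ""
      for (j = 1; j <= m; j++) {
        f[j] = counts[j] / an                    # AF_j = AC_j / AN
        af = af (j > 1 ? "," : "") sprintf("%.4g", f[j])
        ref -= counts[j]
      }
      f[0] = ref / an                            # REF allele frequency
      best = -1; second = -1                     # second most common allele = MAF
      for (j = 0; j <= m; j++) {
        if (f[j] > best) { second = best; best = f[j] }
        else if (f[j] > second) second = f[j]
      }
      kept = kept (kept == "" ? "" : ";") "AF=" af ";MAF=" sprintf("%.4g", second)
    }
    $8 = (kept == "" ? "." : kept)
    print
  }'
}
```

A possible invocation (file names are placeholders): `gunzip -c filtered.vcf.gz | update_af_maf | bgzip -c > filtered.af.vcf.gz`. Like the vcffilterjdk approach, it streams one record at a time, so memory use stays small.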

Thanks for the quick reply. This approach seems straightforward; the difficulty is that I'm running this on an HPC cluster, where running Java programs is not easy given the permissions and the modules available.

Are there any other approaches that are not Java-based, or workarounds for HPC systems? Thanks again!

(reply by JSEM, 9 months ago)

> ...where running java programs is not very straightforward given permissions.

Why would Java be restricted but not Python or any other language? This program streams the VCF, so it doesn't require much memory.

(reply by Pierre Lindenbaum, 9 months ago)