vcf2maf is the standard tool for converting between maf and vcf files. However, I am encountering situations where the mutation notation differs between the two formats.
For example, I have a mutation in maf format that looks like this:
15 66729115 66729129 + In_Frame_Del DEL GGAACCAGATCATAA -
After converting to vcf format, the mutation is now written like this:
15 66729114 . CGGAACCAGATCATAA C
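The POS chromosome position value has been changed, along with the ref and alt alleles, to match the requirements of the vcf spec (which requires an indel to include the base preceding the event, so the position shifts back by one).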
What I need is a method for converting maf <-> vcf that retains the original coordinates in the output file. That is, I want an output file that looks something like this, where both the new and old mutation notations are recorded:
15 66729114 CGGAACCAGATCATAA C 15 66729115 66729129 GGAACCAGATCATAA -
For context, I have maf files with extra metadata associated with each variant (which isn't preserved when converting to vcf), and I have converted them to vcf format in order to get more required metadata (in this case, output from bcftools isec, which only takes vcf input and outputs a sites.txt file with presence/absence of each mutation in each sample). When I try a simple merge between the two datasets based on the Chrom, Pos, Ref, Alt values in each, I am not able to correctly merge the entries where the mutation notation changed during the conversion. So I need an output file that has both notations recorded, to assist in backfilling the new metadata from the vcf analysis into the correct maf variant entries.
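One way to get both notations side by side is to recompute the maf-style coordinates from each vcf record, which needs no reference genome because it only strips the shared leading base. The sketch below is a minimal illustration, not vcf2maf's actual code: the function names and the isec_flags field are hypothetical, the sites.txt layout (chrom, pos, ref, alt, presence flags, tab-separated) is an assumption to check against your own file, and multi-allelic ALT values are not handled.

```python
def vcf_to_maf_coords(pos, ref, alt):
    """Return (start, end, ref, alt) in maf-style notation for one vcf allele."""
    # Strip leading bases shared by REF and ALT, advancing the position.
    while ref and alt and ref[0] == alt[0] and ref != alt:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    if not ref:
        # Pure insertion: start/end are the two bases flanking the insert.
        return pos - 1, pos, "-", alt
    if not alt:
        # Pure deletion: start/end span the deleted bases.
        return pos, pos + len(ref) - 1, ref, "-"
    # SNP, multi-base substitution, or complex indel.
    return pos, pos + len(ref) - 1, ref, alt


def both_notations(chrom, pos, ref, alt):
    """Return one row holding the vcf notation followed by the maf notation."""
    start, end, maf_ref, maf_alt = vcf_to_maf_coords(pos, ref, alt)
    return [chrom, pos, ref, alt, chrom, start, end, maf_ref, maf_alt]


# Hypothetical inputs: one line in the assumed sites.txt layout, one maf row.
sites_lines = ["15\t66729114\tCGGAACCAGATCATAA\tC\t101"]
maf_rows = [{
    "Chromosome": "15",
    "Start_Position": 66729115,
    "End_Position": 66729129,
    "Reference_Allele": "GGAACCAGATCATAA",
    "Tumor_Seq_Allele2": "-",
}]

# Index the presence flags by the maf-style key derived from each vcf site.
presence = {}
for line in sites_lines:
    chrom, pos, ref, alt, flags = line.rstrip("\n").split("\t")
    row = both_notations(chrom, int(pos), ref, alt)
    presence[tuple(row[4:])] = flags

# Backfill the flags into the matching maf rows (None when nothing matches).
for maf in maf_rows:
    key = (maf["Chromosome"], maf["Start_Position"], maf["End_Position"],
           maf["Reference_Allele"], maf["Tumor_Seq_Allele2"])
    maf["isec_flags"] = presence.get(key)

print("\t".join(str(v) for v in both_notations("15", 66729114, "CGGAACCAGATCATAA", "C")))
```

For the deletion in the question, both_notations returns the vcf fields followed by 15, 66729115, 66729129, GGAACCAGATCATAA and -, so the merge key lines up with the original maf entry. Variants whose maf coordinates were not left-trimmed in the same way would still need separate handling.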
As a follow-up, this was essentially the solution I settled on. I ended up using maf2vcf.pl to convert my maf files back into vcf format, then used a large number of bcftools commands to do all the parsing needed, before finally converting back into the required maf output format.