Question

How to convert maf to vcf while retaining original mutation notations?

1

3.5 years ago

steve ★ 3.5k

vcf2maf is the standard tool for converting between maf and vcf files in either direction. However, I am encountering situations where the mutation notation is not the same in the two formats.

For example, I have a mutation in maf format that looks like this:

15      66729115        66729129        +       In_Frame_Del    DEL     GGAACCAGATCATAA      -

After converting to vcf format, the mutation is now written like this:

15       66729114        .       CGGAACCAGATCATAA        C

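The POS value has changed, along with the ref and alt alleles, to match the requirements of the vcf spec: for an indel, the alleles must include the base before the event, so POS moves back by one and that base (C) is prepended to both ref and alt.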

What I need is a method for converting maf <-> vcf that retains the original coordinates in the output file. That is, I want an output file that lists something like this, where both the old and new mutation notations are recorded:

15   66729114   CGGAACCAGATCATAA    C   15   66729115    66729129   GGAACCAGATCATAA      -
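A row like the one above can be derived from the VCF record alone, because going from VCF notation to MAF notation only requires stripping the shared anchor base(s). The following is a minimal sketch of that idea, assuming the common MAF convention in which a deletion spans the deleted bases and an insertion is flanked by Start/End with `-` as the reference allele. The function names `vcf_to_maf_notation` and `both_notations` are hypothetical and not part of vcf2maf.

```python
def vcf_to_maf_notation(pos, ref, alt):
    """Return (start, end, ref, alt) in MAF-style notation for one VCF allele pair."""
    # Strip shared leading bases (the VCF anchor), advancing POS for each one.
    while ref and alt and ref[0] == alt[0] and ref != alt:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    ref_len, alt_len = len(ref), len(alt)
    if ref_len < alt_len:
        # Insertion: Start/End flank the inserted bases (assumed MAF convention).
        start = pos - 1
        end = pos if ref_len == 0 else pos + ref_len - 1
    else:
        # SNV/MNV or deletion: the span covers the remaining reference bases.
        start = pos
        end = pos + ref_len - 1
    # Empty alleles are written as "-" in MAF.
    return start, end, ref or "-", alt or "-"


def both_notations(chrom, pos, ref, alt):
    """Build one output row holding the VCF notation followed by the MAF notation."""
    start, end, maf_ref, maf_alt = vcf_to_maf_notation(pos, ref, alt)
    return [chrom, pos, ref, alt, chrom, start, end, maf_ref, maf_alt]


row = both_notations("15", 66729114, "CGGAACCAGATCATAA", "C")
print("\t".join(str(value) for value in row))
```

The MAF-style columns of such a table can then serve as the merge key against the original maf file, while the VCF-style columns match the records used by the vcf-based analysis.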

For context, I have maf files with extra metadata for each variant, which isn't preserved when converting to vcf. I converted them to vcf in order to get additional required metadata: in this case, the output of bcftools isec, which only takes vcf input and writes a sites.txt file with the presence/absence of each mutation in each sample. When I try a simple merge between the two datasets on the Chrom, Pos, Ref, Alt values in each, I cannot correctly merge the entries whose mutation notation changed during the conversion. So I need an output file that records both notations, to help backfill the new metadata from the vcf analysis into the correct maf variant entries.

vcf maf • 6.0k views

ADD COMMENT • link updated 15 months ago by Ram 44k • written 3.5 years ago by steve ★ 3.5k

1

3.5 years ago

Jorge Amigo 14k

The ideal solution would be for the conversion software you're using to keep that information in the VCF, for instance in the INFO column. Check first whether that option is available, since any other solution implies manual intervention, and mistakes made along the way could make it impossible to get back to the exact original version.

Regarding MAF to VCF conversion only: if the original notation and the VCF notation are needed in the same file, I can suggest a viable option. You could run the convert2annovar.pl script from ANNOVAR on your converted VCF file. It creates a tab-delimited file whose first 5 columns correspond to the last 5 columns of your example, and the remaining columns can be the VCF ones if the --includeinfo option is used.

ADD COMMENT • link 3.5 years ago by Jorge Amigo 14k

0

ANNOVAR could be a good solution for this; however, the licensing requirements around it tend to keep me from using it much these days.

ADD REPLY • link 3.5 years ago by steve ★ 3.5k

0

ANNOVAR is just one suggestion of software that would read the VCF format and translate it into the one you need while retaining the information from the VCF. I was just pointing out the approach you'd have to take, and I'm sure you can find other programs that perform such a conversion.

ADD REPLY • link 3.5 years ago by Jorge Amigo 14k

score 3 · Accepted Answer · 2021-05-07

vcf2maf adds a MAF column named vcf_pos to preserve the original value in the POS field of a VCF. Similarly preserved are ID, QUAL, and FILTER as documented here. maf2vcf similarly restores these fields into a VCF. If this is the extent of metadata you want preserved between VCFs and MAFs, then you're good to go.

vcf2maf has options --retain-info, --retain-fmt, and --retain-ann to create MAF columns for values in the VCF INFO, FORMAT, and ANN fields respectively. But there is no equivalent --retain-cols feature in maf2vcf that would create INFO fields containing MAF data. I have not seen situations where this is needed that couldn't be solved in other ways.
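One of those other ways is to write the original MAF coordinates into the INFO column yourself after conversion. The sketch below assumes you already have the MAF values for each record; the tag names `MAF_START`, `MAF_END`, `MAF_REF`, and `MAF_ALT` are hypothetical and are not produced or recognized by vcf2maf or maf2vcf.

```python
# Hypothetical header lines declaring the custom tags; add them to the VCF header.
INFO_HEADER = [
    '##INFO=<ID=MAF_START,Number=1,Type=Integer,Description="Start position in source MAF">',
    '##INFO=<ID=MAF_END,Number=1,Type=Integer,Description="End position in source MAF">',
    '##INFO=<ID=MAF_REF,Number=1,Type=String,Description="Reference allele in source MAF">',
    '##INFO=<ID=MAF_ALT,Number=1,Type=String,Description="Alternate allele in source MAF">',
]


def maf_info_tags(start, end, ref, alt):
    """Format the original MAF notation as VCF INFO key=value pairs."""
    return f"MAF_START={start};MAF_END={end};MAF_REF={ref};MAF_ALT={alt}"


def annotate_vcf_line(line, start, end, ref, alt):
    """Append the MAF tags to the INFO column (8th field) of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    tags = maf_info_tags(start, end, ref, alt)
    # An empty INFO column is written as "." in VCF, so replace it rather than append.
    fields[7] = tags if fields[7] in (".", "") else fields[7] + ";" + tags
    return "\t".join(fields)


vcf_line = "15\t66729114\t.\tCGGAACCAGATCATAA\tC\t.\t.\t."
annotated = annotate_vcf_line(vcf_line, 66729115, 66729129, "GGAACCAGATCATAA", "-")
print(annotated)
```

Because the tags travel with each record, downstream tools that pass INFO through unchanged would keep the link back to the original MAF entry.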

In all honesty, the MAF format should be obsoleted - and secondary analysis pipelines should only use VCFs in all steps until it is time to report events in cBioPortal or a tab-delimited format for R-users. maf2vcf should not exist. :)