Question: VCF to MAF (Mutation Annotation Format) Conversion?
Kasthuri (United States) wrote, 8.1 years ago:

Is there any standard tool out there that can convert a VCF file to Mutation Annotation Format (MAF)?

Thanks, Kasthuri

Tags: vcf, maf
Modified 6.3 years ago by Cyriac Kandoth; written 8.1 years ago by Kasthuri.

I have snpEff-annotated VCF files that I am converting to MAF format. When I run vcf2maf, I get the error:

ERROR: Unrecognized effect "DOWNSTREAM". Please update your hashes!
 

Can you please explain the cause of this error?

Reply written 5.1 years ago by viji.
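
The thread never resolves this error. One plausible reading, offered as an assumption rather than a diagnosis: the converter presumably keeps a fixed table (a Perl hash, judging by the message) of effect names it knows how to rank, and stops on any name missing from it. Older snpEff releases write "classic" effect names such as DOWNSTREAM, while newer ones write Sequence Ontology (SO) terms such as downstream_gene_variant, so a table built around one naming scheme can reject the other. The Python sketch below normalizes both schemes; the table is a small hypothetical subset, and normalize_effect is an illustrative name, not part of vcf2maf.

```python
# Hypothetical subset of snpEff "classic" effect names and the Sequence
# Ontology (SO) terms that newer snpEff releases emit for the same effects.
CLASSIC_TO_SO = {
    "DOWNSTREAM": "downstream_gene_variant",
    "UPSTREAM": "upstream_gene_variant",
    "INTRON": "intron_variant",
    "SYNONYMOUS_CODING": "synonymous_variant",
    "NON_SYNONYMOUS_CODING": "missense_variant",
    "STOP_GAINED": "stop_gained",
    "FRAME_SHIFT": "frameshift_variant",
}
KNOWN_SO_TERMS = set(CLASSIC_TO_SO.values())


def normalize_effect(name: str) -> str:
    """Map an effect name from either naming scheme to its SO term."""
    if name in KNOWN_SO_TERMS:
        return name
    if name in CLASSIC_TO_SO:
        return CLASSIC_TO_SO[name]
    # Same failure mode as the error above: the name is missing from the table.
    raise ValueError(f'Unrecognized effect "{name}"')


for effect in ["DOWNSTREAM", "missense_variant"]:
    print(effect, "->", normalize_effect(effect))
```

A real converter would need the full mapping table from the snpEff documentation; the point here is only that unknown names should be translated, not silently dropped.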

Please open a new question, and use tags and keywords like vcf, maf, vcf2maf... so the relevant folks can find it.

Reply written 5.1 years ago by Cyriac Kandoth.
Cyriac Kandoth (Memorial Sloan Kettering, New York, USA) wrote, 6.3 years ago:

I recently posted a VCF-to-MAF conversion script at github.com/ckandoth/vcf2maf. It's thoroughly documented, so you can understand what information is lost in translation.

Briefly: each VCF variant must be annotated to only one of all the gene transcripts/isoforms that it might affect. This selection of a single affected transcript/isoform per variant is often subjective. For now, the script tries to follow best practices: it chooses the "worst" effect on the "best" transcript. If there are multiple such candidates, it annotates the variant's effect on the canonical "best" transcript.

Modified 5.5 years ago; written 6.3 years ago by Cyriac Kandoth.
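
The "worst effect on the best transcript" rule can be sketched in Python as a single sort key. The ordering of that key (biotype priority first, then effect severity, then a canonical-transcript tie-break), the rank values, and the class and function names are all assumptions for illustration; vcf2maf's actual tables and tie-breaking may differ.

```python
from dataclasses import dataclass

# Hypothetical ranks: lower number = preferred biotype / more severe effect.
BIOTYPE_PRIORITY = {"protein_coding": 1, "lincRNA": 2, "pseudogene": 4}
EFFECT_SEVERITY = {
    "stop_gained": 1,
    "frameshift_variant": 2,
    "missense_variant": 3,
    "synonymous_variant": 4,
    "intron_variant": 5,
    "downstream_gene_variant": 6,
}


@dataclass
class TranscriptEffect:
    transcript_id: str
    biotype: str
    effect: str
    canonical: bool


def pick_effect(effects: list[TranscriptEffect]) -> TranscriptEffect:
    """Keep one effect per variant: best biotype, then worst effect, then canonical."""
    def sort_key(e: TranscriptEffect) -> tuple[int, int, int]:
        return (
            BIOTYPE_PRIORITY[e.biotype],
            EFFECT_SEVERITY[e.effect],
            0 if e.canonical else 1,  # canonical transcript wins remaining ties
        )

    return min(effects, key=sort_key)


# Made-up candidates for a single variant.
candidates = [
    TranscriptEffect("TX1", "protein_coding", "synonymous_variant", canonical=True),
    TranscriptEffect("TX2", "protein_coding", "missense_variant", canonical=False),
    TranscriptEffect("TX3", "lincRNA", "downstream_gene_variant", canonical=False),
    TranscriptEffect("TX4", "protein_coding", "missense_variant", canonical=True),
]
print(pick_effect(candidates).transcript_id)  # TX4
```

Folding the whole rule into one tuple key keeps the policy in one place: changing which criterion wins means reordering the tuple rather than rewriting branching logic.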

That's a great tool, thanks! I added a command-line parameter for the name of the snpEff VCF; feel free to use it if you're interested (https://github.com/dakl/vcf2maf).

Reply modified 6.1 years ago; written 6.1 years ago by Danielk.

@Cyriac Actually, I also removed the snpEff step completely, requiring the user to run it separately, upstream of vcf2maf. I think that makes more sense: vcf2maf.pl then becomes a pure converter of a pre-annotated file. What do you think?

Reply written 6.1 years ago by Danielk.

Yeah, that makes sense: it gives the user the option to run snpEff themselves. Actually, the first version of this script was a "converter of a pre-annotated VCF" :) Then I wanted to package it all-in-one.

Update: I released vcf2maf v1.1, which allows you to use a VCF that is already annotated with snpEff or Ensembl's VEP.

Reply modified 6.0 years ago; written 6.1 years ago by Cyriac Kandoth.

FYI, I recently started getting this error: ERROR: Unrecognized biotype "non_coding". Please update your hashes! at vcf2maf.pl line 287, <GEN0> line 171. I added it with priority 3, which already contained other non-coding RNAs. Just so you know.

Info on the biotype from here: http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html

Reply written 6.0 years ago by Danielk.
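
The "Please update your hashes!" wording suggests the script ranks biotypes through a fixed lookup table and refuses to guess when it meets an unknown key. Here is a minimal Python sketch of that behavior, and of the fix Danielk describes (registering non_coding at priority 3 next to the other non-coding RNAs). The table contents and biotype_rank are hypothetical, not taken from vcf2maf.pl.

```python
# Hypothetical priority table (lower = preferred); values are illustrative only.
BIOTYPE_PRIORITY = {
    "protein_coding": 1,
    "lincRNA": 3,
    "miRNA": 3,
    "snoRNA": 3,
}


def biotype_rank(biotype: str, table: dict[str, int]) -> int:
    """Return the priority of a biotype, refusing to guess for unknown ones."""
    if biotype not in table:
        raise ValueError(f'Unrecognized biotype "{biotype}". Please update your hashes!')
    return table[biotype]


# Before the fix, biotype_rank("non_coding", BIOTYPE_PRIORITY) raises ValueError.
# The fix: group the new biotype with the other non-coding RNAs at priority 3.
BIOTYPE_PRIORITY["non_coding"] = 3
print(biotype_rank("non_coding", BIOTYPE_PRIORITY))
```

Failing loudly is arguably the safer design here: silently assigning a default priority would let an unranked biotype win or lose transcript selection without anyone noticing.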

Thanks. Which transcript database are you using? I don't see non_coding as a valid transcript biotype in the Ensembl 74 GTF, but I do see it listed in the GENCODE specs. I have now updated the script to handle all the GENCODE biotypes.

Reply modified 6.0 years ago; written 6.0 years ago by Cyriac Kandoth.

I've been using Ensembl 73, so it's likely that this changes between versions; it's great that the script handles them all. What's the rationale for prioritizing the biotypes?

Reply modified 6.0 years ago; written 6.0 years ago by Danielk.

If a variant locus maps to multiple genes/transcripts, the priority encodes which biotype is best defined and/or more likely to be disease-associated.

Reply written 6.0 years ago by Cyriac Kandoth.

This is a great tool, but the current version still runs snpEff, even though I have already annotated my files with snpEff. Could you please provide a version that doesn't require snpEff, ASAP? Thanks!

Reply written 6.0 years ago by tayebwajb.

Please see the fork of the code by @Danielk mentioned above. Alternatively, my script skips snpEff annotation for an input VCF named file.vcf if it finds an annotated VCF named file.anno.vcf in the same folder.


Reply modified 6.0 years ago; written 6.0 years ago by Cyriac Kandoth.
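
The skip rule Cyriac describes (file.vcf is not re-annotated when file.anno.vcf exists in the same folder) amounts to a simple path check. A minimal Python sketch, assuming only that naming convention; the function names are hypothetical and this is not the script's own code.

```python
from pathlib import Path


def annotated_path(input_vcf: str) -> Path:
    """Map .../file.vcf to .../file.anno.vcf in the same folder."""
    vcf = Path(input_vcf)
    return vcf.with_name(vcf.stem + ".anno.vcf")


def needs_annotation(input_vcf: str) -> bool:
    """True unless a pre-annotated file.anno.vcf already sits next to file.vcf."""
    return not annotated_path(input_vcf).exists()


# Example: decide what to do for one input file.
vcf = "data/file.vcf"
if needs_annotation(vcf):
    print("run snpEff on", vcf)
else:
    print("reuse", annotated_path(vcf))
```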

To convert to MAF, you'll always have to annotate the variants with snpEff, whether it's done in the script, as in Cyriac's version, or upstream, as in my version. There's no way around that.

Reply written 6.0 years ago by Danielk.
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 8.1 years ago:

MAF contains annotations of variant effects on transcripts/proteins, while VCF typically does not. You might find that tools like ANNOVAR, snpEff, and the Ensembl Variant Effect Predictor get you pretty close. I'm not aware of a script that applies one or more of these tools to a VCF file to produce MAF directly.

Written 8.1 years ago by Sean Davis.

I should add that MAF is not really considered a "standard" format, so you may want to check whether the output of one of the software packages mentioned above would already suffice for your final purpose.

Reply written 8.1 years ago by Sean Davis.

FWIW, MAF is a "standard" format within the TCGA project. Here's documentation: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+(MAF)+Specification

Reply written 6.7 years ago by Chris Miller.

Thanks! I tried ANNOVAR and snpEff, and although they get close, they don't quite do what I need. Looks like I need to write my own script!

-K.

Reply written 8.1 years ago by Kasthuri.

But you'll probably still need to run ANNOVAR, snpEff, or something similar (unless you are into reinventing wheels). What I would envision is that the output of ANNOVAR or snpEff gets fed to your script.

Reply written 8.1 years ago by Sean Davis.
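
To make that pipeline concrete: snpEff (4.x and later) writes its predictions into an ANN key in the VCF INFO column, one comma-separated entry per allele/transcript pair, with pipe-separated sub-fields. Older releases, contemporary with this thread, used an EFF key with a different layout, so this sketch assumes ANN output. It keeps the first 11 sub-fields, which carry the gene, effect, transcript, biotype, and HGVS notation that a converter script would map into MAF columns. The INFO string is made up for illustration, and parse_info and parse_ann are hypothetical helpers.

```python
# First 11 of the 16 pipe-separated sub-fields in one snpEff ANN entry.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p",
]


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key/value pairs (flags map to '')."""
    pairs = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def parse_ann(info: str) -> list[dict[str, str]]:
    """Return one dict per allele/transcript annotation in the ANN key."""
    records = []
    for entry in parse_info(info).get("ANN", "").split(","):
        if entry:
            # zip() stops at len(ANN_FIELDS), dropping the trailing sub-fields.
            records.append(dict(zip(ANN_FIELDS, entry.split("|"))))
    return records


# Made-up INFO value: one ALT allele (T) annotated against two transcripts.
info = (
    "DP=42;ANN="
    "T|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|protein_coding|"
    "3/10|c.100C>T|p.Arg34Cys|||||,"
    "T|downstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|lincRNA|"
    "||||||200|"
)
for rec in parse_ann(info):
    print(rec["Feature_ID"], rec["Annotation"], rec["Transcript_BioType"])
```

Each parsed record is one candidate transcript effect; choosing among them is the per-variant selection step discussed earlier in the thread.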

Thanks Sean. The problem started when I wanted to use MuSiC (gmt.genome.wustl.edu/genome-music/0.2/index.html), which requires the mutations in MAF format, and I have a bunch of VCFs. You are right that I first need to extract information from the VCFs with ANNOVAR/snpEff.

Reply modified 4 months ago by RamRS; written 8.1 years ago by Kasthuri.

The MAF format specifically asks for "Tumor_Seq_Allele2" in column 13. How can I find that information in the VCF file? Thanks.

Reply written 8.1 years ago by Kasthuri.

If there are two variant alleles, you will find them in the ALT column of the VCF file as comma-separated values. In most cases, I do not think a second variant allele will be present.

Reply written 8.1 years ago by Sean Davis.
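
Rather than reading ALT alone, a converter can use the tumor sample's genotype (GT) to pick both tumor alleles, which also covers the multi-allelic case Sean mentions. A Python sketch, assuming the tumor sample's column is already known, that GT is present in FORMAT, and that the lower allele index (REF when it is called) goes into Tumor_Seq_Allele1. tumor_alleles is a hypothetical helper; VCF-style indel alleles are returned unchanged, although MAF conventionally drops the shared padding base.

```python
def tumor_alleles(ref: str, alt: str, fmt: str, sample: str) -> tuple[str, str]:
    """Return (Tumor_Seq_Allele1, Tumor_Seq_Allele2) from a sample's GT call."""
    alleles = [ref] + alt.split(",")  # index 0 is REF; 1, 2, ... are ALT alleles
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    calls = fields["GT"].replace("|", "/").split("/")  # treat phased like unphased
    if "." in calls:
        raise ValueError(f"no genotype call: {fields['GT']}")
    idx = sorted(int(c) for c in calls)  # lower index (REF if called) first
    if len(idx) == 1:  # haploid call: report the same allele twice
        idx = idx * 2
    return alleles[idx[0]], alleles[idx[-1]]


# Hypothetical records: REF, ALT, FORMAT column, then the tumor sample's column.
print(tumor_alleles("A", "G", "GT:AD:DP", "0/1:20,15:35"))  # ('A', 'G')
print(tumor_alleles("A", "G,T", "GT:AD", "1/2:0,10,12"))  # ('G', 'T')
```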
mdm-two wrote, 6.5 years ago:

See www.biostars.org/p/74822/ and seqanswers.com/forums/showthread.php?t=16740

Written 6.5 years ago by mdm-two.