Question: How To Use Maf Files In The Absolute Software
5.7 years ago by Danielk (Karolinska Institutet, Stockholm, Sweden)
Danielk wrote:

I'm trying to run ABSOLUTE v1.0.6 on my tumor exome-seq + copy number data. The copy numbers are inferred from low-pass WGS, and I generated somatic mutation calls using MuTect and converted them to MAF with @Cyriac Kandoth's excellent converter. When I run ABSOLUTE using the CNV data only, it's all fine and gives me nice output.

However, when I add the MAF file, it breaks saying

Error in CreateMutCnDat(maf, seg.dat, min.mut.af, verbose = verbose) :
Malformed MAF file, no ref column supplied

From the ABSOLUTE source code I understand that I need columns with the number of reads supporting the ref and alt alleles, which makes perfect sense since it needs to estimate allele frequencies and their variability.
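
To illustrate why the read counts matter, here is a minimal sketch of estimating an allele fraction and its uncertainty from the two counts. It assumes a plain binomial model, which is only an illustration and not necessarily what ABSOLUTE does internally; the function name `allele_fraction` is made up for this example.

```python
import math


def allele_fraction(ref_count, alt_count):
    """Return (alt allele fraction, binomial standard error) for one site."""
    depth = ref_count + alt_count
    if depth == 0:
        raise ValueError("no reads cover this site")
    af = alt_count / depth
    # Standard error of a binomial proportion: sqrt(p * (1 - p) / n).
    se = math.sqrt(af * (1.0 - af) / depth)
    return af, se


# Hypothetical site with 30 reference reads and 10 alternate reads.
af, se = allele_fraction(ref_count=30, alt_count=10)
print(f"AF={af:.3f} SE={se:.3f}")
```

The same fraction estimated from fewer reads gets a larger standard error, which is why a bare frequency without the underlying counts would not be enough.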

My issue is that I don't have these columns. The MAF spec doesn't mention them (https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%...) and I am unable to find any info on what kind of MAF files ABSOLUTE wants.

So, two questions if anyone has insight:

  1. Is MAF still the way to go when using SNV data in ABSOLUTE? Is development ongoing using VCF?
  2. Which columns should be added to the MAF for compatibility with ABSOLUTE? I have VCF files with all info on counts etc, and the BAMs if I need further info.

cheers

I also posted at the Broad Cancer Help forum, but that seems to be a low-activity forum compared to biostars.

Daniel

modified 5.7 years ago by Cyriac Kandoth • written 5.7 years ago by Danielk

Do you know where I can find documentation about how to run ABSOLUTE on exome-seq data? Maybe you could also comment on this thread: Anyone has a working example on how to run Broad's ABSOLUTE on exome sequencing data?

written 5.6 years ago by Christian

I modified @Cyriac's script to add the columns required by ABSOLUTE. Note that I have yet to test-run ABSOLUTE downstream, so this should be considered a first step. https://github.com/dakl/vcf2maf

written 5.6 years ago by Danielk

Hi Danielk, did you finally get ABSOLUTE to run?

I also fed it the MAF file together with the segmentation file, but only got this error:

"Error: min.mut.af is required if a file is provided for maf.fn"

I did actually set min.mut.af when supplying maf.fn, so it is not clear to me where the problem is.

written 5.0 years ago by bioinflix
5.7 years ago by Cyriac Kandoth (Memorial Sloan Kettering, New York, USA)
Cyriac Kandoth wrote:

Hi Daniel, you may already have downloaded ABSOLUTE v1.0.6 from here. The code that dies with "Malformed MAF file" is in lines 135 to 161 of fit_somatic_muts.R. You'll notice that it looks for two non-standard MAF columns named t_ref_count and t_alt_count: the read counts for the REF and ALT alleles, which are available in your VCFs. Just append them as two extra columns to your MAF, and the R code will find them. You can find a sample MAF down here.
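
Appending the two columns could look roughly like the sketch below. It assumes the MAF carries the TCGA-style columns Chromosome, Start_Position, Reference_Allele and Tumor_Seq_Allele2 (check the exact header names in your own file), and that you have already collected the counts in a dictionary. The function `add_count_columns` and the example values are hypothetical.

```python
import csv
import io


def add_count_columns(maf_text, counts):
    """Append t_ref_count and t_alt_count to a tab-delimited MAF.

    counts maps (chromosome, position, ref allele, alt allele)
    to a (ref read count, alt read count) pair.
    """
    # Drop blank lines and '#version ...' style comment lines before parsing.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    fields = list(reader.fieldnames) + ["t_ref_count", "t_alt_count"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Column names are assumed; adjust them to match your MAF header.
        key = (row["Chromosome"], int(row["Start_Position"]),
               row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        row["t_ref_count"], row["t_alt_count"] = counts[key]
        writer.writerow(row)
    return out.getvalue()


# Made-up one-line MAF and counts, for illustration only.
maf = ("Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\t"
       "Tumor_Seq_Allele2\n"
       "TP53\t17\t7577120\tC\tT\n")
counts = {("17", 7577120, "C", "T"): (42, 17)}
result = add_count_columns(maf, counts)
print(result)
```

A variant missing from `counts` raises a KeyError here, which seems safer than silently writing empty counts into the MAF.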

Update: The vcf2maf tool can annotate a VCF into a MAF, while retaining allele counts in columns named t_ref_count and t_alt_count.
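
If you would rather pull the counts out of a VCF yourself, a rough sketch follows. It assumes your caller writes per-allele read depths under the AD FORMAT key as "ref,alt", and that you know which sample column is the tumor; neither is guaranteed for every caller. The function `tumor_allele_counts` is hypothetical and is not how vcf2maf is implemented.

```python
def tumor_allele_counts(vcf_line, sample_index=0):
    """Return (t_ref_count, t_alt_count) from the AD field of one VCF data line.

    sample_index selects the tumor sample among the sample columns (0-based).
    For multi-allelic sites only the first ALT allele's depth is returned.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")                  # e.g. GT:AD:DP
    sample_values = cols[9 + sample_index].split(":")
    sample = dict(zip(format_keys, sample_values))
    if "AD" not in sample:
        raise ValueError("sample column has no AD field")
    depths = sample["AD"].split(",")
    return int(depths[0]), int(depths[1])


# Made-up VCF record with a single (tumor) sample column.
line = "17\t7577120\t.\tC\tT\t.\tPASS\t.\tGT:AD:DP\t0/1:42,17:59"
t_ref_count, t_alt_count = tumor_allele_counts(line)
print(t_ref_count, t_alt_count)
```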

modified 5.5 years ago • written 5.7 years ago by Cyriac Kandoth

Thanks! I'll give it a go.

From what I understand, this means that the MAF file ABSOLUTE needs is actually non-standard, even if I personally think that it makes perfect sense to include read counts. I think a comment on this on the ABSOLUTE website would help several users.

written 5.7 years ago by Danielk

Btw, do you know if VCF support is on the horizon?

written 5.7 years ago by Danielk

No, it's unlikely. At least in TCGA, and Broad's internal projects, final lists of somatic mutations are almost always in MAFs... mostly because tab-delimited spreadsheet-friendly formats are ideal for manual curation... which is unfortunately still necessary in somatic variant calling.

(rant) There are too many caveats and exceptions-to-the-rule when it comes to cancer... and no single somatic variant caller has successfully handled all these challenges. That's why the best approach is to use multiple somatic variant callers with complementary methods, union all their calls, and then weed out false-positives with automated or manual filtering strategies.
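
As a minimal sketch of the "union all their calls" step: key each call on (chromosome, position, ref, alt) and record which callers reported it, so that single-caller calls can be sent to stricter filtering or manual review. The caller names, the variants and the function `union_calls` are placeholders, and real pipelines also need to normalize indel representations before comparing.

```python
from collections import defaultdict


def union_calls(calls_by_caller):
    """Map each variant key to the sorted list of callers that reported it."""
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    return {variant: sorted(callers) for variant, callers in support.items()}


# Made-up calls from two hypothetical callers.
calls = {
    "caller_a": [("17", 7577120, "C", "T"), ("12", 25398284, "C", "A")],
    "caller_b": [("17", 7577120, "C", "T")],
}
merged = union_calls(calls)
# Calls seen by only one caller are the first candidates for extra filtering.
single_caller = [v for v, callers in merged.items() if len(callers) == 1]
print(single_caller)
```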

modified 5.7 years ago • written 5.7 years ago by Cyriac Kandoth

Right now I'm running MuTect and SomaticSniper and taking the union of the results. I'm thinking about adding more. Any input on which are relevant? #hugelistofcallersoutthere

written 5.7 years ago by Danielk

MuTect, Strelka, and VarScan2 are fairly complementary methods. The latter two can also find somatic indels. joint-snv-mix could be a good choice as a fourth complementary method, but I've never tested or benchmarked it. For more methods, see Best Software For Detection Of Somatic Mutations From Matched Tumor:Normal Ngs Data. After merging together all the resulting VCFs, I recommend this script for filtering out false-positives.

modified 5.6 years ago • written 5.7 years ago by Cyriac Kandoth