Question: VCF to MAF conversion for ABSOLUTE
sgujja (United States) wrote, 3.7 years ago:

Hello,

I am writing to seek your help in using the vcf2maf conversion tool with the zebrafish reference genome. The output MAF file will be used as input for running ABSOLUTE.

I've cloned the repository: https://github.com/dakl/vcf2maf.git

and installed VEP as per the instructions in the manual.

However, on running vcf2maf, I get an error:

perl vcf2maf.pl --input-vcf ../Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output-maf ../Sample_BC06_Sample_BC12_MuTect_filtered.maf --ref-fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --ncbi-build Zv9

STATUS: Running VEP and writing to: ../Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf

UNIVERSAL->import is deprecated and will be removed in a future perl at /home/sg15w/vep/Bio/Tree/TreeFunctionsI.pm line 94.

ERROR: Cache assembly version (GRCh38) and database or selected assembly version (Zv9) do not match

 

If using human GRCh37 add "--port 3337" to use the GRCh37 database, or --offline to avoid database connection entirely

 

ERROR: Failed to run the VEP annotator!

Command: /share/pkg/perl/5.18.1/bin/perl /home/sg15w/vep/variant_effect_predictor.pl --quiet --offline --no_stats --everything --check_existing --total_length --allele_number --no_escape --gencode_basic --xref_refseq --assembly Zv9 --dir /home/sg15w/.vep --fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --vcf --input_file ../Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output_file ../Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf

 

I've tried running just the VEP annotator with the --species zebrafish option, and it works.

I need your help in understanding which option to use to fix the cache error.

Also, do I need to add any other options for the output MAF to be compatible with ABSOLUTE?

Thanks for all the help.

Sharvari

Tags: vcf2maf, absolute

We have already discussed this via the Ensembl helpdesk. I suspect, though I don't know for certain, that VCF2MAF is specific to human. If anyone could shed some light on this, that would be great.

Update: I'm now fairly sure it's just for human:

my ( $vep_path, $vep_data, $vep_forks, $ref_fasta ) = ( "$ENV{HOME}/vep", "$ENV{HOME}/.vep", 1, "$ENV{HOME}/.vep/homo_sapiens/78_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa" );
Reply written 3.7 years ago by Emily_Ensembl

The code above shows the defaults (human), but they can be set to anything. Did @sgujja download the Zv9 VEP annotations?
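
One way to answer that question is to list what is actually present under the VEP data directory. This is only a rough sketch: it assumes caches follow the <species>/<version>_<assembly> layout seen in the paths in this thread (e.g. danio_rerio/76_Zv9 and homo_sapiens/78_GRCh37), and the function name list_vep_caches is made up for illustration; it is not part of VEP or vcf2maf.

```python
from pathlib import Path


def list_vep_caches(vep_data):
    """Return (species, version, assembly) for each cache-like directory.

    Assumes the layout <vep_data>/<species>/<version>_<assembly>/, as in the
    paths quoted in this thread. Hypothetical helper, not a VEP API.
    """
    caches = []
    for cache_dir in sorted(Path(vep_data).glob("*/*_*")):
        if cache_dir.is_dir():
            # Split "76_Zv9" into version "76" and assembly "Zv9".
            version, _, assembly = cache_dir.name.partition("_")
            caches.append((cache_dir.parent.name, version, assembly))
    return caches


if __name__ == "__main__":
    # Default VEP data directory used in the thread: ~/.vep
    for species, version, assembly in list_vep_caches(Path.home() / ".vep"):
        print(f"{species}\tversion {version}\tassembly {assembly}")
```

If no danio_rerio entry with assembly Zv9 shows up, the cache itself is probably missing, and having only the FASTA file in that directory would not be enough.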

FYI, the repo you cloned is a clone I made a while back and haven't updated since. Use Cyriac's repo for the latest version: https://github.com/ckandoth/vcf2maf

Maybe @ckandoth has input as well?

Reply written 3.7 years ago by Danielk

Hello Daniel,

Thanks for the reply.

I did download the Zv9 VEP annotation:

/home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa

and for snpEff as well:

java -Xmx2g -jar snpEff.jar download -dataDir $SNPEFF_DATA Zv9

After cloning Cyriac's repo, I get:

perl vcf2maf.pl --input-vcf /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output-maf Sample_BC06_Sample_BC12_MuTect_filtered.maf --ref-fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --ncbi-build Zv9

STATUS: Running VEP and writing to: /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf

UNIVERSAL->import is deprecated and will be removed in a future perl at /home/sg15w/vep/Bio/Tree/TreeFunctionsI.pm line 94.

Unknown option: shift_hgvs

ERROR: Failed to parse command-line flags

 

ERROR: Failed to run the VEP annotator!

Command: /share/pkg/perl/5.18.1/bin/perl /home/sg15w/vep/variant_effect_predictor.pl --quiet --offline --no_stats --everything --shift_hgvs --check_existing --total_length --allele_number --no_escape --gencode_basic --xref_refseq --assembly Zv9 --dir /home/sg15w/.vep --fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --vcf --input_file /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output_file /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf

I think the tool is still using the human reference. How do I override it?

Thanks

Reply written 3.7 years ago by sgujja

Hi, 

Cyriac updated vcf2maf, and the latest version should address both of these issues.

https://github.com/ckandoth/vcf2maf/releases/tag/v1.5.3

Reply written 3.7 years ago by Danielk

shlee (United States) wrote, 3.6 years ago:

To run ABSOLUTE algorithm v1.0.6 on GenePattern, somatic mutations are supplied as one plain-text MAF-format file per sample. The word "format" may make you think of conversion tools, but you can instead think of a required format as the (closest) approximation that provides the information an algorithm needs to run. An algorithm requires data laid out in a particular way (e.g. tab-delimited), containing specific information that it recognizes by column or row labels (e.g. a column header labeled "Chromosome"), and it may require a specific file extension (e.g. .txt or .maf).

Here I am repeating some of what I wrote when updating the GenePattern site documentation:

For GenePattern's ABSOLUTE algorithm v1.0.6, file extension does not matter and hashtagged header rows (#) may be present within the MAF. ABSOLUTE algorithm v1.0.6 requires the following seven columns. Additional columns may be present.

  • t_ref_count OR i_t_ref_count 
    • Count of reference alleles in tumor.
  • t_alt_count OR i_t_alt_count 
    • Count of alternate alleles in tumor. Together with t_ref_count, this adds up to the read depth in the tumor BAM alignment. You can calculate a missing value if two of these three values are known, or from the read depth and the frequency of the alternate allele within the sample. These and other MuTect output columns are described further in the GATK forum.
  • dbSNP_Val_Status
    • Fields may be blank, and multiple values are separated by semicolons without spaces. Example values include bySubmitter, by1000genomes, by2Hit2Allele, and byHapMap.
  • Start_position 
    • Note the lowercase "p". Also, note that the End_position column is not required. This implies that ABSOLUTE algorithm v1.0.6 treats all mutation data equally as point mutations, the expected type of mutation data.
  • Tumor_Sample_Barcode
    • Fields may be blank.
  • Hugo_Symbol
    • Fields may be blank or "unknown". 
  • Chromosome
    • Must be in # format and not chr# format. The # value must be identical to the corresponding value in the segmented copy ratios data file. For example, ABSOLUTE does not equate X with 23 and will exclude these mutations as unmapped mutations. Note that ABSOLUTE algorithm v1.0.6 excludes X chromosome data but not data from numbered chromosomes, e.g. chr23.
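
The column requirements above can be checked before uploading a file. The following is a minimal sketch, not anything shipped with ABSOLUTE or GenePattern: the function names are made up for illustration, and the sample text in the usage section is invented. It reads a tab-delimited MAF while skipping "#" header rows, reports which of the seven required columns are missing, flags Chromosome values in chr# format, and shows the arithmetic for deriving a missing read count.

```python
import csv
import io

# The seven required columns listed above; each tuple holds accepted alternatives.
REQUIRED = [
    ("t_ref_count", "i_t_ref_count"),
    ("t_alt_count", "i_t_alt_count"),
    ("dbSNP_Val_Status",),
    ("Start_position",),
    ("Tumor_Sample_Barcode",),
    ("Hugo_Symbol",),
    ("Chromosome",),
]


def read_maf(handle):
    """Return (header, rows) from a tab-delimited MAF, skipping '#' rows."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    return reader.fieldnames or [], rows


def missing_columns(header):
    """List required column groups with no matching column in the header."""
    present = set(header)
    return [" OR ".join(group) for group in REQUIRED if not present & set(group)]


def chr_prefixed(rows):
    """Return the distinct Chromosome values written in chr# format."""
    values = {(row.get("Chromosome") or "") for row in rows}
    return sorted(v for v in values if v.startswith("chr"))


def ref_count_from_depth(depth, alt_count):
    """Reference count when tumor read depth and alternate count are known."""
    return depth - alt_count


def alt_count_from_frequency(depth, alt_freq):
    """Alternate count from read depth and alternate allele frequency."""
    return round(depth * alt_freq)


if __name__ == "__main__":
    # Invented sample: a '#' header row, a column header row, one mutation.
    sample = (
        "#version 2.4\n"
        "Hugo_Symbol\tChromosome\tStart_position\tt_ref_count\tt_alt_count\n"
        "unknown\tchr1\t12345\t40\t10\n"
    )
    header, rows = read_maf(io.StringIO(sample))
    print("Missing columns:", missing_columns(header))
    print("chr-prefixed chromosomes:", chr_prefixed(rows))
```

For a real file, pass an open file handle to read_maf instead of the StringIO object. Any chr-prefixed values it reports would need the prefix removed so that they match the segmented copy ratios file.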