VCF to MAF conversion for ABSOLUTE
8.9 years ago
sgujja ▴ 40

Hello,

I am writing to seek your help in using the vcf2maf conversion tool with the zebrafish reference genome. The output MAF file will be used as input for running ABSOLUTE.

I've cloned the repository: https://github.com/dakl/vcf2maf.git

and installed VEP as per the instructions in the manual.

However, on running vcf2maf, I get an error:

perl vcf2maf.pl --input-vcf ../Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output-maf ../Sample_BC06_Sample_BC12_MuTect_filtered.maf --ref-fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --ncbi-build Zv9
STATUS: Running VEP and writing to: ../Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf
UNIVERSAL->import is deprecated and will be removed in a future perl at /home/sg15w/vep/Bio/Tree/TreeFunctionsI.pm line 94.
ERROR: Cache assembly version (GRCh38) and database or selected assembly version (Zv9) do not match
If using human GRCh37 add "--port 3337" to use the GRCh37 database, or --offline to avoid database connection entirely
ERROR: Failed to run the VEP annotator!
Command: /share/pkg/perl/5.18.1/bin/perl /home/sg15w/vep/variant_effect_predictor.pl --quiet --offline --no_stats --everything --check_existing --total_length --allele_number --no_escape --gencode_basic --xref_refseq --assembly Zv9 --dir /home/sg15w/.vep --fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --vcf --input_file ../Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output_file ../Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf

I've tried running just the VEP annotator with the --species zebrafish option, and it works.

I need your help in understanding which option to use to fix the cache error.

Also, do I need to add any other options for the output MAF to be compatible with ABSOLUTE?

Thanks for all the help.

Sharvari


We have already had discussions on this via Ensembl helpdesk. I suspect, though I don't know for certain, that VCF2MAF is specific to human. If anyone could shed some light on this, that would be great.

Update: I'm now fairly sure it's just for human:

my ( $vep_path, $vep_data, $vep_forks, $ref_fasta ) = ( "$ENV{HOME}/vep", "$ENV{HOME}/.vep", 1, "$ENV{HOME}/.vep/homo_sapiens/78_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa" );

The code above shows the defaults (human), but they can be set to anything. Did @sgujja download the Zv9 VEP annotations?

FYI, the repo you cloned is a clone I made a while back and haven't updated in a while. Use Cyriac's repo for the latest version: https://github.com/ckandoth/vcf2maf

Maybe Cyriac Kandoth has input as well?


Hello Daniel,

Thanks for the reply.

I did download the Zv9 VEP annotation:

/home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa

and for SnpEff as well:

java -Xmx2g -jar snpEff.jar download -dataDir $SNPEFF_DATA Zv9

After cloning Cyriac's repo and running it, I get:

perl vcf2maf.pl --input-vcf /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output-maf Sample_BC06_Sample_BC12_MuTect_filtered.maf --ref-fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --ncbi-build Zv9
STATUS: Running VEP and writing to: /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf
UNIVERSAL->import is deprecated and will be removed in a future perl at /home/sg15w/vep/Bio/Tree/TreeFunctionsI.pm line 94.
Unknown option: shift_hgvs
ERROR: Failed to parse command-line flags
ERROR: Failed to run the VEP annotator!
Command: /share/pkg/perl/5.18.1/bin/perl /home/sg15w/vep/variant_effect_predictor.pl --quiet --offline --no_stats --everything --shift_hgvs --check_existing --total_length --allele_number --no_escape --gencode_basic --xref_refseq --assembly Zv9 --dir /home/sg15w/.vep --fasta /home/sg15w/.vep/danio_rerio/76_Zv9/Danio_rerio.Zv9.dna.toplevel.fa --vcf --input_file /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vcf --output_file /project/umw_michael_czech/BIOIFX-032/analysis/analysis_Absolute/Sample_BC06_Sample_BC12_MuTect_filtered.vep.vcf

I think the tool is still using the human reference. How do I override it?

Thanks


Hi,

Cyriac updated vcf2maf, and the latest version should fix both of these issues.

https://github.com/ckandoth/vcf2maf/releases/tag/v1.5.3

8.8 years ago
shlee ▴ 80

For running ABSOLUTE algorithm v1.0.6 on GenePattern, somatic mutations are supplied as one plain-text MAF-format file per sample. The word "format" may lead one to think of conversion tools, but you can instead think of a required format as the (closest) approximation that provides the information an algorithm needs to run. Algorithms require data parsed in particular ways (e.g. tab-delimited), expect specific information that they recognize by column or row labels (e.g. a column header labeled "Chromosome"), and may require a specific file extension (e.g. .txt or .maf).

Here I am regurgitating some of my words used in updating GenePattern site documentation:

For GenePattern's ABSOLUTE algorithm v1.0.6, file extension does not matter and hashtagged header rows (#) may be present within the MAF. ABSOLUTE algorithm v1.0.6 requires the following seven columns. Additional columns may be present.

  • t_ref_count OR i_t_ref_count
    • Count of reference alleles in tumor.
  • t_alt_count OR i_t_alt_count
    • Count of alternate alleles in tumor. Together with t_ref_count, this adds up to the read depth in the tumor BAM alignment. You can calculate a missing value if any two of these three values (reference count, alternate count, depth) are known, or from the read depth and the frequency of the alternate allele within the sample. These and other MuTect output columns are described further in the GATK forum.
  • dbSNP_Val_Status
    • Fields may be blank. Multiple values are separated by semicolons without spaces. Example values include bySubmitter, by1000genomes, by2Hit2Allele, and byHapMap.
  • Start_position
    • Note the lowercase "p". Also, note that the End_position column is not required. This implies that ABSOLUTE algorithm v1.0.6 treats all mutation data equally as point mutations, the expected type of mutation data.
  • Tumor_Sample_Barcode
    • Fields may be blank.
  • Hugo_Symbol
    • Fields may be blank or "unknown".
  • Chromosome
    • Must be in # format and not chr# format. The # value must correspond to that in the segmented copy ratios data file identically. For example, ABSOLUTE does not equate X with 23 and will exclude these mutations as unmapped mutations. Note ABSOLUTE algorithm v1.0.6 excludes X chromosome data but not numbered chromosome, e.g. chr23, data.
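As a rough illustration of the column requirements listed above, here is a minimal Python sketch that reads a tab-delimited MAF, skips #-prefixed header rows, reports which of the seven required columns are missing, and flags Chromosome values written in chr# form. It also shows the count arithmetic described under t_alt_count. All function names here are hypothetical and the demo rows are made up; this is not GenePattern's or ABSOLUTE's own validation code, only a pre-flight check under the assumptions stated in this post.

```python
import csv
import io

# Required columns as listed in this post; each tuple holds the
# acceptable alternative names for one requirement.
REQUIRED_COLUMNS = [
    ("t_ref_count", "i_t_ref_count"),
    ("t_alt_count", "i_t_alt_count"),
    ("dbSNP_Val_Status",),
    ("Start_position",),
    ("Tumor_Sample_Barcode",),
    ("Hugo_Symbol",),
    ("Chromosome",),
]


def read_maf(handle):
    """Return (header, rows) from a tab-delimited MAF, skipping '#' lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames or [], rows


def missing_columns(header):
    """List required columns (written as 'a OR b') absent from the header."""
    present = set(header)
    return [" OR ".join(alts) for alts in REQUIRED_COLUMNS
            if not present.intersection(alts)]


def chr_prefixed(rows):
    """Return the distinct Chromosome values written as chr# instead of #."""
    return sorted({r["Chromosome"] for r in rows
                   if (r.get("Chromosome") or "").lower().startswith("chr")})


def ref_count_from_depth(depth, alt_count):
    """Derive the reference count when depth and alternate count are known."""
    return depth - alt_count


def alt_count_from_fraction(depth, alt_fraction):
    """Derive the alternate count from depth and alternate allele frequency."""
    return round(depth * alt_fraction)


# Made-up example input: one comment row, a header lacking t_ref_count,
# and one record using the chr# chromosome form.
demo = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_position\tTumor_Sample_Barcode\t"
    "dbSNP_Val_Status\tt_alt_count\n"
    "unknown\tchr7\t12345\tBC06\t\t12\n"
)
header, rows = read_maf(demo)
print("Missing columns:", missing_columns(header))
print("chr-prefixed chromosomes:", chr_prefixed(rows))
```

To check a real file, pass an open file handle to read_maf instead of the in-memory demo. The chromosome check only flags the chr# form; whether the remaining values match the segmented copy ratios file still has to be compared separately.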