Getting reference genome for vcf2maf tool
3.6 years ago

Hi there.

I'm trying to use the vcf2maf tool to convert TCGA MAF files into VCF format, but have been stuck for days trying to obtain the appropriate reference fasta file for the --ref flag.

I've tried pointing it to the VEP FASTA files, downloading the FASTA directly from Ensembl, and getting it from NCBI and indexing it myself with samtools. I've also tried unzipping it, or rezipping it with bzip, but I still get the same error, with many lines complaining that certain sequences could not be fetched:

[W::fai_get_val] Reference chr12:52568256-52568258 not found in file, returning empty sequence
[faidx] Failed to fetch sequence in chr12:52568256-52568258
ERROR: Make sure that ref-fasta is the same genome build as your MAF: genome-assemblies/homo_sapiens/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

The command that I'm running is:

perl ../tools/vcf2maf/maf2vcf.pl --input-maf ./03652df4-6090-4f5a-a2ff-ee28a37f9301/TCGA.COAD.mutect.03652df4-6090-4f5a-a2ff-ee28a37f9301.DR-10.0.somatic.maf --ref genome-assemblies/homo_sapiens/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --output-dir TEST1

The input MAF is here. The reference genome on this page says GRCh38.p0. How can I get an appropriate FASTA reference for the vcf2maf tool?

3.5 years ago

As far as I understand from https://docs.cancergenomicscloud.org/docs/tcga-grch38-data, the reference should be the GDC version available here: https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files

Have you tried that?
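
One thing that may be worth checking before downloading anything else: the error mentions chr12, while Ensembl FASTA files name that sequence 12, without the chr prefix. If that is the cause here (an assumption based only on the error message), the lookup would fail even when the genome build matches. Below is a minimal sketch in Python that compares the Chromosome column of a MAF with the sequence names in a samtools .fai index. The function names and the toy data are hypothetical, not part of vcf2maf or samtools.

```python
import csv


def fai_contigs(fai_text):
    """Return the sequence names listed in the first column of a .fai index."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def maf_chromosomes(maf_text):
    """Return the distinct values of the MAF 'Chromosome' column.

    Lines starting with '#' (for example a version header) are skipped.
    """
    rows = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["Chromosome"] for row in reader}


def missing_contigs(maf_text, fai_text):
    """Return the MAF chromosome names that are absent from the FASTA index."""
    return sorted(maf_chromosomes(maf_text) - fai_contigs(fai_text))


# Toy inputs for illustration only. The offsets and line widths in the
# .fai lines are made up, and GENE_A is a placeholder symbol.
toy_maf = (
    "#version placeholder\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\n"
    "GENE_A\tchr12\t52568257\t52568257\n"
)
toy_fai_without_prefix = "12\t133275309\t100\t60\t61\n"
toy_fai_with_prefix = "chr12\t133275309\t100\t60\t61\n"

print(missing_contigs(toy_maf, toy_fai_without_prefix))
print(missing_contigs(toy_maf, toy_fai_with_prefix))
```

To run this on real data, read the text of the .fai file that samtools faidx writes next to the FASTA, and the text of the MAF, then pass both to missing_contigs. Any names it returns are sequences that the MAF refers to but the reference does not contain under that name.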
