Question: TCGA Mutation Annotation Files (MAFs) with ref_context column
5.0 years ago by
Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK

Hello everyone,

I have MAFs annotated against NCBI Build 37. Some of them have a column named "ref_context", which holds a 21-nucleotide string of the wild-type sequence in which the middle nucleotide is the one that was mutated. Unfortunately, some of the MAFs lack this column, so I was wondering whether it is possible to get the same MAFs with the ref_context column included.

If that is not possible, the other option is to download the reference genome GRCh37, map the position of the mutation onto the gene sequence, and take the nucleotides upstream and downstream. I have done that, but the wild-type codon annotated in the MAF does not match the codon at that position in the files I have. I got the files from both UCSC and NCBI, and neither matches. The gene IDs I am using are of the form NM_XXXXXX.

Of course, getting the MAFs with the ref_context column would be the best solution, so I would very much appreciate it if someone could tell me whether those files are available and how to get them.


tcga maf • 1.8k views
modified 5.0 years ago by Cyriac Kandoth • written 5.0 years ago by Alejandro Jimenez Sanchez
5.0 years ago by
Cyriac Kandoth
Memorial Sloan Kettering, New York, USA
Cyriac Kandoth wrote:

ref_context is not a standard MAF column, and not all TCGA MAF curators created it. Where present, it was generated with simple tools like bedtools getfasta or samtools faidx, which pull sequence context from the reference FASTA around the genomic locus in column 6, Start_Position.

If you used the GRCh37-Lite FASTA, then your method should have worked fine. Troubleshoot it a bit and let us know. There's a bug either in the MAF or in your method. Give bedtools a try.
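
One common source of mismatches is the coordinate convention, so here is a minimal sketch in Python of the window arithmetic. It assumes a single-base substitution and a 10-nt flank on each side (giving the 21-nt string described in the question). The function names are hypothetical and are not part of bedtools or samtools. Note that Start_Position is a 1-based genomic coordinate on the chromosome, not an offset into an NM_ transcript sequence.

```python
def context_window(start_position, flank=10):
    """Return (start, end), 1-based inclusive, centred on start_position."""
    start = start_position - flank
    end = start_position + flank
    if start < 1:
        raise ValueError("window runs off the start of the chromosome")
    return start, end


def faidx_region(chrom, start_position, flank=10):
    """Region string in the 1-based inclusive chrom:start-end form."""
    start, end = context_window(start_position, flank)
    return f"{chrom}:{start}-{end}"


def bed_interval(chrom, start_position, flank=10):
    """Same window as a BED-style interval: 0-based start, exclusive end."""
    start, end = context_window(start_position, flank)
    return chrom, start - 1, end


def slice_context(sequence, start_position, flank=10):
    """Cut the window out of a chromosome sequence already held in memory."""
    start, end = context_window(start_position, flank)
    if end > len(sequence):
        raise ValueError("window runs off the end of the chromosome")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end].upper()
```

The string from faidx_region is the kind of region argument samtools faidx takes, while the bed_interval tuple is what a BED line for bedtools getfasta would need. Mixing the two conventions shifts the window by one base.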

modified 8 months ago by RamRS • written 5.0 years ago by Cyriac Kandoth

Thank you very much for the rapid answer, and for clarifying that ref_context is not a standard MAF column. Thanks also for the options you suggested. I found a solution: I downloaded the GRCh37 chromosome sequences from UCSC and used the chromosomal coordinates of each mutation given in the MAF file, and it works very well. Thanks again for your answer.
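
For anyone landing here later, a rough sketch of this kind of lookup in Python follows. It is not the exact script used, and it makes several assumptions: the MAF is tab-separated with the standard Chromosome, Start_Position and Reference_Allele columns, the FASTA headers carry a "chr" prefix that the MAF may lack, and only single-base substitutions are checked against the reference. The function names read_fasta and add_ref_context are made up for illustration.

```python
import csv
import io


def read_fasta(handle):
    """Parse FASTA text into {name: sequence}, upper-casing soft-masked bases."""
    genome, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def add_ref_context(maf_handle, genome, flank=10):
    """Yield MAF rows as dicts with a ref_context value filled in."""
    # Skip comment lines such as a leading "#version" line.
    rows = (line for line in maf_handle if not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        chrom = row["Chromosome"]
        seq = genome.get(chrom) or genome.get("chr" + chrom)
        pos = int(row["Start_Position"])  # 1-based genomic coordinate
        if seq is None or pos - flank < 1 or pos + flank > len(seq):
            row["ref_context"] = ""  # no sequence, or window off the edge
        else:
            ref = row["Reference_Allele"].upper()
            # Sanity check: the MAF reference base should match the genome.
            if len(ref) == 1 and ref != "-" and seq[pos - 1] != ref:
                raise ValueError(f"reference mismatch at {chrom}:{pos}")
            row["ref_context"] = seq[pos - flank - 1:pos + flank]
        yield row


# Toy data standing in for a real chromosome file and MAF.
toy_fasta = io.StringIO(">chr1\nacgtacgtacGtttttttttt\ncc\n")
toy_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\n"
    "GENE1\t1\t11\tG\n"
)
annotated = list(add_ref_context(toy_maf, read_fasta(toy_fasta)))
```

The reference-allele check is the useful part: if it fails on real data, the genome build or the coordinate convention is probably not the one the MAF was annotated with.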

written 5.0 years ago by Alejandro Jimenez Sanchez
