Question: TCGA Mutation Annotation Files (MAFs) with ref_context column
Asked 5.5 years ago by Alejandro Jimenez Sanchez (Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK)

Hello everyone,

I have MAFs annotated against NCBI Build 37. Some of them have a column named "ref_context" containing a 21-nucleotide string of the reference (wild-type) sequence, where the middle nucleotide is the mutated position. Unfortunately, some of the MAFs don't have this column, so I was wondering whether it is possible to get the same MAFs with the ref_context column included.
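As described above, a 21-nt ref_context has 10 bases on each side of the mutated position. A minimal sketch of splitting it, assuming an odd-length window centred on the mutation (the function name `split_ref_context` and the example string are hypothetical):

```python
def split_ref_context(ref_context: str) -> tuple[str, str, str]:
    """Split a ref_context string into (upstream, middle base, downstream).

    Assumes an odd-length window centred on the mutated position,
    e.g. 21 nt = 10 upstream + 1 mutated base + 10 downstream.
    """
    if len(ref_context) % 2 == 0:
        raise ValueError("expected an odd-length context centred on the mutation")
    mid = len(ref_context) // 2  # 0-based index of the middle base
    return ref_context[:mid], ref_context[mid], ref_context[mid + 1:]


# Hypothetical 21-nt context; the 11th character is the mutated position.
upstream, ref_base, downstream = split_ref_context("ACGTACGTACGTACGTACGTA")
print(upstream, ref_base, downstream)
```

For a single-nucleotide variant, the middle base should equal the row's Reference_Allele, which makes this a quick consistency check on MAFs that do carry the column.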

If that isn't possible, the other option is to download the GRCh37 reference genome, locate the mutation in the gene sequence, and take the nucleotides upstream and downstream of it. I have tried this, but the wild-type codons annotated in the MAF don't match the codons at those positions in the sequence files I downloaded. I got the files from both UCSC and NCBI, and neither matches. The gene IDs I am using are NM_XXXXXX.
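One possible source of the codon mismatch (an assumption, not confirmed in this thread): NM_ transcript sequences begin with the 5' UTR, so codon n does not start at position 3(n - 1) + 1 of the file but at the CDS start plus 3(n - 1). A MAF Start_Position is also a genomic coordinate, so it cannot index a spliced transcript directly. A minimal sketch, where `codon_at` and `cds_start` are hypothetical names and `cds_start` would come from the transcript record's CDS annotation:

```python
def codon_at(mrna: str, cds_start: int, codon_number: int) -> str:
    """Return codon `codon_number` (1-based) from a transcript sequence.

    `cds_start` is the 1-based position of the first base of the start codon
    within `mrna`. Codon n then begins at cds_start + 3 * (n - 1).
    """
    begin = (cds_start - 1) + 3 * (codon_number - 1)  # convert to 0-based index
    codon = mrna[begin:begin + 3]
    if len(codon) != 3:
        raise IndexError("codon lies beyond the end of the sequence")
    return codon.upper()


# Hypothetical transcript with a 3-nt 5' UTR ("ggg") before the ATG.
print(codon_at("gggATGGCTTAA", cds_start=4, codon_number=2))
```

Without the offset, codon 2 of this toy transcript would be read as "ATG" instead of "GCT".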

Of course, getting MAFs that already include the ref_context column would be the best solution, so I would very much appreciate it if someone knows where I can get them and could tell me how.


Tags: tcga, maf (thread last modified 5.5 years ago by Cyriac Kandoth)
Answer by Cyriac Kandoth (Memorial Sloan Kettering, New York, USA), 5.5 years ago:

ref_context is not a standard MAF column, and not all TCGA MAF curators generated it. It was created with simple tools such as bedtools getfasta or samtools faidx, which pull sequence context from a reference FASTA file around the genomic locus in column 6 (Start_Position).
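When doing this by hand, the usual pitfall is coordinate convention: MAF Start_Position and samtools faidx regions are 1-based and inclusive, while the BED input to bedtools getfasta is 0-based and half-open. A minimal sketch that builds both forms for a 21-nt window (the function name `context_regions` and the example locus are hypothetical):

```python
def context_regions(chrom: str, start_position: int, flank: int = 10) -> tuple[str, str]:
    """Build a samtools faidx region and a BED line around a MAF locus.

    `start_position` is the 1-based MAF Start_Position. The faidx region is
    1-based inclusive; the BED line is 0-based half-open, so its start is one
    lower while its end is the same. Both cover 2 * flank + 1 bases.
    """
    first = start_position - flank  # 1-based, inclusive
    last = start_position + flank   # 1-based, inclusive
    if first < 1:
        raise ValueError("window would start before position 1")
    faidx_region = f"{chrom}:{first}-{last}"
    bed_line = f"{chrom}\t{first - 1}\t{last}"
    return faidx_region, bed_line


region, bed = context_regions("1", 1000)
print(region)  # pass to: samtools faidx <reference.fa> <region>
print(bed)     # write to a .bed file for: bedtools getfasta -fi <reference.fa> -bed <file.bed>
```

Using the faidx coordinates in a BED file (or vice versa) shifts the window by one base, which is enough to make the middle base disagree with Reference_Allele.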

If you used the GRCh37-Lite FASTA, your method should have worked. Troubleshoot it a bit and let us know; there's a bug either in the MAF or in your method. Give bedtools a try.
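One way to tell whether the bug is in the MAF or in the method is to check every row's Reference_Allele against the reference at Start_Position. A minimal sketch, assuming the genome is already loaded as a dict keyed by the names used in the MAF Chromosome column and the rows are dicts keyed by MAF column names (`find_reference_mismatches` is a hypothetical name):

```python
def find_reference_mismatches(
    genome: dict[str, str], maf_rows: list[dict[str, str]]
) -> list[dict[str, str]]:
    """Return MAF rows whose Reference_Allele disagrees with the reference.

    Start_Position is treated as 1-based. Insertions (Reference_Allele "-")
    are skipped because they have no reference bases to compare.
    """
    mismatches = []
    for row in maf_rows:
        ref_allele = row["Reference_Allele"].upper()
        if ref_allele == "-":
            continue
        seq = genome.get(row["Chromosome"])
        if seq is None:
            # Chromosome name not found, e.g. "1" in the MAF vs "chr1" in the FASTA.
            mismatches.append(row)
            continue
        start = int(row["Start_Position"]) - 1  # convert to 0-based
        observed = seq[start:start + len(ref_allele)].upper()
        if observed != ref_allele:
            mismatches.append(row)
    return mismatches
```

If nearly every row mismatches, the method (build, chromosome naming, or an off-by-one) is the likely culprit; if only a handful do, those MAF rows deserve a closer look.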

(Last modified 14 months ago by _r_am32k.)

Thank you very much for the quick answer, for clarifying that ref_context is not a standard MAF column, and for the options you suggested. I found a solution: I downloaded the GRCh37 chromosome sequences from UCSC and used the chromosomal coordinates of each mutation given in the MAF file, and it works very well. Thanks again for your answer.
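The approach in the reply above (per-chromosome FASTA plus MAF coordinates) can be sketched in plain Python. This is an illustration under assumptions, not the poster's code: `read_fasta` and `ref_context` are hypothetical names, the files are assumed to be UCSC-style chromosome FASTA (optionally gzipped), and UCSC names chromosomes "chr1" etc., so a MAF Chromosome value without the prefix may need "chr" added before lookup. UCSC chromosome files are soft-masked (repeats in lowercase), hence the uppercasing.

```python
import gzip


def read_fasta(path: str) -> dict[str, str]:
    """Read a (possibly gzipped) FASTA file into {sequence name: sequence}."""
    opener = gzip.open if path.endswith(".gz") else open
    sequences: dict[str, list[str]] = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]  # e.g. "chr1"
                sequences[name] = []
            elif name is not None:
                sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}


def ref_context(seq: str, start_position: int, flank: int = 10) -> str:
    """Return the 2 * flank + 1 bases centred on a 1-based position, uppercased."""
    i = start_position - 1  # 0-based index of the mutated base
    if i - flank < 0 or i + flank >= len(seq):
        raise ValueError("window extends past the end of the sequence")
    return seq[i - flank:i + flank + 1].upper()
```

With flank=10 this reproduces the 21-nt layout of the ref_context column described in the question.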

(Reply by Alejandro Jimenez Sanchez, 5.5 years ago.)