Extracting specific cancer types from MAF (Mutation Annotation Format) files from TCGA (The Cancer Genome Atlas)
7 months ago
peter ▴ 10

I am new to bioinformatics, so I apologise in advance for two basic questions. I have a MAF file containing the mutation information available for the TCGA Pan-Cancer dataset.


I want to extract specific cancer types, for example breast cancer samples, from this MAF file. The file has Tumor_Sample_Barcode and Matched_Norm_Sample_Barcode columns with values of the form TCGA-XX-XXXX-XXX-XXX-XXXX-XX. Am I supposed to use this barcode to extract only the breast cancer samples? For example, the 3C in TCGA-3C-AAAU-01A-11D-A41F-09 indicates breast cancer, as specified here:

Tissue source site

If the above approach is fine, I then want to download BED files for GC content and histone marks for cell types specific to breast cancer. How can I do that? I know I should use ENCODE for this, but how do I do it for breast-cancer-specific cell types using hg19?

Insights will be appreciated.

ENCODE mutation maf TCGA cancer • 675 views
7 months ago


Yes, you can use the Tissue Source Site (TSS) code in the barcode. Note, however, that '3C' is only one of many TSS codes that relate to breast invasive carcinoma: '3C' is specific to breast invasive carcinoma samples from Columbia University, so you would need to collect every breast-related code from the TSS table. Another way to do this is to retrieve just the breast cancer MAF files via the GDC Data Repository: https://portal.gdc.cancer.gov/repository
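If you do go the barcode route, a minimal Python sketch of the filtering could look like the one below. It assumes the MAF is tab-delimited with a header row and optional '#' comment lines at the top, and that the TSS code is the second dash-separated field of Tumor_Sample_Barcode. The helper names (tss_code, filter_maf_by_tss) and the tiny demo rows are made up for illustration, and the set {"3C"} is deliberately incomplete: you would need to fill it with all breast invasive carcinoma codes from the TSS table.

```python
import csv
import io


def tss_code(barcode):
    """Return the tissue source site field (2nd dash-separated field) of a TCGA barcode."""
    parts = barcode.split("-")
    return parts[1] if len(parts) > 1 else ""


def filter_maf_by_tss(lines, tss_codes):
    """Yield MAF rows (as dicts) whose Tumor_Sample_Barcode carries one of tss_codes."""
    # MAF files may start with '#' comment lines (e.g. a version line); skip them.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if tss_code(row["Tumor_Sample_Barcode"]) in tss_codes:
            yield row


# Made-up MAF excerpt for illustration only; 'ZZ' stands in for a non-breast code.
demo_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tMatched_Norm_Sample_Barcode\n"
    "GENE1\tTCGA-3C-AAAU-01A-11D-A41F-09\tTCGA-3C-AAAU-10A-01D-A41F-09\n"
    "GENE2\tTCGA-ZZ-0000-01A-11D-0000-09\tTCGA-ZZ-0000-10A-01D-0000-09\n"
)

# Incomplete on purpose: add every breast-related TSS code you find in the TSS table.
breast_codes = {"3C"}
breast_rows = list(filter_maf_by_tss(demo_maf, breast_codes))
print([row["Hugo_Symbol"] for row in breast_rows])
```

For a real file you would pass an open file handle instead of demo_maf, e.g. `with open("your_file.maf") as fh:` (the file name here is a placeholder). Reading row by row keeps memory use low, which matters for a pan-cancer MAF.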

Regarding the GC content and histone marks: the GC content should be available via the UCSC Table Browser, and it is also worth searching the web for walkthroughs on how to retrieve it. Regarding ENCODE histone marks, the data is available, and here are some informative links:


Thank you, Kevin. This is quite helpful.


