Question: MAF file having mutation data
21 months ago by
rajesh 60 wrote:

Hi everyone

I have downloaded the "Masked Somatic Mutation" file for pancreatic adenocarcinoma from TCGA.

This file contains the somatic mutations found in the tumor sample when compared with the reference and the matched normal sample.

My questions are:

  1. Each record starts with a gene name, and the sixth and seventh columns give the coordinates. Are the mutations in the MAF file only from the protein-coding part of the genome, or do they also cover the non-protein-coding part?
  2. I need to map mutations onto the non-coding part of the genome, especially enhancer regions.
  3. If non-coding mutations are not present in the MAF file, where can I download a mutation file that includes them?

I am also attaching a sample from the file.

Hugo_Symbol Entrez_Gene_Id  Center  NCBI_Build  Chromosome  Start_Position  End_Position    Strand  Variant_Classification  Variant_Type    Reference_Allele    Tumor_Seq_Allele1   Tumor_Seq_Allele2   dbSNP_RS    dbSNP_Val_Status    Tumor_Sample_Barcode    Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1  Match_Norm_Seq_Allele2  Tumor_Validation_Allele1    Tumor_Validation_Allele2    Match_Norm_Validation_Allele1   Match_Norm_Validation_Allele2   Verification_Status Validation_Status   Mutation_Status Sequencing_Phase    Sequence_Source Validation_Method   Score   BAM_File    Sequencer   Tumor_Sample_UUID   Matched_Norm_Sample_UUID    HGVSc   HGVSp   HGVSp_Short Transcript_ID   Exon_Number t_depth t_ref_count t_alt_count n_depth n_ref_count n_alt_count all_effects Allele  Gene    Feature Feature_type    One_Consequence Consequence cDNA_position   CDS_position    Protein_position    Amino_acids Codons  Existing_variation  ALLELE_NUM  DISTANCE    TRANSCRIPT_STRAND   SYMBOL  SYMBOL_SOURCE   HGNC_ID BIOTYPE CANONICAL   CCDS    ENSP    SWISSPROT   TREMBL  UNIPARC RefSeq  SIFT    PolyPhen    EXON    INTRON  DOMAINS GMAF    AFR_MAF AMR_MAF ASN_MAF EAS_MAF EUR_MAF SAS_MAF AA_MAF  EA_MAF  CLIN_SIG    SOMATIC PUBMED  MOTIF_NAME  MOTIF_POS   HIGH_INF_POS    MOTIF_SCORE_CHANGE  IMPACT  PICK    VARIANT_CLASS   TSL HGVS_OFFSET PHENO   MINIMISED   ExAC_AF ExAC_AF_Adj ExAC_AF_AFR ExAC_AF_AMR ExAC_AF_EAS ExAC_AF_FIN ExAC_AF_NFE ExAC_AF_OTH ExAC_AF_SAS GENE_PHENO  FILTER  CONTEXT src_vcf_id  tumor_bam_uuid  normal_bam_uuid case_id GDC_FILTER  COSMIC  MC3_Overlap GDC_Validation_Status
BCAN    63827   BI  GRCh38  chr1    156651635   156651635   +   Missense_Mutation   SNP G   G   A   rs770559603 byFrequency TCGA-2L-AAQJ-01A-12D-A397-08    TCGA-2L-AAQJ-11A-11D-A39A-08                                    Somatic                     Illumina HiSeq 2000 de369dbb-736e-4970-998d-a0470029653f    cb472e98-8801-40f4-9c2c-6ebb03b41c40    c.1243G>A   p.Gly415Arg p.G415R ENST00000329117 7/14    623 524 99  103         BCAN,missense_variant,p.G415R,ENST00000329117,NM_021948.4,c.1243G>A,MODERATE,YES,tolerated(0.16),benign(0.013),1;BCAN,missense_variant,p.G415R,ENST00000361588,NM_198427.1,c.1243G>A,MODERATE,,tolerated(0.21),benign(0.022),1;BCAN,downstream_gene_variant,,ENST00000424639,,,MODIFIER,,,,1;BCAN,downstream_gene_variant,,ENST00000457777,,,MODIFIER,,,,1;BCAN,downstream_gene_variant,,ENST00000441358,,,MODIFIER,,,,1;RP11-284F21.7,intron_variant,,ENST00000448869,,n.111-4481C>T,MODIFIER,YES,,,-1;BCAN,3_prime_UTR_variant,,ENST00000479949,,c.*477G>A,MODIFIER,,,,1;BCAN,downstream_gene_variant,,ENST00000491823,,,MODIFIER,,,,1    A   ENSG00000132692 ENST00000329117 Transcript  missense_variant    missense_variant    1579/3466   1243/2736   415/911 G/R Gga/Aga rs770559603 1       1   BCAN    HGNC    HGNC:23059  protein_coding  YES CCDS1149.1  ENSP00000331210 Q96GW7      UPI000006F0E9   NM_021948.4 tolerated(0.16) benign(0.013)   7/14        PROSITE_profiles:PS50313                                                            MODERATE    1   SNV 1           1   5.766e-05   5.822e-05   0   0   0   0   0.0001059   0   0       panel_of_normals    ACGGAGGAGGT bd948014-be86-4c11-8061-a96b8c73fa83    9f9d28db-babf-4851-a32f-f00f97c523f8    81dd6131-efa9-4bad-9539-93e15b8100a6    f96ab3fe-bb11-4585-a35e-52d400e55ab7    gdc_pon     True    Unknown`
modified 21 months ago by _r_am32k • written 21 months ago by rajesh 60
21 months ago by
bari.ballew250 wrote:

TCGA has both WES and WGS data. If you're looking at WES data, you'll see mainly protein-coding mutations, as this is what exome sequencing focuses on. If you're looking for non-coding regions, you may be more interested in the whole genome data.
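One quick way to check what a given MAF actually contains is to tally its `Variant_Classification` column. Below is a minimal Python sketch, not a definitive method: the `CODING_CLASSES` set is my own assumption about which labels to count as protein-coding (splice-site calls, for example, are left out), and the rows other than BCAN in the in-memory example are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumption for illustration: Variant_Classification labels treated as
# protein-coding. This is not an official GDC definition.
CODING_CLASSES = {
    "Missense_Mutation", "Nonsense_Mutation", "Silent", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Translation_Start_Site",
}


def tally_variant_classes(handle):
    """Count Variant_Classification values in a tab-delimited MAF stream."""
    # MAF files may start with '#' comment lines; skip them before the header.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["Variant_Classification"] for row in reader)


def split_coding(counts):
    """Return (coding, other) totals based on CODING_CLASSES."""
    coding = sum(n for cls, n in counts.items() if cls in CODING_CLASSES)
    return coding, sum(counts.values()) - coding


# Tiny in-memory example; GENE_X and GENE_Y are hypothetical rows.
example = io.StringIO(
    "#version placeholder\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tVariant_Classification\n"
    "BCAN\tchr1\t156651635\tMissense_Mutation\n"
    "GENE_X\tchr2\t1000\tIntron\n"
    "GENE_Y\tchr3\t2000\t3'UTR\n"
)
counts = tally_variant_classes(example)
coding, other = split_coding(counts)
print(counts.most_common())
print("coding:", coding, "other:", other)
```

For a real file, pass an open handle instead of the `io.StringIO` example (use `gzip.open(path, "rt")` if the download is gzipped). A file dominated by coding classes with only a few UTR, intron, or flank entries is what you would expect from exome data.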

written 21 months ago by bari.ballew250

Thanks for the reply, but I still don't have my answer: is this MAF file from WES mutations or from WGS mutations? Please clarify.

written 21 months ago by rajesh 60

Could you also tell me how to download WGS mutation files for TCGA from the GDC?

written 21 months ago by rajesh 60

A MAF file is just a list of mutations found in a given sample(s). You can find mutations in either WES or WGS data, so you can have a MAF file (or a VCF file) for either type of experiment. For WGS, you are looking at mutations across the whole genome, so you will likely have more mutations listed in the MAF compared to WES. To find WGS projects through the GDC, you can click on Projects or Repository, look under "Experimental Strategies," and check the box "WGS."
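If you would rather script the same search than click through the portal, the GDC also exposes a files endpoint that takes JSON filters. Here is a hedged Python sketch that only builds the query (no network call is made). The project ID `TCGA-PAAD` and the selected fields are my assumptions about what you want, and whether any files come back depends on what the GDC currently hosts for that project.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_wgs_file_query(project_id="TCGA-PAAD", size=20):
    """Build GET parameters that ask the GDC files endpoint for WGS files."""
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {
                    "field": "experimental_strategy",
                    "value": ["WGS"],
                },
            },
        ],
    }
    return {
        # The filters object is sent as a JSON string in the query string.
        "filters": json.dumps(filters),
        # 'access' reports whether each file is open or controlled.
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),
    }


params = build_wgs_file_query()
url = GDC_FILES_ENDPOINT + "?" + urlencode(params)
print(url)
```

Fetching `url` (with `urllib.request.urlopen` or any HTTP client) should return JSON metadata for the matching files; the `access` field in each hit tells you whether the file is open or controlled.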

written 21 months ago by bari.ballew250

But the WGS files are not open access; they are protected and controlled. Am I right?

written 21 months ago by rajesh 60

Yes, any potentially identifiable data in GDC falls under controlled access. From GDC's documentation:

Open access data generally includes high level genomic data that is not individually identifiable, as well as most clinical and all biospecimen data elements.

Controlled data generally includes individually identifiable data such as low level genomic sequencing data, germline variants, SNP6 genotype data, and certain clinical data elements. Access to controlled data is granted by program-specific Data Access Committees. See Obtaining Access to Controlled Data for details.

You can find information on accessing controlled data here:

written 21 months ago by bari.ballew250
