cBioPortal: Mutations in the database don't appear in the portal
9 weeks ago

Hi! I am working with the invasive breast cancer dataset from the TCGA PanCancer Atlas project, downloaded through cBioPortal (https://www.cbioportal.org/study/summary?id=brca_tcga_pan_can_atlas_2018). Working with it in R, I found a mutation in HINT3 (Tumor Sample Barcode TCGA-A8-A09A-01), but when I tried to check it directly in cBioPortal, it didn't appear there. Does anyone know why? I have also noticed the same thing with a number of other mutations... I am really confused now, so any help would be much appreciated.

8 weeks ago
pilargmarch

I downloaded the Breast Invasive Carcinoma (TCGA, PanCancer Atlas) data from cBioPortal, and this is the mutation you're referring to:

Hugo_Symbol Entrez_Gene_Id  Center  NCBI_Build  Chromosome  Start_Position  End_Position    Strand  Consequence Variant_Classification  Variant_Type    Reference_Allele    Tumor_Seq_Allele1   Tumor_Seq_Allele2   dbSNP_RS    dbSNP_Val_Status    Tumor_Sample_Barcode    Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1  Match_Norm_Seq_Allele2  Tumor_Validation_Allele1    Tumor_Validation_Allele2    Match_Norm_Validation_Allele1   Match_Norm_Validation_Allele2   Verification_Status Validation_Status   Mutation_Status Sequencing_Phase    Sequence_Source Validation_Method   Score   BAM_File    Sequencer   t_ref_count t_alt_count n_ref_count n_alt_count HGVSc   HGVSp   HGVSp_Short Transcript_ID   RefSeq  Protein_position    Codons  Hotspot AA_MAF  AFR_MAF ALLELE_NUM  AMR_MAF ASN_MAF Allele  Amino_acids BIOTYPE CANONICAL   CCDS    CDS_position    CENTERS CLIN_SIG    CONTEXT COSMIC  DBVS    DISTANCE    DOMAINS EAS_MAF EA_MAF  ENSP    EUR_MAF EXON    ExAC_AF ExAC_AF_AFR ExAC_AF_AMR ExAC_AF_EAS ExAC_AF_FIN ExAC_AF_NFE ExAC_AF_OTH ExAC_AF_SAS Existing_variation  FILTER  Feature Feature_type    GENE_PHENO  GMAF    Gene    HGNC_ID HGVS_OFFSET HIGH_INF_POS    IMPACT  INTRON  MERGESOURCE MOTIF_NAME  MOTIF_POS   MOTIF_SCORE_CHANGE  NCALLERS    PHENO   PICK    PolyPhen    SAS_MAF SIFT    SOMATIC SWISSPROT   SYMBOL  SYMBOL_SOURCE   TREMBL  TSL UNIPARC VARIANT_CLASS   all_effects cDNA_position   n_depth t_depth Annotation_Status
HINT3   135114  .   GRCh37  6   126299011   126299011   +   3_prime_UTR_variant 3'UTR   SNP T   T   A   novel   .   TCGA-A8-A09A-01 TCGA-A8-A09A-10 T   T   .   .   .   .   .   .   .   .   .   .   .   .   .   29  16  60  0   ENST00000229633.5:c.*189T>A         ENST00000229633 NM_138571.4         0   .   .   .   .   .   A   .   protein_coding  YES CCDS5133.1  .   RADIA|MUTECT|MUSE   .   GTTACTGACTT NONE    .   .   .   .   .   ENSP00000229633 .   5/5 .   .   .   .   .   .   .   .   .   wga ENST00000229633 Transcript  .   .   ENSG00000111911 18468   .   .   MODIFIER    .   PRIMARY .   .   .   3   .   .   .   .   .   .   HINT3_HUMAN HINT3   HGNC    .   .   UPI000006F73F   SNV HINT3,3_prime_UTR_variant,,ENST00000229633,;RNA5SP216,downstream_gene_variant,,ENST00000516111,;    935 60  46  SUCCESS

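As you can see, the consequence type is "3_prime_UTR_variant" (Variant_Classification "3'UTR"), i.e. the variant lies in the 3' untranslated region of HINT3 rather than in its coding sequence. You can find information on how mutation consequences are classified here.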
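Since you already have the downloaded file, you can check the classification of any record yourself. The sketch below is in Python rather than R and only illustrates the idea; it assumes the mutation file is tab-separated with a single header row (optionally preceded by '#' comment lines), as in the excerpt above. The function name find_mutations is hypothetical, and the example string is a trimmed four-column excerpt of the HINT3 record.

```python
import csv


def find_mutations(maf_text, sample, gene):
    """Return MAF rows (as dicts) for one tumor sample barcode and Hugo symbol."""
    # Drop blank lines and '#' comment lines (e.g. '#version 2.4') before the header.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    # QUOTE_NONE: treat every character literally, since MAF fields are not quoted.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row
        for row in reader
        if row["Tumor_Sample_Barcode"] == sample and row["Hugo_Symbol"] == gene
    ]


# Trimmed, four-column excerpt of the HINT3 record shown above.
example = (
    "Hugo_Symbol\tConsequence\tVariant_Classification\tTumor_Sample_Barcode\n"
    "HINT3\t3_prime_UTR_variant\t3'UTR\tTCGA-A8-A09A-01\n"
)

for row in find_mutations(example, "TCGA-A8-A09A-01", "HINT3"):
    print(row["Hugo_Symbol"], row["Consequence"], row["Variant_Classification"])
```

With the real download you would pass the contents of the mutation file (often named data_mutations.txt, though the exact name may differ between releases) instead of the example string.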


From the cBioPortal FAQ:

Does the cBioPortal contain synonymous mutation data? No, the cBioPortal does not currently support synonymous mutations. This may change in the future, but we have no plans yet to add this feature.

You can read about it in more detail in this discussion and this one, but basically, as explained here, cBioPortal filters out "synonymous" mutations (Silent, Intron, IGR, 3'UTR, 5'UTR, 3'Flank and 5'Flank), since it is assumed that these will have no impact on the patient.
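The filtering rule described above amounts to a simple partition on the Variant_Classification column. This Python sketch mirrors the list of hidden classifications given in the text; it is not read from cBioPortal's own code or configuration, so treat it as an approximation. The names HIDDEN_CLASSIFICATIONS and split_by_portal_visibility are hypothetical, and GENE_X is a made-up symbol.

```python
# Variant classifications that, per the explanation above, cBioPortal filters out.
# This set is copied from the text, not taken from cBioPortal itself.
HIDDEN_CLASSIFICATIONS = {
    "Silent", "Intron", "IGR", "3'UTR", "5'UTR", "3'Flank", "5'Flank",
}


def split_by_portal_visibility(rows):
    """Partition MAF rows (dicts) into (shown, hidden) using HIDDEN_CLASSIFICATIONS."""
    shown, hidden = [], []
    for row in rows:
        if row["Variant_Classification"] in HIDDEN_CLASSIFICATIONS:
            hidden.append(row)
        else:
            shown.append(row)
    return shown, hidden


# Toy rows: the HINT3 record from above plus a hypothetical missense record.
rows = [
    {"Hugo_Symbol": "HINT3", "Variant_Classification": "3'UTR"},
    {"Hugo_Symbol": "GENE_X", "Variant_Classification": "Missense_Mutation"},
]
shown, hidden = split_by_portal_visibility(rows)
print([r["Hugo_Symbol"] for r in shown])   # ['GENE_X']
print([r["Hugo_Symbol"] for r in hidden])  # ['HINT3']
```

Applied to the full file, the "hidden" list should contain records like the HINT3 one that you can see in the download but not in the portal.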

In fact, if you look at the list of mutated genes for this patient in cBioPortal, you can see that the only mutation types are "Frame_Shift_Del", "In_Frame_Del", "Missense_Mutation", "Nonsense_Mutation" and "Splice_Region". If you filter the original data downloaded from cBioPortal to include only these mutation types, you get 191 unique genes (out of the 289 unique genes with any mutation), close to the 199 unique genes shown as mutated in cBioPortal. I suspect the two lists don't match exactly because the two sources use different gene aliases for the differing genes (HS6ST2, XRCC6BP1, FAM214A, HKR1, KIAA0947 and AIM1 appear only in the original data; ATOSA, ATP23, CLEC4D, COG3, CRYBG1, FNBP1L, HNMT, ICE1, LOXL4, MRPL45, RCBTB1, SEMA7A, ST6GAL1 and ZNF875 appear only in cBioPortal).
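The filtering and comparison described above can be sketched roughly as follows. This is a Python illustration, not the code actually used to obtain the counts above; it assumes the rows are already parsed into dicts keyed by MAF column names and that the patient's gene list from cBioPortal is available as a set (for example, copied by hand from the patient view). The KEPT_CLASSIFICATIONS set only reflects the types observed for this patient, and the alias map is optional and hypothetical; real symbol updates would have to come from a source such as HGNC. OLD_NAME, NEW_NAME and GENE_B are made-up symbols.

```python
# Mutation types observed for this patient in cBioPortal (from the text above);
# this is NOT cBioPortal's general list of supported classifications.
KEPT_CLASSIFICATIONS = {
    "Frame_Shift_Del", "In_Frame_Del", "Missense_Mutation",
    "Nonsense_Mutation", "Splice_Region",
}


def mutated_genes(rows, sample, keep=KEPT_CLASSIFICATIONS):
    """Unique Hugo symbols for one sample, restricted to the given classifications."""
    return {
        r["Hugo_Symbol"]
        for r in rows
        if r["Tumor_Sample_Barcode"] == sample and r["Variant_Classification"] in keep
    }


def compare_gene_sets(maf_genes, portal_genes, aliases=None):
    """Return (only_in_maf, only_in_portal) after mapping MAF symbols through aliases."""
    aliases = aliases or {}
    renamed = {aliases.get(g, g) for g in maf_genes}
    return renamed - portal_genes, portal_genes - renamed


sample = "TCGA-A8-A09A-01"
rows = [  # toy rows; OLD_NAME, NEW_NAME and GENE_B are hypothetical symbols
    {"Hugo_Symbol": "HINT3", "Variant_Classification": "3'UTR", "Tumor_Sample_Barcode": sample},
    {"Hugo_Symbol": "OLD_NAME", "Variant_Classification": "Missense_Mutation", "Tumor_Sample_Barcode": sample},
    {"Hugo_Symbol": "GENE_B", "Variant_Classification": "Nonsense_Mutation", "Tumor_Sample_Barcode": sample},
]
portal = {"NEW_NAME", "GENE_B"}  # e.g. gene list copied from the patient view

genes = mutated_genes(rows, sample)
print(compare_gene_sets(genes, portal))                            # ({'OLD_NAME'}, {'NEW_NAME'})
print(compare_gene_sets(genes, portal, {"OLD_NAME": "NEW_NAME"}))  # (set(), set())
```

Without the alias map, a renamed gene shows up once on each side of the comparison, which is the pattern seen in the two lists of differing genes above.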


If for some reason you want to include these synonymous mutations in your analysis, I suggest using the GDC Data Portal, which lets you filter mutations by gene, type and predicted impact. Keep in mind that the data might not be the same (some samples might have been excluded from the PanCancer version) and that the pipeline used to call the mutations is different (GDC uses an ensemble of four calling methods, and I'm not sure what PanCancer used). For example, for the patient you mentioned there are only 221 somatic mutations on the GDC Data Portal, and HINT3 isn't among them; interestingly, it does show a CNV loss for HINT3.


