Question: Interpretation of GISTIC putative copy number alterations for mutations with cBioPortal in a specific cancer dataset
3.1 years ago by
svlachavas680 wrote:

Dear Biostars,

From an integrative analysis of two microarray colon cancer datasets, I have identified a small gene signature separating primary colorectal cancers from adjacent mucosa. I would like to inspect this subset of genes further in external datasets, especially those with matching mutation data, to identify any interesting trends in the mutational patterns of these genes.

To this end, using cBioPortal, I queried the Provisional TCGA Colorectal dataset (~633 samples) with my 94 gene symbols. The OncoPrint results look quite interesting: a very small subset of these genes (~9) shows high mutation rates (more than 10%), together with a specific pattern of copy number amplification in almost the same subset of patients/samples. Hence, my main question is:

Because I have never used mutational data before, can I interpret the results as follows: bearing in mind that the GISTIC calls are "putative", could these 9 genes be considered putative targets for therapy, given their high mutation rates and their pattern of amplification? Is it also relevant that they are located on the same chromosome? Or can something similar be concluded from the additional information that the mutations in these specific genes provide?

I would also like to apologize, but again, this is my first time using mutation data.



ADD COMMENTlink modified 3.1 years ago by Kevin Blighe67k • written 3.1 years ago by svlachavas680
3.1 years ago by
Kevin Blighe67k
Republic of Ireland
Kevin Blighe67k wrote:


You have to be careful with the interpretation. Things about which to be aware:

Driver versus passenger mutations / Genomic size of genes

Some genes show high mutation rates merely due to their length. For example, TTN (titin) is the largest protein-coding gene in the human genome (I believe), and it therefore picks up a lot of mutations relative to other genes merely because it covers a large genomic range. To be proper, you could correct for the length of each gene. TTN has never been directly linked to any cancer mechanism, from what I know, and its mutations are therefore assumed to be merely passenger mutations. That's the first thing to be aware of.
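
As a rough illustration of correcting for gene length, here is a minimal Python sketch that converts raw mutation counts into mutations per megabase. The gene names, counts and lengths are hypothetical placeholders, and the function name is my own; real lengths would have to come from a gene annotation.

```python
# Minimal sketch: normalise per-gene mutation counts by gene length.
# All gene names, counts and lengths below are hypothetical examples.

def mutations_per_megabase(mutation_counts, gene_lengths_bp):
    """Return mutations per megabase of gene length for each gene."""
    rates = {}
    for gene, count in mutation_counts.items():
        length_bp = gene_lengths_bp[gene]
        if length_bp <= 0:
            raise ValueError(f"gene length must be positive for {gene}")
        rates[gene] = count * 1_000_000 / length_bp
    return rates


counts = {"GENE_A": 50, "GENE_B": 10}            # raw mutation counts
lengths = {"GENE_A": 100_000, "GENE_B": 2_000}   # gene lengths in base pairs
rates = mutations_per_megabase(counts, lengths)

# Rank genes by length-normalised rate rather than by raw count.
for gene, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {rate:.1f} mutations/Mb")
```

In this toy example the long gene has more mutations in absolute terms but a much lower rate once its length is taken into account.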

Mutation type/class

Another thing to be careful with is the class of mutation that you're observing. Are they all missense or a mixture of intronic, synonymous, UTR, and missense?
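
One way to check this is to tally the mutation classes per gene. The sketch below assumes you have (gene, class) pairs with MAF-style `Variant_Classification` labels; the records are made up, and which classes count as "protein-altering" is my own assumption, so adjust the set to your needs.

```python
from collections import Counter

# Assumption: these MAF-style classes are treated as protein-altering.
PROTEIN_ALTERING = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "Splice_Site",
}


def summarise_classes(records):
    """Count mutation classes per gene from (gene, class) pairs."""
    per_gene = {}
    for gene, mutation_class in records:
        per_gene.setdefault(gene, Counter())[mutation_class] += 1
    return per_gene


def protein_altering_fraction(class_counts):
    """Fraction of a gene's mutations that fall in PROTEIN_ALTERING."""
    total = sum(class_counts.values())
    altering = sum(n for cls, n in class_counts.items() if cls in PROTEIN_ALTERING)
    return altering / total if total else 0.0


# Hypothetical records for illustration only.
records = [
    ("GENE_A", "Missense_Mutation"),
    ("GENE_A", "Silent"),
    ("GENE_A", "Intron"),
    ("GENE_A", "Nonsense_Mutation"),
    ("GENE_B", "Silent"),
]
summary = summarise_classes(records)
for gene, class_counts in summary.items():
    print(gene, dict(class_counts), protein_altering_fraction(class_counts))
```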

Gene function

For the 9 genes that you've identified, if they are truly related to copy number amplifications, then I would expect them to be involved in DNA repair and/or chromosomal stability (which could include things such as cell division, apoptosis, etc). There will undoubtedly be lots of literature on each gene and it will be your job to do the background reading.

Finally, if you have convinced yourself that the data is real, then it is perfectly fine to associate the 9 genes with copy number amplification and, what I would call, genomic instability, which is a feature of many tumours. If you want further guidance, I highly recommend the TCGA published study on endometrial carcinoma ( ), where a lot of great work was done specifically in this area of dividing tumours based on recurrent copy number and somatic mutation profiles.

ADD COMMENTlink modified 19 months ago • written 3.1 years ago by Kevin Blighe67k

Dear Kevin,

Thank you for your excellent and detailed answer, which has given me a lot to think about. For simplicity, I uploaded a PNG screenshot of the OncoPrint (because of individual research requirements, I did not include the gene names). Firstly, apart from the type of genetic alteration shown, where can I find the information you mentioned (missense, nonsense, etc.)? As you can see from the pattern, the GISTIC putative calls are mainly amplifications (and mRNA overexpression). So in my view (and correct me if I'm wrong),

Moreover, 7 of these 9 genes are located on chromosome 20, which makes it even more interesting. I also agree about looking further into the functional annotation of these genes!

Just a small update regarding my plot above: the total percentage of alteration also includes, for instance, mRNA upregulation, which might be a bit confusing. However, as I checked, the upregulation z-score is relative to all the other cancer samples, so it might carry additional meaning.
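
To make the z-score idea concrete, here is a minimal sketch of how one gene's expression could be scored against the other tumour samples. It is not cBioPortal's actual pipeline: the reference population (all samples), the use of the sample standard deviation, the threshold of 2 and the expression values are all assumptions for illustration.

```python
import statistics


def expression_z_scores(values_by_sample):
    """Z-score each sample's expression against all samples for one gene."""
    values = list(values_by_sample.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        raise ValueError("all expression values are identical")
    return {sample: (value - mean) / sd for sample, value in values_by_sample.items()}


def upregulated_samples(values_by_sample, threshold=2.0):
    """Samples whose z-score exceeds the (assumed) threshold."""
    z_scores = expression_z_scores(values_by_sample)
    return sorted(sample for sample, z in z_scores.items() if z > threshold)


# Hypothetical expression values for one gene across nine tumour samples.
expression = {
    "S1": 4.0, "S2": 5.0, "S3": 6.0, "S4": 5.0, "S5": 4.0,
    "S6": 6.0, "S7": 5.0, "S8": 5.0, "S9": 12.0,
}
print(upregulated_samples(expression))
```

Because the score is relative to the other tumours, an "upregulated" call says nothing about expression relative to normal tissue.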

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by svlachavas680

Hi, your OncoPrint certainly looks intriguing, in the sense that mutations in the 9 genes appear to associate strongly with the copy number amplifications.

Some other thoughts:

  • You should try to derive a P-value for the association between amplifications and mutations. For example, count up the mutations in all samples and then divide the samples into two groups: samples with amplification versus samples without amplification. A chi-square test, comparing the distributions of data, could work here.
  • I'd be interested in seeing if your 9 genes fall into the CIN25 (chromosomal instability) gene signature that was identified by Carter et al. back in 2006. I previously published a very small single patient study on breast cancer, in which I identified CIN25 genes mutated in the metastatic lymph node sample. I am also currently preparing to publish another larger study, but not on colorectal cancer!
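
The chi-square idea from the first point above can be sketched for a single gene as a 2x2 table (amplified or not, mutated or not). This is a plain Pearson chi-square test without continuity correction, written with the standard library only; the counts are invented, and with small expected counts Fisher's exact test would be the safer choice.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    a: amplified and mutated        b: amplified, not mutated
    c: not amplified, mutated       d: not amplified, not mutated
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 50 amplified samples (30 mutated), 50 without (10 mutated).
chi2, p_value = chi_square_2x2(30, 20, 10, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```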

By the way, if you downloaded your mutation data directly from cBioPortal, then I believe that it is already corrected for issues like the one that I mentioned with the TTN gene, i.e., the mutation frequencies may already be 'normalised' for gene length. cBioPortal may additionally be providing only missense mutations. You should check the cBioPortal website and be sure that you know the exact data that you have. The last time that I obtained TCGA data, I obtained the raw data from the GDC legacy archive.

ADD REPLYlink written 3.1 years ago by Kevin Blighe67k

Dear Kevin,

Thank you again for your comprehensive answer! Actually, I only used cBioPortal for convenience, as a first screen of the putative mutation patterns of these genes; I have not downloaded any data from the GDC or from cBioPortal. So I hope the gene length issue you mentioned is already taken into account, although I did not notice anything about it in the FAQ section.

Regarding your other very interesting idea about the association between amplifications and mutations: you probably meant that, beyond the "Mutation Spectrum" label in the image, which seems to be associated with the "red bands" (amplification), I should also perform some kind of statistical testing. As for the data, perhaps I could find this information in the download area?

I tried a random example for a different cancer with a random gene set, and the download section contains:

So, perhaps the column "Type of Genetic alterations across all cases: (Alterations are summarized as MUT, Gain, HetLoss, etc.)" should contain what you mentioned?

I will also return with an update regarding your paper on chromosomal instability.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by svlachavas680

Dear Kevin,

A small update about the CIN25 signature: 2 of my 9 genes are included in this gene signature. In the broader version of that signature, I also found 4 more genes that are included in my broader signature. Perhaps the most "valid" and safe comment that I can make (as a supplement to the work I have already performed, of course) concerns the pattern of putative amplification predicted for these specific genes of my signature in specific subgroups of the cohort. This needs further investigation, but at the same time it gives my gene signature a further putative clinical implication beyond its power to discriminate between cancer and control samples, since the amplification status is observed among cancer samples only. This could of course be an interesting starting point for investigating these genes for actual CNAs, for example with tools such as CMA (chromosomal microarray analysis).

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by svlachavas680


Yes, that is certainly an interesting lead (i.e., an interesting starting point from which you can do further investigations). Just to be sure that we are on the same thought, though: My general idea is that mutations in these CIN genes will increase the amount of chromosomal and genomic instability, and therefore promote a high number of both copy number alterations (amplifications and deletions) and also further single point somatic mutations. This is what your data appears to show.

Sounds interesting!

ADD REPLYlink written 3.1 years ago by Kevin Blighe67k

Dear Kevin,

Thank you for your answer! Yes, I take the same approach on this matter, in the context of "cancer driver genes". I mentioned the rest because I started with a signature that separates cancer from normal samples, whereas this plot contains only cancer samples. That does not negate or diminish the usefulness of the genes; rather, it supports the general idea you proposed. Moreover, for a possible therapeutic approach, it may be more interesting that this pattern of putative amplification applies to specific "subsets" of patients and might suggest different treatments. Perhaps I should look for more clinical variables to add, in order to identify any other interesting information.



ADD REPLYlink written 3.1 years ago by svlachavas680

Hi Efstathios, yes, certainly, copy number alterations are now being viewed as important in terms of therapies and treatment strategy. So, it is an interesting area to research.


ADD REPLYlink written 3.1 years ago by Kevin Blighe67k

@Kevin Blighe

Sorry, I am confused about the GISTIC score. People say that a GISTIC score of -2 means a deletion.

I have a GISTIC score file in which I do not see any negative values (scores).

However, when plotting the arms, I do see some deletions in the peaks.

For instance, in the segmentation file the total copy number of the segment containing the CDKN2A gene (9p21.3) is zero, so I assume this gene has been deleted. I can see CDKN2A in the del_genes.conf_95.txt file, but in the GISTIC score file I do not see any negative score that I could relate to CDKN2A.

Could you please help me get an intuition for how a GISTIC score should be interpreted as a deletion or an amplification?


Thank you again

ADD REPLYlink written 8 months ago by A3.9k
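
A note on the -2 value raised above: as far as I know, it refers to the discrete, thresholded per-gene calls (the values cBioPortal displays as putative copy-number alterations), not to the G-scores in the GISTIC score file, which are non-negative and labelled separately as amplification or deletion. A minimal sketch of the usual mapping for the thresholded calls follows; the sample names and calls are hypothetical.

```python
# Conventional labels for discrete (thresholded) GISTIC copy-number calls,
# as displayed by cBioPortal. Sample names and calls below are hypothetical.
GISTIC_LABELS = {
    -2: "deep deletion",      # putative homozygous deletion
    -1: "shallow deletion",   # putative heterozygous loss
    0: "diploid",
    1: "gain",                # low-level gain
    2: "amplification",       # high-level amplification
}


def label_calls(calls_by_sample):
    """Translate discrete GISTIC calls into readable labels per sample."""
    labelled = {}
    for sample, call in calls_by_sample.items():
        if call not in GISTIC_LABELS:
            raise ValueError(f"unexpected GISTIC call {call!r} for {sample}")
        labelled[sample] = GISTIC_LABELS[call]
    return labelled


calls = {"S1": -2, "S2": 0, "S3": 2}
labelled = label_calls(calls)
for sample, label in labelled.items():
    print(sample, label)
```

So a gene such as CDKN2A can appear in a deletion peak with a positive G-score, while its per-sample thresholded calls are -1 or -2.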