Dear Biostars,
From an integrative analysis of two colon cancer microarray datasets, I have identified a small gene signature that separates primary colorectal cancers from adjacent mucosa. I would like to inspect this subset of genes further in external datasets, especially ones with matching mutation data, to identify any interesting trends in the mutational patterns of these genes.
To this end, I used cBioPortal to query the Provisional TCGA Colorectal dataset (~633 samples) with my 94 gene symbols. The OncoPrint results look quite interesting: a very small subset of these genes (~9) shows high mutation rates (more than 10% of samples), along with a specific pattern of copy number alteration (amplification) in almost the same subset of patients/samples. Hence, my main question is:
Because I have never used mutation data before, can I interpret the results as follows? Given that the GISTIC calls are putative, could these 9 genes be considered putative targets for therapy because of their high mutation rate and their pattern of amplification, and also because they are located on the same chromosome? Or is there some other added value in having mutations in these specific genes?
I apologize if this is a basic question, but again, it is my first time using mutation data.
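The "more than 10%" figure in the OncoPrint is, in essence, the fraction of queried samples in which a gene carries at least one alteration of the types included in the query. A minimal Python sketch of that calculation, assuming the per-sample calls have already been exported into a nested dictionary; the structure, gene names and sample IDs below are hypothetical placeholders, not real data:

```python
from typing import Dict, List, Set

# sample ID -> gene -> set of alteration labels (e.g. {"MUT", "AMP"}); hypothetical structure
Calls = Dict[str, Dict[str, Set[str]]]


def alteration_frequency(calls: Calls, genes: List[str]) -> Dict[str, float]:
    """Fraction of samples in which each gene has at least one alteration."""
    n = len(calls)
    if n == 0:
        return {g: 0.0 for g in genes}
    # An empty or missing label set counts as "not altered".
    return {g: sum(1 for s in calls.values() if s.get(g)) / n for g in genes}


def frequently_altered(freq: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Genes altered in strictly more than `threshold` of samples, most frequent first."""
    return sorted((g for g, f in freq.items() if f > threshold), key=lambda g: -freq[g])


calls: Calls = {f"S{i}": {} for i in range(1, 11)}  # 10 toy samples
calls["S1"]["GENE_A"] = {"MUT"}
calls["S2"]["GENE_A"] = {"AMP"}
calls["S3"]["GENE_A"] = {"MUT", "AMP"}
calls["S3"]["GENE_C"] = {"AMP"}
calls["S4"]["GENE_C"] = {"MUT"}
calls["S5"]["GENE_B"] = {"AMP"}

freq = alteration_frequency(calls, ["GENE_A", "GENE_B", "GENE_C"])
print(frequently_altered(freq))  # ['GENE_A', 'GENE_C']
```

GENE_B is altered in exactly 10% of the toy samples and is therefore excluded by the strict "more than" cut-off.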
Best,
Efstathios
Dear Kevin,
thank you for your excellent and detailed answer; it gives me a lot to think about. For simplicity, I uploaded a PNG screenshot of the OncoPrint (for reasons related to my ongoing research, I did not include the gene names). Firstly, apart from the type of genetic alteration shown, where can I find the information you mentioned (missense, nonsense, etc.)? As you can see from the pattern, the GISTIC putative calls are mainly amplifications (together with mRNA overexpression). So in my view (and correct me if I'm wrong),
Moreover, 7 of these 9 genes are located on chromosome 20, which makes this even more interesting. I also agree that I should look further into the functional annotation of these genes!
https://www.dropbox.com/s/sc43n1ka822yr0a/oncoprint.modified.png?dl=0
A small update to my plot above: the total alteration percentage here also includes, for instance, mRNA upregulation, which might be a bit confusing. However, as far as I checked, the upregulation z-score is computed relative to the rest of the cancer samples in the cohort, so it may carry additional meaning.
Hi, your OncoPrint certainly looks intriguing, in the sense that mutations in the 9 genes appear to be strongly associated with the copy number amplifications.
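To make the z-score point concrete: a sample is flagged as upregulated when its expression lies far above the cohort mean, measured in standard deviations. A minimal sketch, assuming the reference population is all cancer samples in the cohort and a threshold of 2.0; both are assumptions here, and cBioPortal's exact reference set, any log transformation it applies, and its default threshold should be checked on its website. Sample IDs and values are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List


def expression_zscores(expr: Dict[str, float]) -> Dict[str, float]:
    """z-score of each sample's expression relative to all samples in `expr`."""
    values = list(expr.values())
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation of the reference set
    if sd == 0:
        return {s: 0.0 for s in expr}
    return {s: (v - mu) / sd for s, v in expr.items()}


def upregulated(z: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Samples whose z-score exceeds the (assumed) threshold."""
    return sorted(s for s, value in z.items() if value > threshold)


expr = {f"S{i}": 5.0 for i in range(1, 10)}  # nine samples at the same level
expr["S10"] = 15.0                           # one clear outlier
z = expression_zscores(expr)
print(upregulated(z))  # ['S10']
```

Because the reference population is the tumour cohort itself, a high z-score means "high compared with other tumours", not "high compared with normal mucosa".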
Some other thoughts:
By the way, if you downloaded your mutation data directly from cBioPortal, then I believe it has already been corrected for issues like the one I mentioned with the TTN gene, i.e., the mutation frequencies may already be 'normalised' for gene length. cBioPortal may also be providing only missense mutations. You should check the cBioPortal website to be sure of exactly what data you have. The last time that I obtained TCGA data, I downloaded the raw data from the GDC legacy archive.
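The gene-length issue can be illustrated with a crude normalisation: divide each gene's mutation count by its coding-sequence length in kilobases. The counts and lengths below are invented, and GENE_LONG / GENE_SHORT are hypothetical names. Dedicated tools such as MutSigCV model the background mutation rate far more carefully, so treat this only as an illustration of why raw counts favour very long genes like TTN.

```python
from typing import Dict


def mutations_per_kb(counts: Dict[str, int], cds_length_bp: Dict[str, int]) -> Dict[str, float]:
    """Mutation count per kilobase of coding sequence (a crude length correction)."""
    return {gene: counts[gene] / (cds_length_bp[gene] / 1000) for gene in counts}


# Invented numbers: the long gene has more mutations in total but far fewer per kb.
counts = {"GENE_LONG": 50, "GENE_SHORT": 5}
lengths = {"GENE_LONG": 100_000, "GENE_SHORT": 1_000}
rates = mutations_per_kb(counts, lengths)
print(rates)  # {'GENE_LONG': 0.5, 'GENE_SHORT': 5.0}
```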
Dear Kevin,
again, thank you for another comprehensive answer! Actually, I only used cBioPortal for convenience, to get a first screen of the putative mutation patterns of these genes; I have not downloaded any data from GDC or cBioPortal. So I hope that the gene-length issue you mentioned has already been taken into account, although I did not notice anything about it in the FAQ section.
Regarding your other very interesting idea about the association between amplifications and mutations: I think you meant that, beyond what the OncoPrint image shows (the "Mutation Spectrum" label appears to be associated with the red bands, i.e., amplifications), I should also perform some kind of statistical test. As for the data, perhaps I could find this information in the download area?
I tried a random example for a different cancer with a random gene set, and the download section contains:
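One simple way to test the mutation/amplification association is Fisher's exact test on a 2x2 table of samples (mutated or not, amplified or not), one table per gene. `scipy.stats.fisher_exact` does this in one call if SciPy is available; the sketch below uses only the Python standard library so that the logic is visible. The counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]],
    e.g. rows = mutated / not mutated, columns = amplified / not amplified."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Invented counts: 8 of 10 mutated samples amplified vs 1 of 10 non-mutated samples.
print(fisher_exact_two_sided(8, 2, 1, 9))  # about 0.0055 (= 920 / 167960)
```

With 9 genes tested, the p-values should also be corrected for multiple testing (e.g. Benjamini-Hochberg).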
http://www.cbioportal.org/index.do?session_id=59e553ab498e5df2e296c558&show_samples=false&
So perhaps the column "Type of Genetic alterations across all cases: (Alterations are summarized as MUT, Gain, HetLoss, etc.)" contains what you mentioned?
I will also get back to you about the paper you posted on chromosomal instability.
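If that column does hold per-sample alteration labels for each gene, the 2x2 counts needed for an association test could be built from it. The cell format assumed here (labels such as MUT, AMP or Gain separated by semicolons, optionally followed by a protein change after a colon) is a guess and should be checked against the actual download; the sample IDs and cell values are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_labels(cell: str) -> Set[str]:
    """'MUT: p.X123Y; AMP' -> {'MUT', 'AMP'} (hypothetical cell format)."""
    labels = set()
    for token in cell.split(";"):
        token = token.strip()
        if token:
            # Keep only the label before any ':' detail, upper-cased ('Gain' -> 'GAIN').
            labels.add(token.split(":")[0].strip().upper())
    return labels


def two_by_two(cells: Dict[str, str], amp_labels: Iterable[str] = ("AMP",)) -> Tuple[int, int, int, int]:
    """Counts (a, b, c, d) = (mutated & amplified, mutated only, amplified only, neither)."""
    amp = {label.upper() for label in amp_labels}
    a = b = c = d = 0
    for cell in cells.values():
        labels = parse_labels(cell)
        mutated = "MUT" in labels
        amplified = bool(labels & amp)
        if mutated and amplified:
            a += 1
        elif mutated:
            b += 1
        elif amplified:
            c += 1
        else:
            d += 1
    return a, b, c, d


gene_cells = {"S1": "MUT: p.X123Y; AMP", "S2": "MUT", "S3": "AMP", "S4": "", "S5": "Gain"}
print(two_by_two(gene_cells))                              # (1, 1, 1, 2)
print(two_by_two(gene_cells, amp_labels=("AMP", "GAIN")))  # (1, 1, 2, 1)
```

Whether low-level gains should count as "amplified" is a judgement call, which is why it is left as a parameter.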
Dear Kevin,
a small update about the 25CIN signature: 2 of the 9 genes I mentioned are included in this gene signature, and in the broader version of that signature I found 4 more genes that are also part of my broader signature. Perhaps the most "valid" and safe statement I can make (as a supplement to the work I have already performed) is that the pattern of putative amplification predicted for these specific genes of my signature, in specific subgroups of the cohort, needs further investigation. At the same time, it suggests a further putative clinical implication of my gene signature beyond its power to discriminate cancer from control samples, since the amplification status applies to cancer samples only. This could of course be an interesting starting point for investigating these genes for actual CNAs with various tools, such as CMA (chromosomal microarray analysis).
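Whether an overlap between two gene lists is larger than expected by chance can be checked with a one-sided hypergeometric test. A minimal sketch; the background size of 20,000 genes is an assumption (ideally it would be the set of genes measured on the arrays used), and the call below only illustrates the arguments, it is not a result.

```python
from math import comb


def overlap_pvalue(k: int, signature_size: int, reference_size: int, universe: int) -> float:
    """P(X >= k) under the hypergeometric null: `signature_size` genes drawn at random
    from `universe` genes, of which `reference_size` belong to the reference signature."""
    total = comb(universe, signature_size)
    upper = min(signature_size, reference_size)
    # Exact integer sum of the upper tail, divided once at the end.
    hits = sum(
        comb(reference_size, x) * comb(universe - reference_size, signature_size - x)
        for x in range(k, upper + 1)
    )
    return hits / total


# e.g. k shared genes between a 94-gene list and a 25-gene list, assumed 20,000-gene background
p = overlap_pvalue(2, signature_size=94, reference_size=25, universe=20_000)
```

In practice `k` would be the size of the set intersection of the two gene lists (for Python sets, `len(signature & reference)`).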
Hi!
Yes, that is certainly an interesting lead (i.e., a starting point for further investigation). Just to be sure that we are on the same page, though: my general idea is that mutations in these CIN genes will increase chromosomal and genomic instability, and therefore promote a high number of copy number alterations (both amplifications and deletions) as well as further single-point somatic mutations. This is what your data appear to show.
Sounds interesting!
Dear Kevin,
thank you for your answer! Yes, I take the same view on this, in the context of cancer "driver genes". I mentioned the rest because I started with the signature as a comparison between cancer and normal samples, whereas this plot includes only cancer samples. That does not reject or diminish the usefulness of the genes; rather, it supports the general idea you proposed. Moreover, for a possible (future?) therapeutic approach, it would be more interesting that this pattern of putative amplification applies to specific subsets of patients, which might suggest different treatments, etc. Perhaps I should also look into the available clinical variables, to identify any other interesting information.
Cheers,
Efstathios
Hi Efstathios, yes, certainly, copy number alterations are now being viewed as important in terms of therapies and treatment strategy. So, it is an interesting area to research.
Kevin
@Kevin Blighe
Sorry, I am confused about the GISTIC score. People say that a GISTIC score of -2 means a deletion.
However, I do not see any negative values (scores) in my GISTIC score file.
When plotting by chromosome arm, though, I do see some deletion peaks.
For instance, in the segmentation file the total copy number of the segment containing CDKN2A (9p21.3) is zero, so I assume this gene has been deleted. I can see CDKN2A in the del_genes.conf_95.txt file, but I do not see any negative score in the GISTIC score file that I could relate to CDKN2A.
Could you please help me get an intuition for how a GISTIC score is interpreted as a deletion or an amplification?
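For what it is worth, my understanding of the GISTIC 2.0 outputs is that two different kinds of "score" are being mixed up here. The -2 to 2 values are per-sample, per-gene thresholded calls (as in all_thresholded.by_genes.txt, and the same scale cBioPortal uses for discrete CNA), where -2 is a high-level (deep) deletion. The G-scores in scores.gistic, by contrast, are reported as positive magnitudes for both amplifications and deletions, with a Type column (Amp/Del) telling them apart, so no negative values would be expected there. The sketch below assumes those formats; the column names follow my recollection of the scores.gistic header, and the coordinates and scores in the example are made up, so check both against your own files and the GISTIC documentation.

```python
import csv
import io
from typing import Optional

# Thresholded per-sample, per-gene calls (assumed -2..2 scale, as described above).
THRESHOLD_LABELS = {
    -2: "deep (high-level) deletion",
    -1: "shallow (low-level) deletion",
    0: "no change",
    1: "low-level gain",
    2: "high-level amplification",
}


def label_call(value: int) -> str:
    """Translate one thresholded GISTIC call into words."""
    return THRESHOLD_LABELS[value]


def gscore_at(scores_text: str, region_type: str, chrom: str, pos: int) -> Optional[float]:
    """G-score of the first scores.gistic-style row of the given Type ('Amp' or 'Del')
    covering chrom:pos, or None. Deletions are identified by Type, not by sign."""
    for row in csv.DictReader(io.StringIO(scores_text), delimiter="\t"):
        if (row["Type"] == region_type and row["Chromosome"] == chrom
                and int(row["Start"]) <= pos <= int(row["End"])):
            return float(row["G-score"])
    return None


# Made-up scores.gistic-style rows: note that the Del G-score is positive.
EXAMPLE = "\n".join("\t".join(r) for r in [
    ["Type", "Chromosome", "Start", "End", "-log10(q-value)", "G-score", "average amplitude", "frequency"],
    ["Amp", "9", "21000000", "22500000", "0", "0.05", "0.1", "0.02"],
    ["Del", "9", "21000000", "22500000", "30.5", "1.8", "0.9", "0.35"],
])

print(gscore_at(EXAMPLE, "Del", "9", 21_970_000))  # 1.8
print(label_call(-2))                               # deep (high-level) deletion
```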
Thank you again