Question

How to compare correlations between tumor samples

0

7.2 years ago

carl.h ▴ 20

I would like to compare how strongly a set of genes correlates with a certain gene of interest (GOI), and compare this between different gene sets. I already know that these genes correlate with the GOI in protein assays, but I would like to use the TCGA data to confirm this in multiple tumors. I would like to compare the strength of correlation between tissues and find out in which tissues the correlation is highest. I would also like to be able to say whether each of those genes individually is generally correlated with the gene of interest.

Is there some way I could illustrate this graphically, or look at the correlation data across different tissues? Since there is a lot of data, the correlations between the genes quickly become hard to get an overview of. I am also wondering whether I should compare the tumor data with the normal tissue data that exists in TCGA. Does anyone who has done comparisons between tissues have suggestions, or does anyone have ideas on how I should set this up?

RNA-Seq • 2.5k views

ADD COMMENT • link updated 7.2 years ago by DG 7.3k • written 7.2 years ago by carl.h ▴ 20

0

Is your correlation data in a matrix?

Maybe you can try presenting your correlations in a symmetrical heatmap? I always find these correlation matrices very useful.

ADD REPLY • link 7.2 years ago by Benn 8.3k

0

Thank you for the answer. Yes, to be more precise: I am correlating normalized gene expression data, that is, the expression of one gene compared to 20 other genes in each cancer. In the TCGA database there are something like 30 kinds of cancers with widely varying sample sizes (but about 10,000 samples in total). So what I get out is how each gene correlates with the GOI in each cancer. A heatmap is not a bad idea. I have tried it, but with so many cancers there are both positive and negative correlations, some significant and some not. Many of the 20 genes seem to be significantly correlated, but I am not sure how to prove it and show it in a good way.

ADD REPLY • link 7.2 years ago by carl.h ▴ 20

0

What are you correlating? You should be more specific, as you don't just correlate one gene with another; you correlate properties of a gene. Are you correlating something like mutation profiles or expression levels? As b.nota suggested, heat maps with something like hierarchical clustering tend to be widely used and fairly easy to interpret. Depending on how you are calculating correlations, I also find scatterplots annotated with correlation coefficients useful, although depending on your data scatterplots may not be useful.
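As a rough illustration of preparing such a heat map, here is a minimal Python sketch that arranges per-cancer results into a gene-by-cancer grid and blanks out non-significant cells. The function name `significant_matrix`, the gene labels and all numbers are made up for illustration. It orders genes by how many cancers show a significant correlation, which is a simple stand-in for hierarchical clustering, not clustering itself.

```python
def significant_matrix(results, alpha=0.05):
    """Build a gene x cancer grid of correlations, keeping only significant cells.

    results: list of (cancer, gene, rho, adjusted_p) tuples (hypothetical layout).
    Cells with adjusted_p >= alpha, or with no result, are set to None.
    """
    genes = sorted({gene for _, gene, _, _ in results})
    cancers = sorted({cancer for cancer, _, _, _ in results})
    lookup = {(cancer, gene): (rho, q) for cancer, gene, rho, q in results}

    rows = {}
    for gene in genes:
        row = []
        for cancer in cancers:
            rho, q = lookup.get((cancer, gene), (None, 1.0))
            row.append(rho if rho is not None and q < alpha else None)
        rows[gene] = row

    # Put the genes that are significant in the most cancers first.
    ordered = sorted(
        genes,
        key=lambda g: sum(v is not None for v in rows[g]),
        reverse=True,
    )
    return ordered, cancers, [rows[g] for g in ordered]


# Toy values, invented purely to show the shape of the input.
toy_results = [
    ("BRCA", "GENE1", 0.62, 0.001),
    ("BRCA", "GENE2", 0.10, 0.40),
    ("LUAD", "GENE1", 0.55, 0.004),
    ("LUAD", "GENE2", -0.48, 0.01),
]

genes, cancers, matrix = significant_matrix(toy_results)
print("gene   " + "  ".join(f"{c:>6}" for c in cancers))
for gene, row in zip(genes, matrix):
    cells = "  ".join(f"{v:6.2f}" if v is not None else "     ." for v in row)
    print(f"{gene:<6} {cells}")
```

The resulting grid can be handed to whatever plotting tool you prefer; masking the non-significant cells keeps the positive and negative correlations readable across many cancers.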

ADD REPLY • link 7.2 years ago by DG 7.3k

0

I posted a reply above.

ADD REPLY • link 7.2 years ago by carl.h ▴ 20

score 0 · Answer 1 · 2017-02-24

0

7.2 years ago

DG 7.3k

I think you're going to want to do multiple types of plots. In the scheme of things you're not plotting that much information; 20 genes by 30 cancers isn't that big of a deal. You can essentially have a correlation coefficient and an adjusted p-value for each of those (Gene X with GOI in Cancer Y) using a simple Spearman's correlation, like what cBioPortal does. I worked up a script to do this given a set of genes you want to look at, what you want to compare them to, and which TCGA cancer sets you want to compare in. However, in the publications where we've used it we didn't show the results graphically; we just identified the comparisons that were correlated or anti-correlated, because we were interested in the biology of what was going on overall.

Heatmaps, boxplots, etc will all give you different views of the data. So would a giant table with significant values highlighted.
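To make the "correlation coefficient plus adjusted p-value per (gene, cancer)" idea concrete, here is a minimal standard-library Python sketch. It is not the script mentioned above. The expression values and the names `CANCER_A`, `GENE1`, etc. are toy placeholders. The p-value uses a large-sample Fisher z approximation (an assumption of this sketch; a statistics package would give a more exact value), and the adjustment is Benjamini-Hochberg, chosen here as one common option.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho with an approximate two-sided p-value (needs n > 3)."""
    n = len(x)
    rho = pearson(average_ranks(x), average_ranks(y))
    rho = max(-1.0, min(1.0, rho))  # guard against rounding just past +/-1
    if abs(rho) == 1.0:
        return rho, 0.0
    z = math.atanh(rho) * math.sqrt((n - 3) / 1.06)  # Fisher z approximation
    return rho, 2 * (1 - NormalDist().cdf(abs(z)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy expression values per cancer; real input would be normalized TCGA data.
expression = {
    "CANCER_A": {
        "GOI": [1.0, 2.1, 2.9, 4.2, 5.1, 6.3, 7.0, 8.4],
        "GENE1": [0.9, 2.5, 2.7, 4.0, 5.6, 6.1, 7.7, 8.0],
        "GENE2": [5.0, 1.2, 7.3, 2.2, 6.1, 3.3, 4.8, 2.9],
    },
    "CANCER_B": {
        "GOI": [3.2, 1.1, 4.8, 2.0, 6.5, 5.1, 7.9, 0.4],
        "GENE1": [2.0, 2.2, 3.9, 1.0, 7.1, 4.4, 6.0, 0.9],
        "GENE2": [7.5, 8.0, 3.1, 6.6, 2.4, 4.0, 1.0, 9.2],
    },
}

results = []
for cancer, genes in expression.items():
    for gene, values in genes.items():
        if gene != "GOI":
            rho, p = spearman(genes["GOI"], values)
            results.append((cancer, gene, rho, p))

# Adjust across every (gene, cancer) comparison at once.
adjusted = bh_adjust([p for _, _, _, p in results])
for (cancer, gene, rho, p), q in zip(results, adjusted):
    print(f"{cancer}\t{gene}\trho={rho:+.2f}\tp={p:.3g}\tadj_p={q:.3g}")
```

Adjusting over all comparisons together, rather than per cancer, is a design choice of this sketch; either can be defended depending on what family of tests you want to control.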

ADD COMMENT • link 7.2 years ago by DG 7.3k

0

Thank you for your answer! What do you make of this? I have correlated the GOI with the other genes, which I plotted on the x-axis. Blue marks p-value < 0.05. The dot labels are the individual cancers. From this image it seems that some genes have a higher correlation with my gene of interest in certain cancers. But when I try other genes, for example, there are always some correlations, even with adjusted p-values. I guess the RNA-seq data are all more or less correlated with each other, but I would like to sort out the background noise and get a picture of how things really are. Plot

ADD REPLY • link 7.1 years ago by carl.h ▴ 20

0

I'm really not sure what problem you are asking about. It isn't surprising that other genes, besides your set, will have correlations. For instance, anything in the same pathways will tend to have correlated gene expression values. If you want to know the background, you essentially need to do the calculations for all genes to get the background correlation, and then place your genes of interest on that distribution. In which case you might want to do something like calculate Z-scores.
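A minimal sketch of that background comparison, assuming you already have the GOI's correlation with every gene in one cancer as a plain list of numbers. The names `background_z` and `empirical_fraction` and the toy background values are hypothetical; a real background would contain thousands of genes.

```python
from statistics import mean, stdev


def background_z(rho, background):
    """Z-score of one correlation relative to the all-gene background."""
    return (rho - mean(background)) / stdev(background)


def empirical_fraction(rho, background):
    """Fraction of background genes whose |correlation| is at least |rho|."""
    return sum(abs(b) >= abs(rho) for b in background) / len(background)


# Toy background: correlations of the GOI with "all" genes in one cancer.
background = [-0.30, -0.12, -0.05, 0.02, 0.08, 0.10, 0.15, 0.21, 0.33, 0.40]

# Toy correlations for the genes of interest in the same cancer.
genes_of_interest = {"GENE1": 0.62, "GENE2": 0.10}

for gene, rho in genes_of_interest.items():
    z = background_z(rho, background)
    frac = empirical_fraction(rho, background)
    print(f"{gene}\trho={rho:+.2f}\tz={z:+.2f}\tbackground fraction={frac:.2f}")
```

A gene whose correlation sits far out in the tail of the background (large |z|, small background fraction) stands out from the general co-expression that most genes share; one near the middle does not, even if its own p-value is small.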

ADD REPLY • link 7.1 years ago by DG 7.3k

0

Thank you for your input, it helps me get to the bottom of this. I agree it is not strange that many genes are correlated with each other, in these cases thousands of them, but then the question becomes how much is just by chance if a certain fraction of all genes are correlated with each other. And if so many genes are correlated with each other across tumors, how can I claim that my finding is important? What I would like to be able to say is: these genes seem to be correlated with the gene of interest in these cancers. I am also interested in the biology of all this, so I am just looking to see whether the clinical data support the previous findings.

ADD REPLY • link 7.1 years ago by carl.h ▴ 20

0

Well, your correlation values, which give the strength of the correlation, and your adjusted p-values are the two values that are typically used to answer exactly that.

ADD REPLY • link 7.1 years ago by DG 7.3k