Question

trying to reproduce heat map made via cBioportal using download data

1

Entering edit mode

7.1 years ago

chrisclarkson100 ▴ 160

I need to make a statistical comparison using breast cancer data. I have made a heat map at the following link on the Bioportal: http://www.cbioportal.org/index.do?cancer_study_id=brca_tcga&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&data_priority=0&case_set_id=brca_tcga_mrna&gene_list=BRCA1%250ABRCA2%250ATP53%250ACCL2%250ACCR3%250ACD44%250AENG%250AIL6%250AIL33%250ACD33%250ACSF1%250AHIF1A%250ACLEC7A&geneset_list=%20&tab_index=tab_visualize&Action=Submit&genetic_profile_ids_PROFILE_MRNA_EXPRESSION=brca_tcga_mrna_median_Zscores&show_samples=false&heatmap_track_groups=brca_tcga_mrna_median_Zscores%2CBRCA1%2CBRCA2%2CTP53%2CCCL2%2CCCR3%2CCD44%2CENG%2CIL6%2CIL33%2CCD33%2CCSF1%2CHIF1A%2CCLEC7A

enter image description here

To do this I initially went to the http://www.cbioportal.org page and selected the cancer samples that I am interested in: Breast Invasive Carcinoma (TCGA, Provisional)

I then went to the textbook to enter the names of the genes that I am interested in: BRCA1 BRCA2 TP53 CCL2 CCR3 CD44 ENG IL6 IL33 CD33 CSF1 HIF1A CLEC7A

Then I submitted the query and was able to plot a clustered heat map (by clicking 'Heatmap>>Add genes to heatmap>>cluster heatmap') in the "oncoprint" tab.

While this type of annotation is useful, I would also like to be able to download the data that is generating this heat map and make a similar heat map on my local

computer (for experimental reasons).

To attempt this I went to the original search page and clicked the "View summary" button.

From this I found a "Download Data" button at the top of the page.

This returns a 'tar.gz' file with lots of interesting datasets. e.g.:

data_mRNA_median_Zscores.txt
data_expression_median.txt
data_RNA_Seq_v2_mRNA_median_Zscores.txt
data_RNA_Seq_v2_expression_median.txt

I want to find the expression data that was used to generate the histogram shown in the first provided link. From the downloaded files, I initially

tried data_RNA_Seq_v2_expression_median.txt

Below is my attempt to reproduce a heatmap similar to the one above:

data=read.table('data_RNA_Seq_v2_expression_median.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")

data_OI=data.frame()
for(i in genes_OI$V1){
 data_OI=rbind(data_OI,data[which(data[,1]==i),])
}
sumis.na(data_OI))
library(gplots)
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)

data=na.omit(data)
heatmap.2(as.matrix(data_OI[,-c(1,2)]),labCol = NA,
         labRow = data_OI[,1],cexRow = 1.4,keysize = 1.4)
dev.off()

The resulting heatmpat is as follows:

enter image description here

But this is not at all like the heatmap in the link at the top of the page.... is there some normalisation step that I am missing?

I also tried using the file where the the Zscores were computed: data_RNA_Seq_v2_mRNA_median_Zscores.txt

This file however (just from looking at the file content does not require any distance matrix):

data=read.table('data_RNA_Seq_v2_mRNA_median_Zscores.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI=c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")

data_OI=data.frame()
 for(i in genes_OI$V1){
     data_OI=rbind(data_OI,data[which(data[,1]==i),])
 }
png('test_TCGA_patients.png',height = 1000,width=1000)
data_OI[,-c(1,2)]=apply(as.matrix(data_OI[,-c(1,2)]), 2, as.numeric)

data=na.omit(data)

hclustfunc <- function(x, method = "complete", dmeth = "euclidean") {    
  hclust(dist(x, method = dmeth), method = method)
}
rc<-hclustfunc(data_OI[,-c(1,2)])
cd=t(data_OI[,-c(1,2)])
cc<-hclustfunc(cd)
heatmap(as.matrix(data_OI[,-c(1,2)]), Rowv=as.dendrogram(rc), 
        Colv=as.dendrogram(cc),labRow = data_OI[,1],labCol = NA)
dev.off()

This unfortunately does not produce anything similar to the heat map seen in the link above....

Hence I am wondering were it is that I am going wrong....? Am I using the correct file or is there

a normalisation step that I am missing?

RNA-Seq • 4.5k views

ADD COMMENT • link updated 7.1 years ago by Kevin Blighe 89k • written 7.1 years ago by chrisclarkson100 ▴ 160

0

Entering edit mode

The cBioPortal group only refers to the part at the bottom as the heatmap. The main figure is an 'oncoprint', with some form of a heatmap at the bottom.

Take a look here: OncoPrint

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

0

Entering edit mode

Hi sorry forgot to add step about "clicking 'Heatmap>>Add genes to heatmap>>cluster heatmap". The question has been modified necessarily.

ADD REPLY • link 7.1 years ago by chrisclarkson100 ▴ 160

0

Entering edit mode

I have posted an answer for you - see below

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

0

Entering edit mode

See: How to add images to a Biostars post to understand how to embed images in a post. I've made the necessary changes for now.

ADD REPLY • link 7.1 years ago by Ram 45k

0

Entering edit mode

thanks this has been amended

ADD REPLY • link 7.1 years ago by chrisclarkson100 ▴ 160

0

Entering edit mode

What is the basis of heatmap clustering? From what factor depends the order of patients after selecting "cluster heatmap" in the oncoprint? Thank you.

ADD REPLY • link 5.7 years ago by massimofaggiani • 0

0

Entering edit mode

The code contains the answer to your question. Check out ?hclust and ?heatmap.2

ADD REPLY • link 5.7 years ago by Ram 45k

0

Entering edit mode

Thanks. So is it essentially hierarchical clustering to find the maximum number of similarities and cluster them together? Because in my case, I have clustered only 4 genes and perhaps it's quite expected to not observe any clustering.

ADD REPLY • link 5.7 years ago by massimofaggiani • 0

0

Entering edit mode

If that's what the manual says, that's what it is. Do you have any questions that the manual does not cover?

ADD REPLY • link 5.7 years ago by Ram 45k

score 1 · Answer 1 · 2018-06-13

Hello again,

I see - thank you for clarifying.

Can you do me a favour and plot a histogram of the data from data_RNA_Seq_v2_expression_median.txt ? I assume that it's just the normalised counts, whereas the other file ( data_RNA_Seq_v2_mRNA_median_Zscores.txt ) represents the Z-scores of these normalised counts. Just use:

hist(data)

or

hist(data.matrix(data))

If you use the data from the first file (data_RNA_Seq_v2_expression_median.txt), you are virtually guaranteed a better heatmap if you do the following:

Read in data

data <- read.table('data_RNA_Seq_v2_expression_median.txt',header=T,fill = T,stringsAsFactors = F)
genes_OI <- c("BRCA1","BRCA2","TP53","CCL2","CCR3","CD44","ENG","IL6","IL33","CD33","CSF1","HIF1A","CLEC7A")

Set rownames

rownames(data) <- data[,1]
data <- data.matrix(data[,-1])

Select out genes of interest

data_OI <- data[which(rownames(data) %in% genes_OI),]

Pick a color scheme and set breaks

require("RColorBrewer")
myCol <- colorRampPalette(c("darkblue", "black", "darkred"))(100)

Set coluor 'breaks' at -2 and +2 Z-score

myBreaks <- seq(-2, 2, length.out=101)

Transform your data to the Z-scale

heat <- t(scale(t(data_OI)))

Plot the heatmap, specify your custom breaks and colour scheme, and switch further scaling (by heatmap.2) off

library(gplots)
pdf('test_TCGA_patients.pdf', width=7, height=5)
  heatmap.2(heat,
    col=myCol,
    breaks=myBreaks,
    main="Title",
    key=T, keysize=1.0,
    scale="none",
    density.info="none",
    reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
    trace="none",
    labRow
    cexRow=1.0,

    cexCol=0.2,
    distfun=function(x) dist(x, method="euclidean"),
    hclustfun=function(x) hclust(x, method="ward.D2"))
dev.off()

You can switch off the dendrograms by adding the following parameters:

heatmap.2(..., dendrogram="none", Rowv=FALSE, Colv=FALSE, ...)

If you want to do this with the data_RNA_Seq_v2_mRNA_median_Zscores.txt data, then just skip the heat <- t(scale(t(data_OI))) command.

This code is untested because I don't actually have the files in front of me. It should work, though. Some of your functions, like for selecting out genes and then converting to a data matrix, looked overly complex...

Kevin