I am trying to produce a heatmap to represent a contrast I performed on my deseq object. We have a relatively big experiment where samples are individual Drosophila in 10 different conditions. I am managing to produce the heatmap, but I am not managing to add the 2 levels of the conditions compared in my contrast on top of the heatmap as an annotation :
resIHW5_tidy<- results(dds, contrast=c("condition","F6","F0"), filterFun = ihw,alpha=0.05, lfcThreshold=1, altHypothesis="greaterAbs", tidy=TRUE)
df_IHW5<- as.data.frame(resIHW5_tidy)
IHW5_sign = subset(df_IHW5, padj<=0.05)
rownames(IHW5_sign)=IHW5_sign$row ## now the row is the FlyBase accession number
# I rearranged the table to get the genes reordered by log2 fold change :
res_log2FC <- arrange(IHW5_sign, desc(IHW5_sign$log2FoldChange))
# I created a vector of the gene accession numbers
genes <- res_log2FC$row
genes <- factor(genes, levels=genes)
1765 Levels: FBgn0000356 FBgn0052644 FBgn0031621 FBgn0000644 ... FBgn0036600
gene_names<-as.factor(res_log2FC$name) # a vector containing the gene names that I assigned to the accession numbers, but the problem is that there are several accession numbers with the same gene name so instead
# So instead I added an other column in IHW5_sign which consists in a concatenation of the accession number and gene name and made it a factor :
genes_names<-paste(genes,"-",gene_names)
genes_names
[1763] "FBgn0051091 - uncharacterized protein"
[1764] "FBgn0005633 - flightin"
[1765] "FBgn0036600 - uncharacterized protein"
genes_names<-factor(genes_names)
# I transformed my deseq object :
vsd<-vst(dds,blind=FALSE)
colData names(10): library sample ... host-treat condition
colData(vsd)$condition <- as.factor(colData(vsd)$condition)
# filtering out the samples that are in my 2 levels of interest (the ones compared in my contrast) only from this new dataset :
vsd_interest<-vsd[ , vsd$condition %in% c("F0", "F6") ]
# vsd_interest$condition, the output still shows me the 10 levels as factors, despite having only the 2 of them effectively represented in the new dataset. I am not sure how much of a problem this is though.
samples_interest<-colnames(vsd_interest)
samples_interest
[1] "3" "6" "8" "14" "46" "47" "59" "60" "74" "75" "80" "86"
[13] "95" "99" "102" "104" "116" "119" "137" "139" "143" "149" "150" "151"
[25] "172" "173" "188" "190" "195" "198" "209" "210" "211" "233" "235" "239"
[37] "242" "243" "460" "985"
# these are indeed my samples of interest in the 2 treatment levels of my contrast
#from my metadata file, called "sample_data.tsv" :
sample_data_interest<- sample_data[sample_data$condition =='F0' | sample_data$condition =='F6',]
samples<-sample_data_interest$library
samples
[1] "3" "6" "8" "14" "46" "47" "59" "60" "74" "75" "80" "86"
[13] "95" "99" "102" "104" "116" "119" "137" "139" "143" "149" "150" "151"
[25] "172" "173" "188" "190" "195" "198" "209" "210" "211" "233" "235" "239"
[37] "242" "243" "460" "985"
# they do match the colnames(vsd_interest)
### Now the heatmap :
pheatmap(assay(vsd_interest)[rownames(assay(vsd_interest)) %in% genes, colnames(assay(vsd_interest)) %in% samples],
cluster_rows=F,
cluster_cols=T,
cellwidth = 10,cellheight=8.6,
color=colorRampPalette(c("#f9cb9c","#a7146a"))(n = 100),
show_colnames=T,
show_rownames = T,
labels_row=genes_names,
heatmap_legend_param = list(title = "Counts",
direction = "horizontal",
title_position = "topcenter"))
Here is the top of the heatmap (because it is very very long) :
It would be perfect if I could add the levels of the condition for each sample at the very top, where the dendrogram is. I tried :
fcond<-as.factor(colData(vsd_interest)$condition)
# this still gives me the 10 levels, despite not being represented in vsd_interest since it is a subset
pheatmap(assay(vsd_interest)[rownames(assay(vsd_interest)) %in% genes, colnames(assay(vsd_interest)) %in% samples],
cluster_rows=F,
cluster_cols=T,
cellwidth = 10,cellheight=8.6,
color=colorRampPalette(c("#f9cb9c","#a7146a"))(n = 100),
show_colnames=T, # Show the sample names
show_rownames = T,
labels_row=genes_names,
annotation_col = fcond,
heatmap_legend_param = list(title = "Counts",
direction = "horizontal",
title_position = "topcenter"))
Error in `[.default`(annotation_col, colnames(mat), , drop = F) :
incorrect number of dimensions
# So I tried taking the condition levels from my subsetted metadata file instead :
fcond2<-as.factor(sample_data_interest$condition)
# and use this instead of fcond in the previous code, but with the same outcome.
I would really appreciate it if someone could point out the very naive mistake I am making.
I think you very very much for your help. It worked. I think I will have to take off the legend and add it separately though, because I am not finding how to change the position with 'pheatmap' :
F0 is indeed the reference condition in the contrast. Centering each row would mean that I would represent only F6 then ? I like showing the variation between samples of all treatments, but this might just be too messy.
Glad it helps you. You probably can remove the legend and color scale, or if you save the figure in PDF or SVG (I don't know whether SVG is possible), you can move it to a better place. But what is strange is that on the first figure the color scale was really away from the heatmap.
Centring each line on the mean of the F0 samples means that the F0 and F6 values will be represented. The centred F0 values will be close to zero and their fluctuation around zero will give an idea of their noise level. Thus, all F0 samples will show expression levels around zero for each gene. Centred F6 values (relative to the mean of F0) will be more directly readable than when centred on the mean of the F0 and F6 samples.
The code should look like:
Then you know how to draw the heatmap. Just be careful a) to revert to the default color scale which is a dual gradient color scale, and b) to make it centered on zero and strictly symmetric (ie from -3 to +3) by setting
breaks = cent_expr_breaks
in the pheatmap call. Last note, if expr_matrix has rownames (which I advise), there is no need to setlabels_row
.When the expression is centred on the reference group, the colour of the heat map is really the logFC. When centred on the whole line made up of two balanced groups, the color represents only half the logFC (taking zero as the reference).
When scaling (i.e. dividing by sd or mad), the color no longer strictly represents the logFC.
Would you be kind enough to post the F0 centered heatmap as an example of this point of view? Thanks.
He rownames of expr_matrix are the accession numbers only, I feared it would be too messy otherwise, so I kept using labels_row. Green is downregulated compared to the mean of the reference level, yellow close to 0, purple upregulated.
This time the legend is lost somewhere further down the heatmap, which is annoying. Also, I only just noticed that my reference group is on the left and my control on the right, I should find a way to change this.
I ordered the genes by log2FC value, but it maybe I should rather cluster the lines.
rownames(expr_matrix) <- paste(genes,"-",gene_names)
will avoid to pass labels_row and allows to keep custom names synchronized with the data (de-synchronization is something I fear).The words reference and control have a similar meaning, at least to me, so I didn't fully understand your last sentence until I read the heat map. The reference is the condition that is used to calculate the average, in this case F0. The samples from the reference condition are therefore clearly on the right, with colours around ‘yellow’. They were already on the right in the first heat map. The compared condition (sometimes called the test, here condition F6) is on the left. I don't know how to reverse the order on the heatmap, but I wouldn't be interested. As soon as I know how to read the heatmap and/or the conditions legend, I know how to catch the message.
Row clustering will help building gene groups. So that a good option to try.
Nevertheless, I am a bit surprised by the logFC ordering, as I feel that some genes have a low color contrast, e.g. Juvenilie hormone at 2/3 of the list or the FBgn0051926 at 1/4.
As a matter of taste, I am not a fan of the colors you used and I prefer the default ones.
Finally, if you want to have more tuning options, you should have a look at the complexHeatmap package.
I made I mistake I meant "the reference on the right, the group that was compared to it on the left". I think you are right about the ordering, I might have done something wrong. I will probably change the colors yes :) I think the yellow is not bright enough. I am probably going to use the complexHeatmap package in the end, but I likes the simplicity of this one. You mentioned being able to change the position of the legend if I saved as PDF, could you tell me more ? Thanks again !
When you save a plot in a vector format such as SVG or PDF, there will be objects that can be moved around as units. You can literally drag the legend to any part of the plot. If you don't have Adobe Illustrator or Corel Draw, Inkscape is free and can also manipulate vector files.
...I decided to try 'ComplexHeatmap' :
It is easier than I thought it would be, but I am not finding how to use genes_names as row annotations. It is also a bit frustrating that I cannot choose the size of the cells.
I wish there was a way to visualize samples that had 0 counts, because there were quite a lot of them for some genes. I suppose now they appear as the most intense shade of blue.
Please use
ADD REPLY/ADD COMMENT
when responding to existing posts.ADD ANSWER
is only for new answers to original question.