How to annotate only selected genes on a heatmap
4.3 years ago

Hello, all. Does anyone happen to know how to annotate only selected genes on a heatmap?

I use heatmap.2 from the gplots package and can annotate all genes in the data. That works as long as there are not too many genes, but when I plot, say, 100 genes, the labels become unreadable. So I want to label only selected genes, like Fig. 3A in this article.

My guess is that these annotations were added manually, but there might be an R package for it that I am not aware of yet.

Can you try this one (with links):

library(ComplexHeatmap)
data(mtcars)
x <- as.matrix(mtcars)
labels <- rownames(x)[c(1, 4, 5)]
Heatmap(x, show_row_names = FALSE, show_row_dend = FALSE, show_column_dend = FALSE) +
  rowAnnotation(link = row_anno_link(at = c(1, 4, 5), labels = labels),
                width = unit(1, "cm") + max_text_width(labels))
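Hard-coding `at = c(1, 4, 5)` works for mtcars, but with real data it is usually easier to start from a list of gene names. A minimal sketch, assuming the genes are the row names of the matrix (mtcars rows stand in for genes, and `genes_of_interest` is a hypothetical vector): `match()` turns the names into row indices of the input matrix, which can then be passed as `at` and `labels` to row_anno_link() in the code above.

```r
data(mtcars)
x <- as.matrix(mtcars)

# Hypothetical selection of "genes" (here: car names) to label
genes_of_interest <- c("Mazda RX4", "Hornet 4 Drive", "Hornet Sportabout")

# Row indices in the input matrix; stop early if a name is misspelled
at <- match(genes_of_interest, rownames(x))
if (anyNA(at)) {
  stop("Not found in rownames(x): ",
       paste(genes_of_interest[is.na(at)], collapse = ", "))
}
labels <- rownames(x)[at]
print(at)
```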

poisonAlien

ComplexHeatmap can do this. I believe the article used the same package.

library(ComplexHeatmap)
data(mtcars)
x  <- as.matrix(mtcars)
labels <- rownames(x)[c(1, 4, 5)]  # names of the rows to label
Heatmap(x[, 9:11], show_row_names = FALSE, show_row_dend = FALSE,
        show_column_dend = FALSE, show_heatmap_legend = FALSE) +
  rowAnnotation(link = row_anno_link(at = c(1, 4, 5), labels = labels),
                width = unit(1, "cm") + max_text_width(labels))
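If ComplexHeatmap is not available, leader lines similar to the ones row_anno_link() draws can be approximated with base R graphics. This is a minimal sketch of that idea, not a port of ComplexHeatmap: `label_rows_with_lines()` is a hypothetical helper, rows are clustered with hclust(dist(x)) (Euclidean, complete linkage, which may differ from your own clustering), no dendrogram is drawn, and labels are simply pushed upwards until they no longer overlap.

```r
# Hypothetical helper: heatmap with only the selected rows labelled,
# each label joined to its row by a leader line (base graphics only)
label_rows_with_lines <- function(x, genes, cex = 0.8,
                                  col = hcl.colors(50, "Blue-Red")) {
  hc <- hclust(dist(x))                  # cluster rows (Euclidean, complete linkage)
  xo <- x[hc$order, , drop = FALSE]      # rows in dendrogram order
  nr <- nrow(xo); nc <- ncol(xo)

  pos <- match(genes, rownames(xo))      # plotted row of each gene (1 = bottom)
  if (anyNA(pos)) stop("Unknown row names: ", paste(genes[is.na(pos)], collapse = ", "))
  o <- order(pos)
  pos <- pos[o]; lab <- genes[o]

  op <- par(mar = c(6, 1, 1, 12), xpd = NA)  # wide right margin for the labels
  on.exit(par(op))
  image(1:nc, 1:nr, t(xo), col = col, axes = FALSE, xlab = "", ylab = "")
  axis(1, at = 1:nc, labels = colnames(xo), las = 2, tick = FALSE)

  # Push labels upwards so consecutive labels are at least one text line apart
  gap <- strheight("Mg", cex = cex) * 1.2
  ly <- pos
  for (i in seq_along(ly)[-1]) ly[i] <- max(ly[i], ly[i - 1] + gap)
  overflow <- ly[length(ly)] - nr        # shift everything down if the top label runs off
  if (overflow > 0) ly <- ly - overflow

  usr <- par("usr")
  dx <- 0.1 * (usr[2] - usr[1])
  segments(usr[2], pos, usr[2] + dx, ly)                # leader lines
  text(usr[2] + 1.1 * dx, ly, lab, adj = 0, cex = cex)  # labels
  invisible(data.frame(label = lab, row = pos, label_y = ly, gap = gap,
                       stringsAsFactors = FALSE))
}

data(mtcars)
x <- as.matrix(mtcars)[, 9:11]  # same columns as in the ComplexHeatmap answer
res <- label_rows_with_lines(x, c("Mazda RX4", "Hornet 4 Drive", "Hornet Sportabout"))
print(res)
```

With many selected genes packed together, the labels can be pushed below the bottom of the heatmap; a taller device or a smaller `cex` helps in that case.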

BioBing

Hi h.fushimi.x689,

When you write "annotate genes", do you mean transcripts? (If so, there is a risk that several of your transcripts are encoded by the same gene.)

This is a quick and dirty method that I have used to get a "quick" overview of my transcripts:

1) Extract the IDs of the transcripts shown in your heatmap (in this example the ID is in the first column of df) and write them to a text file (in R):

IDs <- as.data.frame(df[, 1])
write.table(IDs, file = "IDs.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, sep = ",")

2) In the terminal (if you have not already done so), strip the description from the FASTA headers and keep only the IDs (Trinity IDs in this case):

sed -e 's/^\(>[^[:space:]]*\).*/\1/' my.fasta > mymodified.fasta

3) Extract the sequences for the IDs exported from R:

sudo pip install pyfaidx

xargs faidx mymodified.fasta < IDs.txt > output.fasta

4) Load the output fasta into the free version of Blast2Go: https://www.blast2go.com/

5) Copy the transcript IDs and the annotation of each one into a table (I used Excel) and save it as CSV (for example annotation.csv).

6) Load annotation.csv into R and merge it with the data frame containing your heatmap data (here `annotations` is the table read from annotation.csv and `tmp_df` holds the heatmap data):

annotations <- read.csv("annotation.csv")  # assumes columns named target_id and Description
annotation <- merge(annotations, tmp_df, by = "target_id")

# Use the Blast2GO annotation description as row names (these become the heatmap labels)
rownames(annotation) <- annotation$Description

anno <- annotation[, -c(1:2)]  # drop columns 1 & 2 (target_id and Description); they are not expression data

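library(gplots)  # provides heatmap.2() and bluered()

# Cluster rows and columns using 1 - Pearson correlation as the distance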
hp <- hclust(as.dist(1-cor(t(anno), method="pearson")), method="complete")

hs <- hclust(as.dist(1-cor(anno, method="pearson")), method="complete")

# Make a 10 x 6 inch image at 600 dpi:
ppi <- 600
png("myheatmap.png", width=10*ppi, height=6*ppi, res=ppi)
heatmap.2(as.matrix(anno),
Rowv=as.dendrogram(hp),
Colv=as.dendrogram(hs),
scale="row",
density.info="none",
trace="none",
cexRow=0.7, cexCol = 0.8,
col=bluered(75),
margins = c(6,27),
keysize=1,
key.par = list(cex=0.7),
dendrogram="column")
dev.off()
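One pitfall when the description is used as row names: as mentioned above, several transcripts can carry the same annotation, and a data frame does not accept duplicated row names. A minimal sketch with made-up toy tables (all IDs and values are hypothetical) showing `make.unique()` as one way around it:

```r
# Hypothetical stand-ins for the Blast2GO table and the expression table
annotations <- data.frame(
  target_id   = c("TRINITY_DN1_c0_g1_i1", "TRINITY_DN1_c0_g1_i2", "TRINITY_DN2_c0_g1_i1"),
  Description = c("heat shock protein 70", "heat shock protein 70", "actin"),
  stringsAsFactors = FALSE
)
tmp_df <- data.frame(
  target_id = c("TRINITY_DN2_c0_g1_i1", "TRINITY_DN1_c0_g1_i1", "TRINITY_DN1_c0_g1_i2"),
  sample_A  = c(5.1, 2.3, 2.0),
  sample_B  = c(4.8, 7.9, 8.1),
  stringsAsFactors = FALSE
)

annotation <- merge(annotations, tmp_df, by = "target_id")

# Two transcripts share a description, and duplicated row names are not
# allowed on a data frame, so append ".1", ".2", ... to repeated names
rownames(annotation) <- make.unique(annotation$Description)

anno <- as.matrix(annotation[, -c(1:2)])  # keep only the expression columns
print(anno)
```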


Cheers, B

Hi BioBing, this looks like quite a complex workflow for what the OP wants to achieve. I don't doubt that it works, even though I can't reproduce it, but could you explain what, at its core, makes only a selection of gene IDs appear (e.g. setting the others to NULL or an empty string), and how this can be done inside R without external programs?

Hi Michael,

Yes, I agree, my approach is kind of messy :-) But it was how I got it to work, and I just wanted to share it in case it could be helpful.

Ahh! I just realized I misunderstood the question, my fault. I thought the question was about getting annotations into a heatmap of selected genes without a full annotation!

I am not sure how to add only a few gene names to a heatmap in R. Maybe add the gene names manually in Photoshop or similar?

Thanks BioBing. I think I should not have used the word "annotate". As I understand it, this workflow annotates a FASTA file. What I want is to show only selected gene names on a heatmap, not all of them (the genes are all annotated already).

The easy way: use row labels (labRow) and set all but the ones you want to show to an empty string:

library(gplots)
data(mtcars)
x  <- as.matrix(mtcars)
labRow <- c(row.names(x)[1], rep('', length(row.names(x))-1)) # take just the first name, here you can choose the ones you like
heatmap.2(x, labRow = labRow)


Result: a heatmap with only Mazda RX4 :)
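To pick the labelled rows by name instead of by position, a small variation of the same trick works. A minimal sketch (`keep` is a hypothetical selection): labRow is built in the row order of the input matrix, and heatmap.2 reorders it together with the rows, so each name stays on its own row.

```r
data(mtcars)
x <- as.matrix(mtcars)

# Hypothetical selection of rows to label
keep <- c("Mazda RX4", "Fiat 128", "Volvo 142E")

# Keep the name for selected rows, use "" for every other row
labRow <- ifelse(rownames(x) %in% keep, rownames(x), "")
print(labRow[labRow != ""])
```

The result is used exactly like before: heatmap.2(x, labRow = labRow).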

Thanks Michael. This is almost what I want, and it works well in many cases. If possible, though, I would like to draw leader lines programmatically, because 1) with many genes, i.e. very narrow rows, it is hard to tell precisely which row a label refers to, or to draw a line to it by hand, and 2) in some cases the selected genes sit close together and their labels overlap.

Sorry, drawing lines is a bit more difficult. How about something like this:

 heatmap.2(x, labRow = labRow, rowsep = c(9:10), sepwidth = c(0.05,0.05), sepcolor = 'blue')


You just have to find the position of your gene of interest in the dendrogram order: rowsep refers to row positions in the reordered heatmap, not in the input matrix.
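One way to get that position is to build the row dendrogram yourself and pass it to heatmap.2 as `Rowv`, so the order you look up is the order that gets plotted. A minimal sketch under stated assumptions: the gene name is hypothetical, and the conversion to a rowsep value assumes (from reading heatmap.2's source, not its documentation) that the first leaf is drawn at the bottom while rowsep counts rows from the top, so check it against your own plot.

```r
data(mtcars)
x <- as.matrix(mtcars)

# Build the row dendrogram once; pass it as Rowv = row_dend to heatmap.2
row_dend  <- as.dendrogram(hclust(dist(x)))
row_order <- rownames(x)[order.dendrogram(row_dend)]

gene <- "Mazda RX4"                     # hypothetical gene of interest
pos_in_order <- match(gene, row_order)  # 1 = first leaf of the dendrogram

# Assumption: first leaf at the bottom, rowsep counted from the top
pos_from_top <- nrow(x) + 1 - pos_in_order
rowsep <- c(pos_from_top - 1, pos_from_top)  # separators just above and below that row
print(c(pos_in_order = pos_in_order, pos_from_top = pos_from_top))
```

The plot call would then be along the lines of heatmap.2(x, Rowv = row_dend, labRow = labRow, rowsep = rowsep, sepwidth = c(0.05, 0.05), sepcolor = 'blue').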