Heatmap from edgeR results
6 weeks ago

Hi, I am doing a differential expression (DE) analysis of miRNAs using edgeR, and I need to make a heatmap of the top 50 DE miRNAs, or of the most variable ones.

The edgeR User's Guide suggests logcounts <- cpm(y, log=TRUE), where y is the DGEList object.

The problem is with labelling. I want the miRNA names to show on the heatmap, but the y object seems to carry no names, only the counts matrix, and I'm not sure how to annotate it with the miRNA names. Any help?

edgeR heatmaps DGEList CPM • 189 views

I cannot follow. With y I guess you mean the DGEList object? Please try to explain better what the problem is, best would be to show code.


The output from your command should have row names containing the gene names, assuming you provided that information when you made the DGEList. What are you using to make the heatmap? Pretty much any heatmap package will have a parameter to show the row names.

library(edgeR)  # DGEList, cpm
library(dplyr)  # %>% and select
Count_data <- read.csv("Count_data.csv", check.names=FALSE)

Counts <- Count_data
rownames(Counts) <- Count_data$names
counts_IDs <- Count_data
# create the table with only counts here
Counts_only <- Count_data %>% select(-names)
group <- c(rep("SH",4), rep("U2",4), rep("U7",4), rep("RU",4))
y <- DGEList(counts=Counts_only, genes=counts_IDs$names, group=group)

design_edgeR <- model.matrix(~0+group, data=y$samples)
colnames(design_edgeR) <- levels(y$samples$group)

#HEATMAPS:

logcounts <- cpm(y,log=TRUE)

var_genes <- apply(logcounts, 1, var)

# Get the gene names for the top 30 most variable genes
select_var <- names(sort(var_genes, decreasing=TRUE))[1:30]

highly_variable_lcpm <- logcounts[select_var, ]
dim(highly_variable_lcpm)

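## Get some nicer colours
library(gplots)        # heatmap.2
library(RColorBrewer)  # brewer.pal
mypalette <- brewer.pal(11,"RdYlBu")
morecols <- colorRampPalette(mypalette)
# Set up colour vector for the group variable (one colour per group level)
col.cell <- c("purple","orange","darkgreen","steelblue")[factor(group)]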

# Plot the heatmap
heatmap.2(highly_variable_lcpm,col=rev(morecols(50)),trace="none", main="Top variable genes across conditions-Macrophages",ColSideColors=col.cell,scale="row",margins=c(9,9))


The resulting heatmap shows arbitrary numbers as row labels instead of my gene names.

6 weeks ago
dganiewich ▴ 120

Hi,

Have you tried adding rownames to your matrix as row.names(y)<-gene_names_array?

Best,

Daiana


Oh that actually worked, thank you very much Daiana!!

6 weeks ago
Gordon Smyth ★ 2.5k

Just add whatever gene names you want the heatmap to show as row.names of logcounts. See the Section Heatmap clustering of the edgeR QL workflow for a complete worked example. The workflow uses coolmap but the same advice would apply for any heatmap function. By default, the row.names of logcounts will be the gene IDs.
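A minimal sketch of that advice in base R, assuming the ID vector is in the same row order as the count table. The matrix below is a simulated stand-in for the output of cpm(y, log=TRUE), and the miRNA IDs and sample names are hypothetical, made up for illustration.

```r
# Simulated stand-in for cpm(y, log=TRUE): 6 genes x 4 samples,
# carrying the default row names "1".."6" (hypothetical data).
set.seed(1)
logcounts <- matrix(rnorm(24), nrow = 6,
                    dimnames = list(as.character(1:6), paste0("S", 1:4)))

# Hypothetical miRNA IDs, assumed to be in the same order as the rows.
mirna_ids <- c("miR-21", "miR-155", "let-7a", "miR-16", "miR-122", "miR-34a")

# Guard against a length mismatch before relabelling the rows.
stopifnot(length(mirna_ids) == nrow(logcounts))
rownames(logcounts) <- mirna_ids

# Select the 3 most variable rows by name, as in the question's code.
var_genes <- apply(logcounts, 1, var)
select_var <- names(sort(var_genes, decreasing = TRUE))[1:3]
highly_variable_lcpm <- logcounts[select_var, , drop = FALSE]

# The subset keeps the miRNA IDs, so a heatmap function can label rows with them.
print(rownames(highly_variable_lcpm))
```

Because the selection is done by name, the row labels travel with the data through the subsetting step, which is what the heatmap function then picks up.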


That's the problem: by default the row names of logcounts are not gene IDs, they are just the numbers 1, 2, 3, etc. So after a few processing steps I no longer know which genes are there; I just have a matrix of values.


The row.names are gene IDs by default. But if you don't supply any row names when the DGEList is created then the IDs will be set to 1, 2, 3 etc.
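The difference can be seen in a small sketch, assuming a count table with an ID column called names as in the question. The counts, sample names and groups below are made up for illustration, and the edgeR package must be installed.

```r
library(edgeR)

# Hypothetical count table with an ID column called "names".
Count_data <- data.frame(
  names = c("miR-21", "miR-155", "let-7a"),
  S1 = c(10, 200, 35), S2 = c(12, 180, 40),
  S3 = c(50, 20, 38),  S4 = c(55, 25, 33),
  stringsAsFactors = FALSE
)
group <- c("SH", "SH", "U2", "U2")

# Keep only the count columns and move the IDs into the row names.
counts <- as.matrix(Count_data[, -1])
rownames(counts) <- Count_data$names

# Row names supplied: they carry through DGEList and cpm.
y_named <- DGEList(counts = counts, group = group)
logcounts_named <- cpm(y_named, log = TRUE)
print(rownames(logcounts_named))

# Row names not supplied: the IDs default to "1", "2", "3".
counts_unnamed <- counts
rownames(counts_unnamed) <- NULL
y_unnamed <- DGEList(counts = counts_unnamed, group = group)
logcounts_unnamed <- cpm(y_unnamed, log = TRUE)
print(rownames(logcounts_unnamed))
```

In the question's code the row names were set on Counts, but the DGEList was built from Counts_only, which was derived from Count_data and so still had the default row names.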