Question

Heatmap for differential gene expressions

6.5 years ago

Sharon ▴ 600

I am trying to do some clustering and need a heatmap of the gene expression values from my edgeR analysis. I keep getting the following error, although I made sure data1 is a matrix. Any hints? Thanks

Error in heatmap.2(data1, col = col.pan, Rowv = TRUE, scale = "none", : `x' must be a numeric matrix

et <- exactTest(dge, pair=c("ctrl", "tr"))
etp <- topTags(et, n=2000000)
data1 <- as.matrix(etp$table$logFC)
heatmap.2(data1, col=col.pan, Rowv=TRUE, scale="none",trace="none", dendrogram="both", cexRow=1, cexCol=1.4, density.info="none",margin=c(10,9), lhei=c(2,10), lwid=c(2,6))

edgeR Clustering heatmap • 4.7k views

ADD COMMENT • link updated 6.5 years ago by jaro.slamecka ▴ 240 • written 6.5 years ago by Sharon ▴ 600

Hi Sharon, how are you?

In your code above, data1 will just consist of a single vector of log base 2 fold changes, and you will not be able to generate a heatmap with that.
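To illustrate the point with made-up numbers (not the poster's data): wrapping a single numeric vector in as.matrix() only yields a one-column matrix, so there are no samples to cluster across. The values and gene names below are hypothetical.

```r
# Hypothetical log fold changes for three genes
logFC <- c(geneA = 1.8, geneB = -2.4, geneC = 0.3)

# as.matrix() on a vector gives a matrix with a single column
data1 <- as.matrix(logFC)

dim(data1)   # 3 rows, 1 column
```

A heatmap needs one row per gene and one column per sample, which is why the fold-change column alone is not enough.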

What exactly are you aiming to do?

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

Hi Kevin, I think you are right. I am trying to plot my gene expression values as a heatmap. But even if I pass the whole of etp$table as the matrix, it gives the same error.

ADD REPLY • link 6.5 years ago by Sharon ▴ 600

I believe that the 'table' object of etp is just P values, fold changes, and other stats.

I believe that you have to do something like this:

significantGenes <- rownames(subset(etp$table, abs(logFC) > 2 & FDR < 0.05))

heatmap.2(data.matrix(ExprMatrix[significantGenes,]), ...)

The first line extracts gene names (assuming gene names are rownames of etp$table) that have absolute log2 FC > 2 and adjusted P value <0.05. The second line subsets your expression matrix (assuming it's called *ExprMatrix*) and concurrently performs the heatmap.2 function.
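As a minimal sketch of these two steps on toy data: the objects res and ExprMatrix below are hypothetical stand-ins for etp$table and the real expression matrix, and the heatmap call is only attempted if the gplots package happens to be installed.

```r
set.seed(1)
genes <- paste0("gene", 1:6)

# Hypothetical results table with the two columns used for filtering
res <- data.frame(
  logFC = c(3.1, -2.5, 0.4, 2.2, -0.1, -4.0),
  FDR   = c(0.01, 0.03, 0.20, 0.50, 0.90, 0.001),
  row.names = genes
)

# Hypothetical expression matrix: genes in rows, samples in columns
ExprMatrix <- matrix(rnorm(6 * 4), nrow = 6,
                     dimnames = list(genes, paste0("s", 1:4)))

# Step 1: gene names passing both cutoffs
significantGenes <- rownames(subset(res, abs(logFC) > 2 & FDR < 0.05))

# Step 2: subset the expression matrix by those names
sub <- data.matrix(ExprMatrix[significantGenes, , drop = FALSE])

# Plot only if gplots is available
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(sub, scale = "none", trace = "none")
}
```

The drop = FALSE keeps the result a matrix even if only one gene passes the filter, although heatmap.2 itself still needs at least two rows and two columns.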

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

Sorry for the noise, but isn't ExprMatrix just my etp?

ADD REPLY • link 6.5 years ago by Sharon ▴ 600

Hey Sharon,

Your original expression matrix will be whatever you passed as the 'counts' argument to DGEList(). For example, DGEList(counts=counts, group=1:2)

As far as I am aware, etp just contains information on the differential expression analysis.

names( etp$table )

[1] "logConc" "logFC" "p.value"

The object produced by exactTest contains three elements: table, comparison and genes. The element de.com$comparison contains a vector giving the names of the two groups compared. The table de.com$table contains the elements logConc, which gives the overall concentration for a tag across the two groups being compared, logFC, which gives the log-fold change difference for the counts between the groups and p.value gives the exact p-values computed.

[source: page 23/24 of https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf]

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

It finally works, thanks Kevin!

ADD REPLY • link 6.5 years ago by Sharon ▴ 600

Great - good luck with the rest of the project

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

score 0 · Answer 1 · 2017-11-08

6.5 years ago

jaro.slamecka ▴ 240

Hi Sharon, you're calling the heatmap.2 function on a vector of fold changes (etp$table$logFC). You should call it on a matrix of normalized expression values instead, e.g. dge$E if you have a limma EList from voom, or cpm(dge, log=TRUE) for an edgeR DGEList. Before that, subset the dge object based on the most interesting top tags from etp.

ADD COMMENT • link 6.5 years ago by jaro.slamecka ▴ 240

Hi jaro,

So you think I should call it with etp from the code above? It still gives the same error. Or do you mean I should create a separate matrix of all the gene names and the logFC only? I will play around and see, hopefully.

ADD REPLY • link 6.5 years ago by Sharon ▴ 600

Your etp object only contains genes ranked by the evidence for differential expression, so you can't plot it directly: it no longer has the actual expression values. You still need it to decide which genes go onto the heatmap, with something like the line below. In principle it does the same as Kevin Blighe's first line above: it subsets the etp table to keep only genes with absolute logFC above 2 and p-value below 0.02 (you might have to adjust these cutoffs to keep, say, a few hundred genes), and then takes the rownames, which should be some kind of gene IDs (like ensembl_gene_id, depending on how you did your annotation). You'll need these in the next step. In recent edgeR versions the p-value column of the topTags table is called PValue (older versions used p.value).

diff.genes = rownames(etp$table[abs(etp$table$logFC)>2 & etp$table$PValue<0.02, ])

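Then, based on diff.genes, you'll have to subset your dge object, because that's the one that holds the expression data:

dge.subset = dge[diff.genes, ]

The expression matrix that you'll pass to heatmap.2 is then the normalized log-CPM matrix from cpm(dge.subset, log=TRUE). (dge.subset$E only exists if your object is a limma EList, e.g. from voom; an edgeR DGEList stores the raw counts in $counts.)

heatmap.2(cpm(dge.subset, log=TRUE)...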
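Putting the steps together, here is a rough end-to-end sketch on simulated counts. Everything in it is an assumption for illustration: the simulated data, the FDR cutoff, the choice of estimateDisp() for dispersion, and the use of log-CPM values from cpm() as the "normalized expression values". It subsets the log-CPM matrix by gene ID rather than subsetting dge first, and it only runs if edgeR (and, for the plot, gplots) is installed.

```r
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(42)

  # Simulated counts: 200 genes x 6 samples, first 20 genes up in "tr"
  counts <- matrix(rnbinom(200 * 6, mu = 100, size = 10), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
  counts[1:20, 4:6] <- counts[1:20, 4:6] * 8
  group <- factor(rep(c("ctrl", "tr"), each = 3))

  # Standard classic edgeR steps up to the ranked table
  dge <- edgeR::DGEList(counts = counts, group = group)
  dge <- edgeR::calcNormFactors(dge)
  dge <- edgeR::estimateDisp(dge)
  et  <- edgeR::exactTest(dge, pair = c("ctrl", "tr"))
  etp <- edgeR::topTags(et, n = Inf)

  # Gene IDs passing the (illustrative) cutoffs
  keep <- abs(etp$table$logFC) > 2 & etp$table$FDR < 0.05
  diff.genes <- rownames(etp$table[keep, ])

  # Numeric expression matrix for the selected genes
  logcpm <- edgeR::cpm(dge, log = TRUE)
  mat <- logcpm[diff.genes, , drop = FALSE]

  # heatmap.2 needs at least 2 rows and 2 columns
  if (nrow(mat) >= 2 && requireNamespace("gplots", quietly = TRUE)) {
    gplots::heatmap.2(mat, scale = "row", trace = "none",
                      density.info = "none")
  }
}
```

The key difference from the original attempt is that mat holds per-sample expression values for the selected genes, while the topTags table is used only to pick which genes to show.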

If you get errors, you can do this and paste the output here:

rownames(etp$table)[1:10]
colnames(etp$table)
rownames(dge)[1:10]
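A quick way to read that output is to check whether the IDs in the two objects actually overlap. The sketch below uses made-up ID vectors; with real data you would put rownames(etp$table) and rownames(dge) in their place.

```r
# Hypothetical gene IDs standing in for rownames(etp$table) and rownames(dge)
table.ids <- c("ENSG01", "ENSG02", "ENSG03")
dge.ids   <- c("ENSG02", "ENSG03", "ENSG04")

# IDs present in both objects, and IDs in the table that dge lacks
shared  <- intersect(table.ids, dge.ids)
missing <- setdiff(table.ids, dge.ids)

length(shared)
missing
```

If missing is not empty, or shared is empty, subsetting by rowname will fail or return nothing, which usually points to a mismatch in how the two objects were annotated.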

(BTW, one little detail: you can't create a numeric "matrix" of gene names and logFC, because all values in a matrix have to be of the same type. You could create a data.frame though :))
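A small sketch of that last point, with hypothetical gene names and values: binding names and numbers into one matrix coerces everything to character, whereas a data.frame keeps each column's type.

```r
gene  <- c("geneA", "geneB")
logFC <- c(1.5, -2.3)

# cbind() of character and numeric vectors yields a character matrix
m <- cbind(gene = gene, logFC = logFC)
is.numeric(m)          # FALSE: the numbers were converted to strings

# A data.frame keeps logFC numeric alongside the character column
df <- data.frame(gene = gene, logFC = logFC)
is.numeric(df$logFC)   # TRUE
```

This is also why a table containing any non-numeric column cannot be passed to heatmap.2 as is: the function insists on a numeric matrix.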