Question

Microarray Heatmap with DEGs

4.6 years ago

Alex Gibbs ▴ 80

Hi all!

I have generated 18 DEG files, each containing 23,835 genes (7x Affected_v_Control, 7x Affected_v_Control2, 4x Control_v_Control2). I haven't filtered these files by Adj.P or LogFC yet. I opened them in Excel, ordered them by probe ID, then took the LogFC values and made a new file containing the LogFC values from every DEG file.

I then opened this file in Broad Institute’s Morpheus heat map software and performed hierarchical clustering on the columns. This has shown that a particular sample is clustered on its own (which I would like to explore experimentally if true). However, this clustering is based on 23,835 genes. I’m assuming that the majority of these DEGs are not significant. Is there a way that I can filter my file for significance to make a smaller heat map of only significant genes and then re-do the clustering?

I can re-make the file to contain all the DEG LogFC and Adj.P.Val values and then open it in R, but I am not sure how to filter it.

Thanks in advance!

Alex

R Microarray Illumina • 2.0k views

updated 4.6 years ago by Kevin Blighe 87k • written 4.6 years ago by Alex Gibbs ▴ 80

score 2 · Answer 1 · 2019-09-26

Hey Alex,

This is quite easy to do within R, without going through the laborious route of exporting your data to Excel and then importing it into some silly online heatmap tool.

You should research how to perform data-frame subsetting in R. Your data is probably an ExpressionSet object rather than a plain data-frame, but subsetting it is still easy.

Here is a completely reproducible example:

1, Download some already-normalised sample data

library(Biobase)
library(GEOquery)
gset <- getGEO("GSE1460", GSEMatrix = TRUE, getGPL = FALSE)[[1]]


class(gset)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"

gset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 15 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM24511 GSM24609 ... GSM24622 (15 total)
  varLabels: title geo_accession ... data_row_count (32 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 15210650 
Annotation: GPL96

2, Randomly select 50 probes and pretend that they are the statistically significant ones

sigprobes <- sample(rownames(gset), 50)

With limma output from topTable(), you can filter your object easily like this (where results_table contains the output of topTable()):

results_table_filt <- subset(results_table, abs(logFC) > 2 & adj.P.Val <= 0.05)

...then just take the rownames of results_table_filt (or the first column, whichever holds the probe names) and use these to subset your expression matrix, as I do in part 4 (below). This subset() command filters on absolute log (base 2) fold-change > 2 and adjusted p-value <= 0.05.
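
To make the filtering step concrete, here is a minimal sketch on a made-up table that mimics the logFC and adj.P.Val columns of topTable() output. The probe IDs and values are invented for illustration only; with real data, results_table would come from limma, and the resulting sigprobes would replace the random sample above.

```r
# Toy stand-in for topTable() output; probe IDs and numbers are invented.
results_table <- data.frame(
  logFC     = c(2.8, -3.1, 0.4, 2.5, -0.2),
  adj.P.Val = c(0.001, 0.03, 0.0005, 0.20, 0.90),
  row.names = c("probe_A", "probe_B", "probe_C", "probe_D", "probe_E")
)

# Keep probes with |log2 fold-change| > 2 AND adjusted p-value <= 0.05
results_table_filt <- subset(results_table, abs(logFC) > 2 & adj.P.Val <= 0.05)

# The row names of the filtered table are the probe IDs to carry forward
sigprobes <- rownames(results_table_filt)
print(sigprobes)
```

Note that probe_C has a tiny adjusted p-value but a small fold-change, and probe_D has a large fold-change but a poor adjusted p-value, so both are dropped: the two conditions must hold together.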

3, Transform our expression data by scaling row-wise

heat <- t(scale(t(exprs(gset))))
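
In case the double transpose looks odd: scale() standardises columns, so transposing before and after standardises each probe (row) instead. A small sketch on an invented matrix (not the GEO data) shows the effect, including the edge case of a probe with zero variance, which comes out as NaN and may cause problems for clustering. Dropping such rows is one option, shown here as an assumption rather than a required step.

```r
# Invented 3-probe x 4-sample matrix, for illustration only
mat <- matrix(
  c(5, 6, 7, 8,
    100, 110, 90, 100,
    2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_A", "probe_B", "probe_C"), paste0("S", 1:4))
)

# scale() works column-wise, so transpose, scale, then transpose back
z <- t(scale(t(mat)))

# probe_C is constant across samples, so its scaled values are NaN;
# keep only rows without missing values
z <- z[rowSums(is.na(z)) == 0, , drop = FALSE]

# Each remaining row now has mean 0 and standard deviation 1
print(round(rowMeans(z), 10))
print(apply(z, 1, sd))
```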

4, Subset our data with the statistically significant probes

heat <- heat[sigprobes,]

This works here because the rownames of heat are probe names, so indexing by a character vector of probe IDs picks out the matching rows.
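
The same idea carries over to the situation in the question, where there are several DEG tables rather than one. A rough sketch, assuming the tables are held in a named list, each has logFC and adj.P.Val columns, and all share the same probe IDs as row names (the list name deg_list and all values are hypothetical): collect the probes that pass the thresholds in at least one contrast, then subset a matrix of logFC values with them.

```r
# Hypothetical list of per-contrast limma tables (values invented)
probes <- paste0("probe_", 1:6)
deg_list <- list(
  Affected_v_Control = data.frame(
    logFC     = c(3.0, 0.1, -2.5, 0.3, 0.2, 1.0),
    adj.P.Val = c(0.01, 0.80, 0.04, 0.60, 0.90, 0.30),
    row.names = probes),
  Affected_v_Control2 = data.frame(
    logFC     = c(2.2, 0.2, -0.4, 0.1, -2.9, 0.8),
    adj.P.Val = c(0.02, 0.70, 0.50, 0.90, 0.001, 0.40),
    row.names = probes)
)

# Probes passing the thresholds within each contrast
sig_per_contrast <- lapply(deg_list, function(tab) {
  rownames(subset(tab, abs(logFC) > 2 & adj.P.Val <= 0.05))
})

# Union: probes significant in at least one contrast
sigprobes <- Reduce(union, sig_per_contrast)

# Matrix of logFC values (probes x contrasts), subset to those probes
logfc_mat <- sapply(deg_list, function(tab) tab[probes, "logFC"])
rownames(logfc_mat) <- probes
logfc_sig <- logfc_mat[sigprobes, , drop = FALSE]
print(logfc_sig)
```

Using the union keeps any probe that is significant somewhere; swapping union for intersect would instead keep only probes significant in every contrast, which is a stricter choice.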

5, Generate a simple heatmap

heatmap(heat)
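
By default, heatmap() clusters rows and columns with dist() and hclust(), i.e. Euclidean distance and complete linkage. To check whether one sample really separates from the rest, as described in the question, the column dendrogram can also be computed and cut directly. A sketch on simulated data, in which sample S6 is deliberately shifted so that it should stand apart (the matrix toy and its values are made up):

```r
set.seed(1)

# Simulated 20-probe x 6-sample matrix; S6 is shifted on purpose
toy <- matrix(rnorm(120), nrow = 20,
              dimnames = list(paste0("probe_", 1:20), paste0("S", 1:6)))
toy[, "S6"] <- toy[, "S6"] + 10

# Same defaults as heatmap(): Euclidean distance, complete linkage.
# dist() compares rows, so transpose to cluster the samples (columns).
col_clust <- hclust(dist(t(toy)))

# Cut the tree into two groups and see which samples fall in each
groups <- cutree(col_clust, k = 2)
print(groups)
```

If a sample still sits in a group of its own after restricting the matrix to significant probes, that is a better basis for following it up experimentally than clustering on all probes.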

Please look up other heatmap functions, such as pheatmap(), heatmap.2(), and Heatmap() (from ComplexHeatmap). There are many tutorials and posts on the Web, including on Biostars.

Kevin