Question: Hierarchical Clustering in single-channel agilent microarray experiment
1
gravatar for Leite
21 months ago by
Leite410
Leite410 wrote:

Hello everyone,

I'm trying to put a Hierarchical Clustering and boxplot in this code but always from error, what would be the best package to do this?

#Load limma
library(limma)

#Set-up
targetinfo <- readTargets("Targets.txt",row.names="FileName",sep="")

#Read files
project <- read.maimages(targetinfo,source="agilent", green.only=TRUE)

#Background correction
project.bgc <- backgroundCorrect(project, method="normexp", offset=16)

#Normalize the data with the 'quantile' method for 1-color
project.NormData <-normalizeBetweenArrays(project.bgc,method="quantile")

# load colour libraries
library(RColorBrewer)
# set colour palette
cols <- brewer.pal(8, "Set1")

#Histogram of non-normalized
plotDensities(project.bgc, col=cols, legend=FALSE)

#Histogram of normalized
plotDensities(project.NormData, col=cols, legend=FALSE)

#Create the study design and comparison model
design <- paste(targetinfo$Target, sep="")
design <- factor(design)
comparisonmodel <- model.matrix(~0+design)
colnames(comparisonmodel) <- levels(design)
#Checking the experimental design
design
comparisonmodel

project.fit <- lmFit(project.NormData, comparisonmodel)
project.fit <- lmFit(project.NormData,comparisonmodel)

#Applying the empirical Bayes method
project.fit.eBayes <- eBayes(project.fit)
names(project.fit.eBayes)

#Make individual contrasts and fit the new model
CaseControl <- makeContrasts(CaseControl="D0A-Control", levels=comparisonmodel)
CaseControl.fitmodel <- contrasts.fit(project.fit.eBayes, CaseControl)
CaseControl.fitmodel.eBayes <- eBayes(CaseControl.fitmodel)

#Filtering Results
 nrow(topTable(CaseControl.fitmodel.eBayes, coef="CaseControl", number=99999, lfc=2))
 probeset.list <- topTable(CaseControl.fitmodel.eBayes, "CaseControl", number=99999, adjust.method="BH", sort.by="P", lfc=2)

#To save results
write.table(probeset.list, "results.txt", sep="\t", quote=FALSE)

Best regards,

Leite

ADD COMMENTlink modified 21 months ago by Kevin Blighe47k • written 21 months ago by Leite410
1

Hi Leite,

I assume your question is about microarray data, but you never specified that. I adapted your title to clarify this.

but always from error

It's very hard for us to figure out what's going wrong if you don't show us the error message.

Cheers,
Wouter

ADD REPLYlink written 21 months ago by WouterDeCoster40k

Hey WouterDeCoster,

Sorry for the lack of information, you're correct, its about microarray data analysis.

I've tryed this code for bloxplot:

boxplot(project.NormData, col=cols)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
'x' must be atomic

And this code for Hierarchical Clustering:

plot(project.NormData)
Error in as.matrix(E$E) : object 'E' not found

Thank you so much,

Leite

ADD REPLYlink modified 21 months ago • written 21 months ago by Leite410
9
gravatar for Kevin Blighe
21 months ago by
Kevin Blighe47k
Kevin Blighe47k wrote:

The expression values for a 2- or single-colour Agilent array are stored in the 'M' or 'E' variable, i.e., project.NormData$E ( C: Single-color Agilent array analyzing in R )

For each of the following functions, I encourage you to devote a full day to understanding what each and every parameter is doing. That is the best way for you to learn.

-------------------------------------------

Box-and-whisker plot

par(mar=c(8,8,5,5), cex=1.0, cex.axis=1.4, cex.lab=1.4)
boxplot(project.NormData$E,
  main="Box-and-whisker plot",
  xlab="", ylab=bquote(~Log[2]~expression),
  names=paste("Sample", c(1:ncol(project.NormData$E))),
  col="skyblue",
  las=2,
  outline=FALSE)

box


Violin plot (Wouter likes violin plots - maybe he plays a violin)

require(reshape2)
violinMatrix <- reshape2::melt(project.NormData$E)
colnames(violinMatrix) <- c("Gene","Sample","Expression")

library(ggplot2)
ggplot(violinMatrix, aes(x=Sample, y=Expression)) +
  geom_violin() +
  theme(axis.text.x = element_text(angle=45, hjust=1))

violin


Hierarchical clustering (unsupervised on entire dataset - very CPU and memory intensive)

For a simple dendrogram or circular dendrogram, take a look at my threads:

dend


heatmap.2 (hierarchical clustering dendrogram with heatmap)

For the heatmaps, you usually want to filter your expression matrix for genes that are differentially expressed. You appear to have just fitered out probes that are greater than absolute log (base 2) fold change 2, stored in your probeset.list object

#Filter the expression matrix to include only differentially expressed genes
sigmatrix <- project.NormData$E[probeset.list,]

#Scale the filtered expression matrix (convert to Z scale)
heat <- t(scale(t(sigmatrix)))

#Set colour
require(RColorBrewer)
myCol <- colorRampPalette(c("violet", "black", "springgreen"))(100)
myBreaks <- seq(-3, 3, length.out=101)

require("gplots")

#Euclidean distance; Ward's linkage
par(mar=c(1,1,1,1), cex=1.0)
heatmap.2(heat,
  col=myCol,
  breaks=myBreaks,
  main="",
  key=T, key.xlab="Expresssion\nZ-score", keysize=1.0,
  scale="none",
  ColSideColors=condition,
  density.info="none",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean), 
  trace="none",
  cexRow=1.0, cexCol=1.0,
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"),
  margins=c(6, 6))
legend("top",
  bty="n",
  cex=1.0,
  title="Condition",
  c("Wild-type", "Knock-out"), fill=c("yellow", "royalblue"),
  horiz=TRUE)

#1 - Pearson correlation distance; Ward's linkage
par(mar=c(1,1,1,1), cex=1.0)
heatmap.2(heat,
  col=myCol,
  breaks=myBreaks,
  main="",
  key=T, key.xlab="Expresssion\nZ-score", keysize=1.0,
  scale="none",
  ColSideColors=condition,
  density.info="none",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  trace="none",
  cexRow=1.0, cexCol=1.0,
  distfun=function(x) as.dist(1-cor(t(x))),
  hclustfun=function(x) hclust(x, method="ward.D2"),
  margins=c(6, 6))
legend("top",
  bty="n",
  cex=1.0,
  title="Condition",
  c("Wild-type", "Knock-out"), fill=c("yellow", "royalblue"),
  horiz=TRUE)

heatmap

ComplexHeatmap

For ComplexHeatmap, see my recent post here: C: how to cluster genes in heatmap

I have also posted code in a comment following this answer.

ADD COMMENTlink modified 8 weeks ago • written 21 months ago by Kevin Blighe47k
1

ComplexHeatmap

  ColAnn <- data.frame(condition)
  colnames(ColAnn) <- c("Condition")
  ColAnn <- HeatmapAnnotation(df=ColAnn,
    which="col",
    col=list(Condition=c("Wild-type"="yellow", "Knock-out"="royalblue")))

  #Boxplots for rows (genes) and columns (samples)
  boxAnnCol <- HeatmapAnnotation(
    boxplot=anno_boxplot(heat,
      border=FALSE,
      gp=gpar(fill="#CCCCCC"),
      lim=NULL,
      pch=".",
      size=unit(2, "mm"),
      axis=FALSE,
      axis_side=NULL,
      axis_gp=gpar(fontsize=12)),
    annotation_width=unit(c(1, 7.5), "cm"))
  boxAnnRow <- rowAnnotation(
    boxplot=row_anno_boxplot(heat,
      border=FALSE,
      gp=gpar(fill="#CCCCCC"),
      lim=NULL,
      pch=".",
      size=unit(3, "cm"),
      axis=FALSE,
      axis_side="top",
      axis_gp=gpar(fontsize=12)),
    annotation_width=unit(c(3), "cm"))

  myCol <- colorRampPalette(c("violet", "black", "springgreen"))(100)
  myBreaks <- seq(-3, 3, length.out=100)

  hmap <- Heatmap(heat,
            name="Expression Z-score",
            col=colorRamp2(myBreaks, myCol),
            heatmap_legend_param=list(
              color_bar="continuous",
              legend_direction="horizontal",
              legend_width=unit(5,"cm"),
              title_position="topcenter",
              title_gp=gpar(fontsize=15, fontface="bold")),

            #Row annotation configurations
            cluster_rows=TRUE,
            show_row_dend=TRUE,
            row_title="Gene",
            row_title_side="left",
            row_title_gp=gpar(fontsize=16, fontface="bold"),
            row_title_rot=90,
            show_row_names=TRUE,
            row_names_side="left",
            row_names_gp=gpar(fontsize=10),

            #Column annotation configuratiions
            cluster_columns=TRUE,
            show_column_dend=TRUE,
            column_title="Samples",
            column_title_side="top",
            column_title_gp=gpar(fontsize=16, fontface="bold"),
            column_title_rot=0,
            show_column_names=TRUE,
            column_names_gp=gpar(fontsize=14),

            #Dendrogram configurations: columns
            clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
            clustering_method_columns="ward.D2",
            column_dend_height=unit(30,"mm"),

            #Dendrogram configurations: rows
            clustering_distance_rows=function(x) as.dist(1-cor(t(x))),
            clustering_method_rows="ward.D2",
            row_dend_width=unit(30,"mm"),

            #Annotations (row annotation must be added with 'draw' function, below)
            top_annotation_height=unit(0.5,"cm"),
            top_annotation=ColAnn,

            bottom_annotation_height=unit(3, "cm"),
            bottom_annotation=boxAnnCol)

    draw(hmap + boxAnnRow, heatmap_legend_side="left", annotation_legend_side="right")

complexheatmap

ADD REPLYlink modified 5 months ago • written 21 months ago by Kevin Blighe47k
1
gravatar for Hussain Ather
21 months ago by
Hussain Ather940
National Institutes of Health, Bethesda, MD
Hussain Ather940 wrote:

Check out clustermap

ADD COMMENTlink written 21 months ago by Hussain Ather940
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1096 users visited in the last hour