5 months ago by

University College London Cancer Institute

Hello and good afternoon / evening / morning,

I presume that you have downloaded the Agilent raw TXT files?

For **2-colour Agilent microarrays**, the following generic pipeline should allow you to produce a normalised expression matrix and then conduct a simple diff. expression analysis (case-control):

# Set-up

```
source("http://bioconductor.org/biocLite.R")
#Set 999 decimal places
options(scipen=999)
require("limma")
targetinfo <- readTargets("Targets.txt", sep="\t")
```

*Targets.txt* contains data in the format:

FileName WT_KO

SampleFiles/Array1.txt WT

SampleFiles/Array2.txt KO

SampleFiles/Array3.txt KO

SampleFiles/Array4.txt WT

- SampleFiles is a directory in your current working directory
- the 'FileName' column should remain as that (cases sensitive)
- WT_KO just describes one condition of interest (can be any name)
- You can have any number of conditions (as extra columns) in this file

# Read in and normalise data

```
#Converts the data to a RGList (two-colour [red-green] array), with values for R, Rg, G, Gb
project <- read.maimages(targetinfo, source="agilent")
#Perform background correction on the fluorescent intensities
#'normexp' is beneficial in that it doesn't result in negative values, meaning no data is lost
project.bgcorrect <- backgroundCorrect(project, method="normexp", offset=16)
#Normalize the data with the 'loess' method
#LOESS performs local regression on subsets of the data, resulting in the generation of a 'regression curve' through it
project.bgcorrect.norm <- normalizeWithinArrays(project.bgcorrect, method="loess")
#For replicate probes in each sample, replace values with the average
project.bgcorrect.norm.avg <- avereps(project.bgcorrect.norm, ID=project.bgcorrect.norm$genes$ProbeName)
#Write out the expression values (using filename as headers)
temp <- project.bgcorrect.norm.avg$M
rownames(temp) <- project.bgcorrect.norm.avg$genes$GeneName
temp <- temp[order(rownames(temp), decreasing=FALSE),]
write.table(temp, "Results/MasterExpression.tsv", sep="\t", quote=FALSE, append=FALSE)
#Summarise over gene names (averaging counts over transcripts)
temp <- data.frame(cbind(data.frame(project.bgcorrect.norm.avg$genes$GeneName), data.frame(project.bgcorrect.norm.avg$M)))
project.bgcorrect.norm.avg.summarised <- aggregate(temp[,2:ncol(temp)], temp[1], mean)
rownames(project.bgcorrect.norm.avg.summarised) <- project.bgcorrect.norm.avg.summarised[,1]
project.bgcorrect.norm.avg.summarised <- project.bgcorrect.norm.avg.summarised[,2:ncol(project.bgcorrect.norm.avg.summarised)]
write.table(project.bgcorrect.norm.avg.summarised, "Results/MasterExpressionAverages.tsv", sep="\t", quote=FALSE, append=FALSE)
```

From this, you will have 2 files in your Results directory:

- MasterExpression.tsv - normalised counts per target probe
- MasterExpressionAverages.tsv - normalised counts summarised (by mean)
over genes

# QC

```
#Generate chip images to diagnose spatial artefacts for each array and a merged boxplot of intensities
image(project)
boxplot(project.bgcorrect.norm.avg.summarised, main="Background-corrected, RMA normalised, log2 expression values\nAll probes", col="royalblue", las=2)
#Principal components analysis
project.pca <- prcomp(t(project.bgcorrect.norm.avg.summarised), scale=TRUE)
#Determine the proportion of variance of each component
project.pca.proportionvariances <- ((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100
pairs(project.pca$x[,1:5], col="forestgreen", cex=0.8, main="Principal components analysis bi-plot\nPCs 1-5", pch=16)
```

# Differential expression

```
#Create the study design and comparison model
design <- paste(targetinfo$WT_KO, sep="")
design <- factor(design)
comparisonmodel <- model.matrix(~0+design)
colnames(comparisonmodel) <- levels(design)
#Fit the linear model on the study's data
project.fitmodel <- lmFit(project.bgcorrect.norm.avg.summarised, comparisonmodel)
#Applying the empirical Bayes method to the fitted values acts as an extra normalization step and aims to bring the different probe-wise variances to common values.
project.fitmodel.eBayes <- eBayes(project.fitmodel)
names(project.fitmodel.eBayes)
#Make individual contrasts and fit the new model
CaseControl <- makeContrasts(CaseControl="KO-WT", levels=comparisonmodel)
CaseControl.fitmodel <- contrasts.fit(project.fitmodel.eBayes, CaseControl)
CaseControl.fitmodel.eBayes <- eBayes(CaseControl.fitmodel)
topTable(CaseControl.fitmodel.eBayes, adjust="fdr", coef="CaseControl", number=99999, p.value=1)
```