I downloaded the raw data for the Agilent-020382 Human Custom Microarray 44k (Feature Number version) platform from GEO, but I do not know how to build the expression matrix step by step. Can you help me? Thank you!
Edit September 5, 2019
NB - this original answer is for 2-colour (channel) Agilent data. Another generic pipeline for 1-colour Agilent is here: A: How to process (seems) Agilent microarrry data?
I presume that you have downloaded the Agilent raw TXT files?
For 2-colour (channel) Agilent Microarrays, the following generic pipeline should allow you to produce a normalised expression matrix and perform a simple differential expression analysis (case-control):
    # Penalise scientific notation so that numbers print in fixed notation
    options(scipen = 9)

    library(limma)

    # Read the tab-delimited sample sheet
    targetinfo <- readTargets('Targets.txt', sep = '\t')
Targets.txt is a tab-delimited file in the following format:

    FileName                  WT_KO
    SampleFiles/Array1.txt    WT
    SampleFiles/Array2.txt    KO
    SampleFiles/Array3.txt    KO
    SampleFiles/Array4.txt    WT
- SampleFiles is a directory in your current working directory
- the 'FileName' column name should be kept exactly as it is (it is case sensitive)
- WT_KO just describes one condition of interest (the column can have any name)
- You can have any number of conditions (as extra columns) in this file
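If it helps, a Targets.txt like the one above could be generated from R rather than typed by hand. This is only a sketch in base R: the file names and WT/KO labels are the example values from above, and the file is written to a temporary directory purely for illustration.

```r
# Sample sheet using the example values from above (replace with your own files)
targets <- data.frame(
  FileName = file.path("SampleFiles", paste0("Array", 1:4, ".txt")),
  WT_KO = c("WT", "KO", "KO", "WT"),
  stringsAsFactors = FALSE)

# Write it as a tab-delimited file with a header row, no row names and no quotes
targets_file <- file.path(tempdir(), "Targets.txt")  # illustrative location
write.table(targets, targets_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout
check <- read.delim(targets_file, stringsAsFactors = FALSE)
print(check)
```

Extra conditions would simply be extra columns in the data frame before it is written out.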
Read in and normalise data
    # Read the raw files into an RGList (two-colour [red-green] array), with values for R, Rb, G, Gb
    project <- read.maimages(targetinfo, source = 'agilent')

    # Perform background correction on the fluorescent intensities
    project.bgcorrect <- backgroundCorrect(project, method = 'normexp', offset = 16)

    # Normalize the data with the 'loess' method
    project.bgcorrect.norm <- normalizeWithinArrays(project.bgcorrect, method = 'loess')

    # For replicate probes in each sample, replace values with the average
    project.bgcorrect.norm.avg <- avereps(
      project.bgcorrect.norm,
      ID = project.bgcorrect.norm$genes$ProbeName)
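To make the output of this step more concrete: after normalisation the object holds M-values (log2 red/green ratios) and A-values (average log2 intensities), and for two-colour data the M-values are what serve as the 'expression matrix'. The toy sketch below uses only base R and made-up intensities. It skips background correction and loess normalisation, the probe names are hypothetical, and the replicate averaging is a simplified stand-in for what avereps does, not limma's actual implementation.

```r
# Made-up red (R) and green (G) intensities: 4 spots x 2 arrays
R <- matrix(c(400, 100, 800, 200,
              200, 100, 400, 400), nrow = 4,
            dimnames = list(NULL, c("Array1", "Array2")))
G <- matrix(100, nrow = 4, ncol = 2, dimnames = dimnames(R))

# Hypothetical probe names; probe 'A' is spotted twice
probes <- c("A", "B", "A", "C")

# M = log2 ratio of red to green; A = average log2 intensity
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Average replicate probes within each array (one row per unique probe)
M.avg <- rowsum(M, probes) / as.vector(table(probes))
print(M.avg)
```

With the real data, the corresponding matrix should be available as project.bgcorrect.norm.avg$M, which can be written to disk with write.table if a file is needed.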
    # Generate chip images to diagnose spatial artefacts
    image(project)

    # box-and-whiskers of the normalised log-ratios (M-values)
    boxplot(
      project.bgcorrect.norm.avg$M,
      col = "royalblue",
      las = 2)

    # PCA on the M-values (samples in rows, hence the transpose)
    p <- prcomp(t(project.bgcorrect.norm.avg$M), scale. = TRUE)

    # Determine the proportion of variance of each component
    proportionvariances <- ((p$sdev^2) / (sum(p$sdev^2)))*100

    # Pairwise plots of the first 5 components (needs at least 5 arrays)
    pairs(
      p$x[,1:5],
      col = "forestgreen",
      cex = 0.8,
      main = "Principal components analysis bi-plot\nPCs 1-5",
      pch = 16)
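The proportionvariances vector computed above is not used by the plotting call, but it is handy for judging how many components are worth inspecting. A small sketch on simulated data follows; the matrix is random and merely stands in for the normalised values, so the numbers themselves mean nothing.

```r
set.seed(1)

# Simulated stand-in for a probes x samples matrix (100 probes, 6 samples)
mat <- matrix(rnorm(600), nrow = 100,
              dimnames = list(NULL, paste0("Array", 1:6)))

# Samples must be in rows for prcomp, hence the transpose
p <- prcomp(t(mat), scale. = TRUE)

# Percentage of variance explained by each principal component
proportionvariances <- (p$sdev^2 / sum(p$sdev^2)) * 100

# Scree-style bar plot of the variance explained
barplot(proportionvariances,
        names.arg = paste0("PC", seq_along(proportionvariances)),
        ylab = "Variance explained (%)", las = 2)
```

On real arrays, a first component that explains most of the variance and separates samples by batch or slide rather than by condition is usually worth investigating before moving on.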
    # Create the study design
    design <- model.matrix(~ 0 + factor(targetinfo$WT_KO, levels = c('WT', 'KO')))
    colnames(design) <- c('WT', 'KO')

    # Fit the linear model on the study's data
    project.fitmodel <- lmFit(
      project.bgcorrect.norm.avg,
      design)

    # Apply the empirical Bayes method to the fitted model
    # This moderates the probe-wise variances by shrinking them towards a common value
    project.fitmodel.eBayes <- eBayes(project.fitmodel)
    names(project.fitmodel.eBayes)

    # Make individual contrasts
    CaseControl <- makeContrasts(CaseControl = 'KO-WT', levels = design)
    CaseControl.fitmodel <- contrasts.fit(project.fitmodel.eBayes, CaseControl)
    CaseControl.fitmodel.eBayes <- eBayes(CaseControl.fitmodel)

    # List all probes for the KO versus WT contrast, with Benjamini-Hochberg adjusted p-values
    topTable(
      CaseControl.fitmodel.eBayes,
      adjust = 'BH',
      coef = "CaseControl",
      number = 99999,
      p.value = 1)
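To see what the design matrix and the KO-WT contrast look like, the sketch below rebuilds them with base R for the four example arrays from Targets.txt. The contrast vector is written out by hand only for illustration (in the pipeline above, makeContrasts builds it), and the group means are made-up numbers.

```r
# Condition labels for the four example arrays
WT_KO <- c("WT", "KO", "KO", "WT")

# One column per group, no intercept
design <- model.matrix(~ 0 + factor(WT_KO, levels = c("WT", "KO")))
colnames(design) <- c("WT", "KO")
print(design)

# KO minus WT, written by hand
contrast <- c(WT = -1, KO = 1)

# Made-up group means for a single probe: the contrast returns their difference
group.means <- c(WT = 0.5, KO = 2.0)
logFC <- sum(contrast * group.means)
print(logFC)
```

Because the contrast is KO minus WT, a positive logFC in the topTable output means the value is higher in KO than in WT.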