Hi all, for a meta-analysis study, I am trying to find differentially expressed genes (DEGs) across individual studies that used different platforms: 3D-Gene Human Oligo chip 25k V2.1, Affymetrix Human Genome U133 Plus 2.0 Array (two studies), and Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (two studies). I would be very grateful if anyone could tell me whether this analysis is possible and, if so, which program would be most efficient.
If the condition under study is the same in each experiment, process each experiment independently and then perform a meta-analysis on the results. A problem that you'll face is that the packages needed to process the different array types will be different.
If the studies using the Affymetrix Human Genome U133 Plus 2.0 Array share exactly the same experimental set-up, you could feasibly combine them and normalise them together, in which case you may still want to adjust for batch effects afterwards (if you don't know what I am talking about, then ignore this). The same applies to the studies using the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F.
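As a rough illustration of the batch adjustment mentioned above, here is a minimal sketch using `removeBatchEffect()` from limma. The expression matrix is simulated and the names `expr`, `batch`, and `group` are hypothetical, so treat this as one possible approach and not a prescribed method. It is mainly useful for plots and clustering; for differential expression it is usually preferable to keep the original values and put the batch term in the design matrix instead, e.g. `model.matrix(~ batch + group)`.

```r
library(limma)

set.seed(42)  # simulated data, for illustration only

# Two hypothetical studies (batches) on the same platform, balanced for condition
batch <- factor(rep(c("study1", "study2"), each = 4))
group <- factor(rep(c("control", "disease"), times = 4))

# Simulated normalised log2 expression: 50 genes (rows) x 8 samples (columns)
expr <- matrix(rnorm(50 * 8, mean = 8, sd = 0.3), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:8)))

# Add an artificial shift to every gene in the second study
expr[, batch == "study2"] <- expr[, batch == "study2"] + 1.5

# Remove the batch component while protecting the condition of interest
design <- model.matrix(~ group)
expr_adj <- removeBatchEffect(expr, batch = batch, design = design)

# Per-gene difference in batch means after adjustment
batch_gap <- rowMeans(expr_adj[, batch == "study2"]) -
  rowMeans(expr_adj[, batch == "study1"])
summary(batch_gap)
```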
Note that the 3D-Gene Human Oligo chip 25k V2.1 is a lower-density array than the others, so there will not be extensive overlap between it and the others, i.e., different genes will be targeted.
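To see how much the platforms overlap once each array has been annotated with gene symbols, something like the following could be used. The gene lists are made-up placeholders and `genes_by_platform` is a hypothetical name; in practice each vector would come from your own annotated results.

```r
# Hypothetical gene symbols detected on each platform after annotation
genes_by_platform <- list(
  toray_3dgene   = c("TP53", "EGFR", "MYC"),
  affy_u133plus2 = c("TP53", "BRCA1", "EGFR", "MYC", "GAPDH"),
  agilent_4x44k  = c("TP53", "BRCA1", "EGFR", "GAPDH")
)

# Genes present on every platform; only these can enter a meta-analysis
# that requires a result from each study
common_genes <- Reduce(intersect, genes_by_platform)
common_genes

# Number of genes per platform, for comparison
lengths(genes_by_platform)
```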
For the Agilent arrays, a general pipeline to read in and normalise:
    library("limma")

    #Read in the data into a dataframe
    #readTargets will by default look for the 'FileName' column in the specified file
    targetinfo <- readTargets("Targets.txt", sep="\t")

    #Converts the data to a RGList (two-colour [red-green] array), with values for R, Rb, G, Gb
    project <- read.maimages(targetinfo, source="agilent")

    #Perform background correction on the fluorescent intensities
    project.bgcorrect <- backgroundCorrect(project, method="normexp", offset=16)

    #Normalize the data with the 'loess' method
    #LOESS performs local regression on subsets of the data, resulting in the generation of a 'regression curve' through it
    project.bgcorrect.norm <- normalizeWithinArrays(project.bgcorrect, method="loess")

    #For replicate probes in each sample, replace values with the average
    #ID is used to identify the replicates
    project.bgcorrect.norm.avg <- avereps(project.bgcorrect.norm, ID=project.bgcorrect.norm$genes$ProbeName)
The file, Targets.txt, may look like:
    FileName                                            WT_KO   Time
    SampleFiles/251486810768_GE2-v5_95_Feb07_1_1.txt    WT      4Wk_TAC
    SampleFiles/251486810768_GE2-v5_95_Feb07_1_2.txt    KO      4Dy_Rev
    SampleFiles/251486810768_GE2-v5_95_Feb07_1_3.txt    KO      7Dy_Rev
    SampleFiles/251486810942_GE2-v5_95_Feb07_1_1.txt    WT      4Wk_TAC
    SampleFiles/251486810942_GE2-v5_95_Feb07_1_2.txt    WT      4Dy_Rev
For the Affymetrix arrays:
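    library("limma") #provides readTargets()
    library("oligo")

    #Read in the data into a dataframe
    targetinfo <- readTargets("Targets.txt", sep="\t")
    CELFiles <- list.celfiles("SampleFiles/", full.names = TRUE)

    #Raw intensity data
    project <- read.celfiles(CELFiles)

    #Background correct, normalize, and calculate gene expression
    #Note: target="core" applies to Gene ST / Exon ST arrays (as in the example file names below);
    #as far as I know, it should be omitted for 3' arrays such as the U133 Plus 2.0
    project.bgcorrect.norm.avg <- rma(project, background=TRUE, normalize=TRUE, target="core")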
In this case, Targets.txt may look like:

    FileName                                          SampleID   Group
    SampleFiles/1_CS0911a_(HuGene-2_0-st).CEL         CS0911a    KN92
    SampleFiles/10_CS0812d_(HuGene-2_0-st).CEL        CS0812d    KN92_WNT3A
    SampleFiles/11_CS0812e_(HuGene-2_0-st).CEL        CS0812e    KN93_WNT3A
After you have normalised the data, in each case, you can perform differential expression analysis using the limma package - again, this will be performed independently in each study.
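To give an idea of what that looks like, here is a minimal sketch of a two-group comparison with limma. The data are simulated and the names `expr`, `group`, and `de_table` are hypothetical; in a real analysis you would pass the normalised object from the relevant pipeline above (the `MAList` from the Agilent code or the `ExpressionSet` from `rma()`) to `lmFit()`, and build the design from your own Targets file.

```r
library(limma)

set.seed(1)  # simulated data, for illustration only

n_genes <- 200
group <- factor(rep(c("control", "disease"), each = 4),
                levels = c("control", "disease"))

# Simulated normalised log2 expression: genes in rows, samples in columns
expr <- matrix(rnorm(n_genes * length(group), mean = 8, sd = 0.3),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_along(group))))

# Spike an artificial 2-unit log2 increase into the first 10 genes
expr[1:10, group == "disease"] <- expr[1:10, group == "disease"] + 2

# Design matrix: intercept (control) plus the disease-vs-control effect
design <- model.matrix(~ group)

# Fit a linear model per gene, then moderate the variances (empirical Bayes)
fit <- eBayes(lmFit(expr, design))

# Full results table for the disease-vs-control coefficient
de_table <- topTable(fit, coef = "groupdisease", number = Inf, sort.by = "P")
head(de_table)
```

The `logFC`, `P.Value`, and `adj.P.Val` columns of each study's table are what you would carry forward into a meta-analysis.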
It is your role to learn what each step above is doing, and it is your role to learn how to use limma to perform differential expression analysis. You can also learn how to annotate your data with gene names (which will be required, possibly using biomaRt) and to perform the end meta-analysis, if that is your aim.
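For the final meta-analysis step, one simple option (an assumption on my part, not the only or necessarily the best method) is Fisher's method for combining per-study p-values for each gene. The sketch below uses base R only, with made-up p-values and hypothetical names (`pvals`, `fisher_combine`). Note that Fisher's method ignores the direction of the effect, so the sign of the log fold change should be checked separately.

```r
# Hypothetical p-values for the same genes (rows) from three studies (columns)
pvals <- matrix(
  c(0.001, 0.004, 0.010,
    0.200, 0.600, 0.350,
    0.040, 0.030, 0.200),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "GENE3"),
                  c("studyA", "studyB", "studyC"))
)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null, where k is the number of studies
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# One combined p-value per gene, then multiple-testing correction
combined <- apply(pvals, 1, fisher_combine)
combined_adj <- p.adjust(combined, method = "BH")

data.frame(combined_p = combined, adj_p = combined_adj)
```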
What I have written here is a rough guide to help you get started.