Combining Two Platforms Affy Hgu133A And Hgu133B
8.4 years ago
mohan173bmc ▴ 10

Hello,

I find myself in a situation where I need to reanalyze an old dataset available at GEO. The issue is that the experimental design hybridized the same samples to two platforms, HGU133a and HGU133b. Is there a way to combine two platforms like this that were run on the same samples?

I performed preprocessing and MAS5 normalization separately for each platform and exported the expression values. I see that 168 probe set IDs are common to the two. One paper that faced a similar problem reports that the HGU133b values were scaled to HGU133a based on 100 common genes.
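
The paper's exact scaling procedure isn't given here, so the following is only a guess at what such scaling could look like. It is a minimal base-R sketch that assumes both platforms have been MAS5-normalized into probe-set-by-sample matrices with the same sample columns, and that each HGU133b array is multiplied by the median A/B signal ratio of the shared probe sets on that sample. The function name `scale_b_to_a` and the choice of the median are hypothetical.

```r
# Hypothetical sketch: rescale each HGU133b array to its HGU133a partner
# using the probe sets present on both chips (per-sample median ratio assumed).
scale_b_to_a <- function(exprA, exprB, common) {
  # exprA, exprB: linear-scale signal matrices (probe sets x samples)
  stopifnot(identical(colnames(exprA), colnames(exprB)))
  # per-sample median of the A/B ratio over the shared probe sets
  ratio <- apply(exprA[common, , drop = FALSE] / exprB[common, , drop = FALSE],
                 2, median)
  # multiply every column of B by its sample-specific factor
  sweep(exprB, 2, ratio, `*`)
}

# toy data, not real arrays
set.seed(1)
samples <- c("s1", "s2", "s3")
common <- paste0("common_", 1:5)
exprA <- matrix(rexp(15 * 3, 1/200), 15, 3,
                dimnames = list(c(common, paste0("a_", 1:10)), samples))
exprB <- rbind(exprA[common, ] / 2,   # toy assumption: B reads half as bright
               matrix(rexp(10 * 3, 1/100), 10, 3,
                      dimnames = list(paste0("b_", 1:10), samples)))
scaledB <- scale_b_to_a(exprA, exprB, common)
all.equal(scaledB[common, ], exprA[common, ])  # TRUE
```

The median keeps one or two odd shared probe sets from dominating the factor; whether those shared probe sets are stable enough to serve as a reference is a separate question.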

I don't know how this is done or whether it is a valid approach. Is there any other way to solve the problem? It would also be nice to know the protocol.

Many thanks and best regards, mohan

microarray

1. I have no idea how the scaling from HGU133b to HGU133a was performed, so I cannot reproduce it. Suggestions for methods are welcome. I hope these genes do not change across experiments (by the way, the dataset consists of cancer tissue samples from a clinical cohort).
2. Although combining the two files by rows is a good way to find differentially regulated genes, I guess it is not suitable for a coexpression analysis. I would also like the final combined file to be suitable for a meta-analysis across two different clinical cohorts.
3. I will certainly check whether combineAffyBatch can be used.

Thank you all again.

best,
mohan


With regard to #3, combineAffyBatch is for combining arrays that share content. Hgu133a and hgu133b do not share content, so this will be equivalent to combining rows.


With regard to #2, note that correlation is scale-free. Try this experiment:

> dat = seq(0,10,1) + rnorm(99)
> dat2 = seq(0,10,1) + rnorm(99)
> cor(dat,dat2)
[1] 0.8928766
> cor(dat,dat2*20)
[1] 0.8928766


That being the case, scaling the rows will not change a "co-expression" analysis that relies on correlation (and most do).
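
The same point for a gene-by-sample matrix: in a co-expression analysis the correlation between two genes is computed across samples, so multiplying each gene's row by its own positive constant leaves the gene-gene correlation matrix unchanged (a negative constant would flip the sign of that gene's correlations). A minimal base-R check with toy data:

```r
set.seed(42)
# toy matrix: 4 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(40), nrow = 4, dimnames = list(paste0("g", 1:4), NULL))
# multiply each gene (row) by its own positive constant;
# the length-4 vector recycles down the columns, so row i gets factor i
scaled <- expr * c(1, 20, 0.5, 3)
# gene-gene correlations are taken across samples, hence the transpose
all.equal(cor(t(expr)), cor(t(scaled)))  # TRUE
```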

8.4 years ago

Normalize the data from the two platforms separately, then combine them simply by appending the rows of one platform to those of the other. Since differential expression is tested per gene, there is no need to combine 133a and 133b in any formal way.
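
A minimal sketch of that row-wise combination, assuming `exprA` and `exprB` are separately normalized probe-set-by-sample matrices whose column names identify the same samples. The helper `stack_platforms` is hypothetical, and keeping the HGU133a values for probe set IDs present on both chips is an arbitrary choice made here so that row names stay unique:

```r
# Hypothetical helper: stack two separately normalized platforms for the same samples
stack_platforms <- function(exprA, exprB) {
  # both platforms must cover the same samples
  stopifnot(setequal(colnames(exprA), colnames(exprB)))
  # put B's columns in A's sample order before stacking
  exprB <- exprB[, colnames(exprA), drop = FALSE]
  # drop B rows whose probe set ID already appears on A
  exprB <- exprB[!rownames(exprB) %in% rownames(exprA), , drop = FALSE]
  rbind(exprA, exprB)
}

# toy usage: B's columns arrive in a different order
a <- matrix(1:6, 3, 2, dimnames = list(c("p1", "p2", "shared"), c("s1", "s2")))
b <- matrix(7:12, 3, 2, dimnames = list(c("shared", "q1", "q2"), c("s2", "s1")))
combined <- stack_platforms(a, b)
dim(combined)  # 5 2
```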

6.9 years ago
oganm ▴ 60

Well, this is old, but we were able to solve this issue and I'd like to share the solution so no one else has to suffer.

I assume the two chips you use share some probe sets and that you only want to work with those shared ones, since you can't really compare probes that don't exist on one of the chips.

You need to tinker with the code of the rma function. You can see the code by Ctrl+clicking on rma in your script, but I'll try to be clear enough here that you won't need to.

Edit: do not load the gdata package beforehand; it masks the combine method used below.

The critical parts are 1):

exprs <- .Call("rma_c_complete_copy", pm(object, subset),
pNList, ngenes, normalize, background, bgversion,
verbose, PACKAGE = "affy")


which returns the normalized expression values, and 2):

new("ExpressionSet", phenoData = phenoData(object), annotation = annotation(object),
protocolData = protocolData(object), experimentData = experimentData(object),
exprs = exprs)


that creates the resulting object of the function.

In part 1, the inputs normalize, background, bgversion and verbose normally come from the arguments of the original rma function. Fill them in as you normally would. To use the defaults, just set:

verbose = TRUE
destructive = TRUE  # in rma() this only chooses between rma_c_complete and rma_c_complete_copy
normalize = TRUE
background = TRUE
bgversion = 2


There are three things you need to create manually: pNList, ngenes and pm(object, subset). Read the two groups of arrays separately; I just placed their CEL files in different directories:

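library(affy)

setwd('HGU133a')
affyA = ReadAffy()  # reads all CEL files in the HGU133a directory
setwd('..')
setwd('HGU133b')
affyB = ReadAffy()  # reads all CEL files in the HGU133b directory
setwd('..')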


You want the common probe sets, so do:

pNListA = probeNames(affyA)
pNListB = probeNames(affyB)
subsetList = pNListA[pNListA %in% pNListB]


Now that you have the shared probe sets, you can request the PM intensities from both batches and stitch them together:

subsetPmA = pm(affyA, unique(subsetList))
subsetPmB = pm(affyB, unique(subsetList))
# assumes each shared probe set has the same probes, in the same order, on both chips
allPm = cbind(subsetPmA, subsetPmB) # this will go into the .Call function
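
The cbind() step only makes sense if row i of subsetPmA and row i of subsetPmB are the same probe. A hypothetical base-R check (the name `check_pm_layout` is made up) compares the probe set label of each PM row on the two chips, i.e. the kind of vectors `probeNames(affyA, unique(subsetList))` and `probeNames(affyB, unique(subsetList))` would presumably give; toy labels are used below so it runs without affy. Matching counts still don't prove the probes are identical; matching by sequence is the stricter test.

```r
# Hypothetical check before cbind(): each shared probe set should contribute
# the same number of PM rows on both chips, in the same order.
check_pm_layout <- function(namesA, namesB) {
  ids <- union(namesA, namesB)
  countsA <- table(factor(namesA, levels = ids))
  countsB <- table(factor(namesB, levels = ids))
  list(same_order = identical(namesA, namesB),
       count_mismatch = ids[as.vector(countsA) != as.vector(countsB)])
}

# toy labels: probe set "ps2" has 2 PM probes on A but 3 on B
namesA <- rep(c("ps1", "ps2"), times = c(3, 2))
namesB <- rep(c("ps1", "ps2"), times = c(3, 3))
check_pm_layout(namesA, namesB)$count_mismatch  # "ps2"
```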


The two variables left for part 1 are simple; just do:

ngenes = length(unique(subsetList))
pNList = split(0:(length(subsetList) - 1), subsetList)


Then run part 1 on the stitched matrix to get the normalized expression values:

exprs <- .Call("rma_c_complete", allPm,
pNList, ngenes, normalize, background, bgversion,
verbose, PACKAGE = "affy")
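
For intuition about what that C call does after background correction, here is an illustrative base-R version of the quantile normalization and median-polish summarization steps, applied across all columns of the stitched matrix at once. It is not affy's implementation: background correction is skipped, and the function names are hypothetical. Note that pNList holds 0-based row indices, which is why the sketch adds 1 before indexing in R.

```r
# Illustrative only: quantile normalization + median polish, as in RMA,
# without background correction.
quantile_normalize <- function(x) {
  # rank within each column, then give each rank the mean of that rank across columns
  ranks <- apply(x, 2, rank, ties.method = "first")
  means <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) means[r])
  dimnames(out) <- dimnames(x)
  out
}

summarize_probesets <- function(pm, pNList) {
  logpm <- log2(pm)
  out <- t(vapply(pNList, function(idx) {
    # idx is 0-based, as handed to the C code
    fit <- medpolish(logpm[idx + 1, , drop = FALSE], trace.iter = FALSE)
    fit$overall + fit$col  # per-array expression estimate
  }, numeric(ncol(pm))))
  colnames(out) <- colnames(pm)
  out
}

# toy stitched matrix: 2 arrays from each chip, 2 probe sets of 4 probes each
set.seed(7)
allPm <- matrix(2^runif(8 * 4, 6, 12), nrow = 8,
                dimnames = list(NULL, c("A1", "A2", "B1", "B2")))
probeSetOfRow <- rep(c("ps1", "ps2"), each = 4)
pNList <- split(0:(length(probeSetOfRow) - 1), probeSetOfRow)
expr <- summarize_probesets(quantile_normalize(allPm), pNList)
dim(expr)  # 2 4
```

Because the quantile step runs over all columns together, the HGU133a and HGU133b arrays end up with identical probe-level intensity distributions on the shared probe sets.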


To create the new object, you need to stitch the components of the two objects together. For the annotation, if one of your chips contains a subset of the probes on the other chip, just use that one; I don't think it matters much, since you already have what you need at this point. Just don't mix up the order when you use combine: the samples must come out in the same order as the columns of allPm (A first, then B).

phenoD = combine(phenoData(affyA), phenoData(affyB))
annot =  annotation(affyA)
protocolD = combine(protocolData(affyA), protocolData(affyB))
experimentD = experimentData(affyA)

newNormalized = new("ExpressionSet", phenoData = phenoD, annotation = annot,
protocolData = protocolD, experimentData = experimentD,
exprs = exprs)


That's it. You now have your handcrafted rma output. Use it as you normally would.


Thanks for sharing.


Or you can just use this package:

I haven't tried it yet but it does the same thing.

--edit--

It does not do the same thing. Apparently, it only has templates for certain chip versions.

Also, do not load the gdata package: it masks the combine method that the code above relies on.


Dear oganm, could you explain the steps for combining the two platforms (Affy HG-U133A and HG-U133B) without the code? I have the same problem: I want to use GSE9006, which contains both HG-U133A and HG-U133B arrays.

8.4 years ago

I suspect that there is no standardized way to do this - but if you have already found a published method I would go with that - and cite it.

In the end it all depends on whether the genes used for normalization do actually represent genes that don't change across the experiments. That's probably the critical element.
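
One rough way to check that assumption, sketched in base R under the assumption that `expr` holds linear-scale (e.g. MAS5) signals: compute each candidate probe set's coefficient of variation across samples and keep only the steady ones. The function name and the 0.1 cutoff are arbitrary, hypothetical choices.

```r
# Hypothetical sketch: keep candidate reference probe sets that barely vary
stable_candidates <- function(expr, candidates, max_cv = 0.1) {
  sub <- expr[candidates, , drop = FALSE]
  cv <- apply(sub, 1, sd) / rowMeans(sub)  # coefficient of variation per probe set
  sort(cv[cv <= max_cv])
}

# toy data: one steady and one strongly varying probe set across 6 samples
set.seed(3)
expr <- rbind(stable = 500 + rnorm(6, sd = 10),
              variable = c(100, 900, 150, 800, 120, 950))
colnames(expr) <- paste0("s", 1:6)
names(stable_candidates(expr, c("stable", "variable")))  # "stable"
```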

8.4 years ago
fanofactor ▴ 30

The package matchprobes for R (I am not sure it is maintained) has a function (combineAffyBatch) to combine different chips. It combines the probes by sequence, not by id.
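
A toy base-R illustration of the idea of matching probes by sequence rather than by ID (this is not combineAffyBatch's code; the IDs and short sequences below are invented):

```r
# toy probe tables for two chips; real probes would be 25-mers
probesA <- data.frame(idA = c("a1", "a2", "a3"),
                      seq = c("ACGTAC", "TTGGCC", "GATTAC"))
probesB <- data.frame(idB = c("b1", "b2"),
                      seq = c("GATTAC", "CCCAAA"))
# an inner join on the sequence pairs up physically identical probes,
# whatever IDs they carry on each chip
shared <- merge(probesA, probesB, by = "seq")
shared  # one row: a3 on chip A matches b1 on chip B
```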