Question: Combining Two Platforms Affy Hgu133A And Hgu133B
mohan173bmc wrote (6.5 years ago):

Hello,

I need to reanalyze an old dataset available at GEO. The problem is that the experimental design hybridized each sample to two platforms, HGU133a and HGU133b. Is there a way to combine two platforms that were run on the same samples?

I performed preprocessing and MAS5 normalization separately for each platform and extracted the expression files. I see that 168 probe set IDs are common to the two. One paper that encountered a similar problem reports that the HGU133b values were scaled to HGU133a based on 100 common genes.

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0010696

I do not know how this is done or whether it is a valid approach. Is there any other way to solve the problem? It would also be nice to know the protocol.

Many thanks and best regards,
Mohan
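The question does not say exactly how the cited paper scaled HGU133b to HGU133a, so the following is only a minimal sketch of one plausible reading: a single multiplicative factor per sample, estimated from the probe sets shared by both chips. It assumes linear-scale (not log) MAS5 values in matrices with probe sets as rows and the same sample names, in the same order, as columns. The helper `scale_to_reference`, the 2% trimmed mean and the toy data are all hypothetical.

```r
# Hypothetical helper (not the cited paper's method): rescale every sample
# (column) of matB so that the shared probe sets have the same trimmed mean
# signal as in matA. Assumes linear-scale MAS5 values and identical,
# identically ordered sample names in both matrices.
scale_to_reference <- function(matA, matB, trim = 0.02) {
  stopifnot(identical(colnames(matA), colnames(matB)))
  shared <- intersect(rownames(matA), rownames(matB))
  if (length(shared) == 0) stop("the two matrices share no probe sets")
  # One multiplicative factor per sample, estimated on shared probe sets only
  factors <- vapply(colnames(matB), function(s) {
    mean(matA[shared, s], trim = trim) / mean(matB[shared, s], trim = trim)
  }, numeric(1))
  # Apply each sample's factor to all of that sample's HGU133b probe sets
  sweep(matB, 2, factors, `*`)
}

# Toy data: two samples, three probe sets shared between the chips
set.seed(1)
samples <- c("s1", "s2")
matA <- matrix(rexp(10, rate = 0.01), nrow = 5,
               dimnames = list(c("p1", "p2", "p3", "a4", "a5"), samples))
# On the toy B chip the shared probe sets read at half the A signal
matB <- rbind(matA[c("p1", "p2", "p3"), ] / 2,
              b4 = c(40, 60), b5 = c(10, 20))
scaledB <- scale_to_reference(matA, matB)
```

Whether a single factor per sample is adequate depends entirely on the shared probe sets behaving the same way on both chips.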

Tags: microarray
Sean Davis (National Institutes of Health, Bethesda, MD) wrote (6.5 years ago):

Normalize the data from the two platforms separately, then combine them by simply appending the rows of one platform to the other. Since testing for differential expression is done per gene, there is no need to have 133a and 133b "combined" in any formal way.
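A minimal sketch of this row-wise combination in base R, assuming each platform has already been normalized into a matrix (probe sets in rows, samples in columns) and that the columns have been relabelled with a common per-sample identifier (in GEO each array usually has its own GSM accession, so that mapping has to be done by hand). The helper `stack_platforms` and its suffixes are hypothetical; the suffixes only keep probe set IDs that occur on both chips from colliding.

```r
# Hypothetical helper: stack two separately normalized matrices
# (probe sets in rows, samples in columns) measured on the same samples.
stack_platforms <- function(exprA, exprB, suffixes = c("_A", "_B")) {
  # Both chips must cover exactly the same samples; reorder B to match A
  stopifnot(setequal(colnames(exprA), colnames(exprB)))
  exprB <- exprB[, colnames(exprA), drop = FALSE]
  # Probe set IDs present on both chips get a suffix so row names stay unique
  inB <- rownames(exprA) %in% rownames(exprB)
  inA <- rownames(exprB) %in% rownames(exprA)
  rownames(exprA)[inB] <- paste0(rownames(exprA)[inB], suffixes[1])
  rownames(exprB)[inA] <- paste0(rownames(exprB)[inA], suffixes[2])
  rbind(exprA, exprB)
}

# Toy example: B's columns arrive in a different sample order
exprA <- matrix(1:6, nrow = 3,
                dimnames = list(c("x_at", "shared_at", "y_at"), c("s1", "s2")))
exprB <- matrix(11:16, nrow = 3,
                dimnames = list(c("shared_at", "z_at", "w_at"), c("s2", "s1")))
combined <- stack_platforms(exprA, exprB)
```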

oganm (Canada) wrote (5.0 years ago):

This is an old question, but we were able to solve this issue and I'd like to share the solution so no one else has to suffer.

I assume the two chips you use share some probe sets and that you only want to deal with the shared ones, since you can't really compare probes that don't exist on both chips.

You need to modify the code of the rma function. You can view its source by Ctrl+clicking rma in your script, but I'll try to be clear enough here that you won't need to.

Edit: do not load the gdata package beforehand (it masks the combine method used below).

There are two critical parts. Part 1:

exprs <- .Call("rma_c_complete_copy", pm(object, subset),
      pNList, ngenes, normalize, background, bgversion,
      verbose, PACKAGE = "affy")

which computes the expression values, and part 2:

new("ExpressionSet", phenoData = phenoData(object), annotation = annotation(object),
    protocolData = protocolData(object), experimentData = experimentData(object),
    exprs = exprs)

which creates the object that the function returns.

In part 1, the inputs normalize, background, bgversion and verbose normally come from the arguments of the original rma function. Fill them in as you normally would. To use the defaults, set:

verbose = TRUE
destructive = TRUE
normalize = TRUE
background = TRUE
bgversion = 2

There are three things you need to create manually: pNList, ngenes and pm(object, subset). First, read the two groups of CEL files separately. I simply placed them in different directories:

library(affy)

# Read the HG-U133A CEL files
setwd('HGU133a')
affyA <- ReadAffy()
setwd('..')

# Read the HG-U133B CEL files
setwd('HGU133b')
affyB <- ReadAffy()
setwd('..')

To get the probes common to both chips:

pNListA = probeNames(affyA)
pNListB = probeNames(affyB)

# Probe set name of every HG-U133A probe whose probe set also exists on HG-U133B
subsetList = pNListA[pNListA %in% pNListB]

Now that you have the subset, you can request the PM intensities for both sets of samples and stitch them together:

subsetPmA = pm(affyA, unique(subsetList))
subsetPmB = pm(affyB, unique(subsetList))

# The probe rows of the two matrices must line up for cbind to make sense
allPm = cbind(subsetPmA, subsetPmB) # this goes into the .Call function
The two remaining variables for part 1 are simple:

ngenes = length(unique(subsetList))

pNList = split(0:(length(subsetList) - 1), subsetList)
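To make these two inputs concrete, here is a toy run with made-up probe set names standing in for subsetList. The indices are zero-based row positions into allPm (note the 0:(n - 1)), grouped by probe set; this only works if the rows of allPm are ordered the same way as subsetList, which is an assumption worth checking on real data.

```r
# Toy stand-in for subsetList: one entry per probe, named by its probe set
subsetList <- c("1007_s_at", "1007_s_at", "1053_at", "1053_at", "1053_at")

# Number of probe sets to summarize
ngenes <- length(unique(subsetList))

# Zero-based row indices into allPm, one vector per probe set;
# split() orders the groups by sorted probe set name
pNList <- split(0:(length(subsetList) - 1), subsetList)
```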

Then run part 1 to get the normalized expression values:

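exprs <- .Call("rma_c_complete", allPm,
               pNList, ngenes, normalize, background, bgversion,
               verbose, PACKAGE = "affy")
# Label the columns with the sample names, in the same order as allPm
colnames(exprs) <- c(sampleNames(affyA), sampleNames(affyB))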
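The reason for stitching the HGU133a and HGU133b columns together before this call is that RMA's normalization step, quantile normalization, is then computed across all arrays at once. The sketch below illustrates quantile normalization in plain R on toy PM values; it is not the affy C implementation, omits background correction and median polish summarization, and breaks ties by order of appearance for simplicity. `quantile_normalize` is a hypothetical name.

```r
# Plain-R quantile normalization: every column ends up with the same
# distribution, namely the mean of the sorted columns.
quantile_normalize <- function(m) {
  sorted <- apply(m, 2, sort)              # each column sorted ascending
  target <- rowMeans(sorted)               # mean distribution across arrays
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Put the target values back in each column's original rank order
  matrix(target[ranks], nrow = nrow(m), dimnames = dimnames(m))
}

# Toy PM matrices: four shared probes, two arrays per chip type
pmA <- matrix(c(5, 2, 9, 1, 6, 3, 8, 2), nrow = 4,
              dimnames = list(NULL, c("A1", "A2")))
pmB <- matrix(c(1, 4, 2, 3, 2, 1, 3, 4), nrow = 4,
              dimnames = list(NULL, c("B1", "B2")))
allPm <- cbind(pmA, pmB)
normalized <- quantile_normalize(allPm)
```

Normalizing the A and B matrices separately would instead give each chip type its own target distribution.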

To create the new object, you need to stitch the components of the two objects together. For the annotation, if one of your chips has a subset of the probes on the other chip, just use that one, but I don't think it matters much; you already have what you need at this point. Just don't mix up the order when using combine:

phenoD = combine(phenoData(affyA), phenoData(affyB))
annot =  annotation(affyA)
protocolD = combine(protocolData(affyA), protocolData(affyB))
experimentD = experimentData(affyA)

newNormalized = new("ExpressionSet", phenoData = phenoD, annotation = annot, 
    protocolData = protocolD, experimentData = experimentD, 
    exprs = exprs)

That's it: you now have your hand-crafted rma output. Use it as you normally would.

Thanks for sharing.

(Reply by Istvan Albert, 5.0 years ago)

Or you can just use this package:

http://bmbolstad.com/misc/mixtureCDF/MixtureCDF.html

I haven't tried it yet, but it does the same thing.

--edit--

It does not do the same thing; apparently it only provides templates for certain chip versions.

Also, do not load the gdata package, as it masks the combine method used with the affy objects.

(Reply by oganm, 5.0 years ago)

Dear oganm, could you explain the steps for combining the two platforms, Affy HG-U133A and HG-U133B, without the code? I have the same problem: I want to use GSE9006, which contains both HG-U133A and HG-U133B arrays.

(Reply by lur_murad, 18 months ago)
Istvan Albert (University Park, USA) wrote (6.5 years ago):

I suspect there is no standardized way to do this, but if you have already found a published method, I would go with that and cite it.

In the end, it all depends on whether the genes used for normalization actually represent genes that don't change across the experiments. That's probably the critical element.
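One simple, hypothetical way to screen the candidate normalization genes is to check how much each one varies across the samples, for example with the coefficient of variation (sd / mean) on linear-scale values. The helper `reference_cv`, the toy matrix and the 0.1 cutoff are illustrative assumptions; a low CV in one cohort does not prove that a gene is biologically invariant.

```r
# Hypothetical helper: coefficient of variation (sd / mean) of each
# candidate probe set across samples, on linear-scale expression values.
reference_cv <- function(mat, probes) {
  sub <- mat[probes, , drop = FALSE]
  apply(sub, 1, sd) / rowMeans(sub)
}

# Toy matrix: three candidate probe sets (rows) across four samples
mat <- rbind(p1 = c(100, 101, 99, 100),
             p2 = c(10, 200, 50, 400),
             p3 = c(55, 60, 50, 58))
cv <- reference_cv(mat, c("p1", "p2", "p3"))

# Keep only candidates that vary little (the 0.1 cutoff is arbitrary)
stable <- names(cv)[cv < 0.1]
```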

fanofactor wrote (6.5 years ago):

The R package matchprobes (I am not sure whether it is still maintained) has a function, combineAffyBatch, to combine different chips. It combines the probes by sequence, not by ID.
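The idea of combining by sequence rather than by ID can be sketched in plain R as below. This is not the matchprobes code: `match_by_sequence`, the column names `id` and `sequence`, and the toy sequences are hypothetical (real probe sequences would come from annotation packages such as hgu133aprobe and hgu133bprobe).

```r
# Hypothetical helper: pair probes across two chips whose sequences are
# identical, regardless of their probe or probe set IDs.
match_by_sequence <- function(probesA, probesB) {
  idx <- match(probesA$sequence, probesB$sequence)  # first match in B, or NA
  keep <- !is.na(idx)
  data.frame(idA = probesA$id[keep],
             idB = probesB$id[idx[keep]],
             sequence = probesA$sequence[keep],
             stringsAsFactors = FALSE)
}

# Toy probe tables (real probes are 25-mers)
probesA <- data.frame(id = c("a1", "a2", "a3"),
                      sequence = c("ACGT", "GGCC", "TTAA"),
                      stringsAsFactors = FALSE)
probesB <- data.frame(id = c("b1", "b2"),
                      sequence = c("TTAA", "ACGT"),
                      stringsAsFactors = FALSE)
pairs <- match_by_sequence(probesA, probesB)
```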

mohan173bmc wrote (6.5 years ago):

Many thanks for your replies, Sean, Istvan and fanofactor.

  1. I have no idea how the scaling from HGU133b to HGU133a was performed, so I cannot follow it. Suggested methods, if any, are welcome. I hope these genes do not change across experiments (by the way, the dataset consists of cancer tissue samples from a clinical cohort).
  2. Although combining the two files by rows is a good way to get differentially regulated genes, I guess it is not suitable for a coexpression analysis. I would like the final combined file to also be fit for a meta-analysis across two different clinical cohorts.
  3. I will certainly check whether combineAffyBatch can be used.

Thank you all again.

Best,
Mohan

With regard to #3, combineAffyBatch is for combining arrays that share content. Hgu133a and hgu133b do not share content, so this will be equivalent to combining rows.

(Reply by Sean Davis, 6.5 years ago)

With regard to #2, note that correlation is scale-free. Try this experiment:

> dat = seq(0,10,1) + rnorm(99)
> dat2 = seq(0,10,1) + rnorm(99)
> cor(dat,dat2)
[1] 0.8928766
> cor(dat,dat2*20)
[1] 0.8928766

That being the case, scaling the rows will not change a "co-expression" analysis that relies on correlation (and most do).
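The same point holds for a whole expression matrix: multiplying each gene (row) by its own positive factor leaves the gene-by-gene correlation matrix unchanged, and so does adding a per-gene constant, which is what such a rescaling looks like on log-scale data. A short check on simulated data (the factors are arbitrary):

```r
set.seed(42)
# Toy matrix: 5 genes (rows) x 8 samples (columns)
expr <- matrix(rnorm(40, mean = 8), nrow = 5)

# Rescale each gene by its own positive factor; the length-5 vector
# recycles down the columns, so row i is multiplied by factors[i]
factors <- c(0.5, 2, 3, 10, 0.1)
scaled <- expr * factors

# On a log2 scale the same rescaling becomes a per-gene additive shift
shifted <- expr + log2(factors)

# Gene-by-gene correlations across samples
corOrig <- cor(t(expr))
corScaled <- cor(t(scaled))
corShifted <- cor(t(shifted))
```

A negative factor would flip the sign of that gene's correlations, so the invariance is specific to positive scaling.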

(Reply by Sean Davis, 6.5 years ago)