Question

Normalization Methods To Apply On A Data.Frame Object

0

10.7 years ago

Curious Mind ▴ 10

Hi,

There are several methods to normalize data held in AffyBatch objects, for example threestep, mas5calls, mascallsfilter, justMAS and rma.

However, my data is a data.frame, because I read my expression data from a .txt file. Can you please let me know which normalization and filtering methods I can use on a data.frame? Or is it possible to convert a data.frame into an AffyBatch object?

When I tried some of the normalization methods, I got the following error:

> dat.eset <- threestep(dat.fp,background.method="RMA.2",normalize.method="quantile",summary.method="median.polish")
Error in threestep(dat, background.method = "RMA.2", normalize.method = "quantile",  : 
  argument is data.frame threestep requires AffyBatch

> dat.mas5 <- mas5calls(dat)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘mas5calls’ for signature ‘"data.frame"’

Thanks

r bioconductor • 9.4k views

ADD COMMENT • link updated 10.6 years ago by polarise ▴ 380 • written 10.7 years ago by Curious Mind ▴ 10

1

Are you absolutely sure your data comes from an Affymetrix platform? Affymetrix files usually come in the binary .CEL format.

Secondly, if the file IS from an Affymetrix platform and it is a .txt file, there is a good chance that it has already been normalized, because the raw data would normally be in the binary .CEL format mentioned above.

Edit: If you're in doubt, post the first few header lines from your file. You could also make a density plot of the intensities, and it'll show whether the data has been normalized or not.
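The density-plot check suggested above can be sketched in base R as follows. The `dat` data.frame here is a simulated, hypothetical stand-in (it is not the poster's data); with a real table, one column per sample, only the plotting part is needed. This is a minimal sketch, not a definitive diagnostic.

```r
# Hypothetical stand-in for the expression table: 1000 probesets x 4 samples.
# Replace `dat` with your own data.frame (rows = probesets, columns = samples).
set.seed(1)
dat <- as.data.frame(matrix(rnorm(4000, mean = 7, sd = 1.5), ncol = 4,
                            dimnames = list(NULL, paste0("sample", 1:4))))

# One kernel density estimate per sample (column).
dens <- lapply(dat, density, na.rm = TRUE)

# Common axis limits so that every curve fits in the same panel.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- range(sapply(dens, function(d) range(d$y)))

# Empty canvas, then one line per sample.
plot(NA, xlim = xlim, ylim = ylim,
     xlab = "Intensity", ylab = "Density", main = "Per-sample densities")
for (i in seq_along(dens)) lines(dens[[i]], col = i)
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
```

If the curves lie almost on top of each other, the samples have most likely been normalized against each other already; clearly shifted or differently shaped curves suggest they have not.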

ADD REPLY • link 10.7 years ago by David Westergaard ★ 1.5k

0

This is from the GEO database. According to the authors, it is from an Affymetrix platform. The header line and the first data row of the file are:

ID GSM801843 GSM801844 GSM801845 GSM801846 GSM801847 GSM801848 GSM801849 GSM801850 GSM801851 GSM801852 GSM801853 GSM801854 GSM801855 GSM801856 GSM801857 GSM801858 GSM801859 GSM801860 GSM801861
NM_014543.2_psr1_at 7.78415 7.63683 7.09851 7.41493 6.87848 7.22712 6.99564 5.86747 5.83964 6.61278 7.52737 7.97955 6.6788 7.50651 7.23592 5.37349 6.28702 6.46063 6.30963

ADD REPLY • link 10.7 years ago by Curious Mind ▴ 10

1

It looks to me like it's already normalized and in log2 values. Try doing a density plot, and you should see all distributions are within very close proximity of each other.
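A quick numeric companion to the density plot is to look at the overall value range and at per-sample quantiles. The sketch below uses a simulated, hypothetical `dat`; the cutoff of 20 is only a rough rule of thumb chosen for illustration (log2 microarray values usually stay well below it, raw intensities usually do not), not an established threshold.

```r
# Hypothetical stand-in for the expression table (rows = probesets,
# columns = samples); replace with your own data.frame.
set.seed(2)
dat <- as.data.frame(matrix(rnorm(4000, mean = 7, sd = 1.5), ncol = 4,
                            dimnames = list(NULL, paste0("sample", 1:4))))

# Overall range: values confined to a narrow band suggest a log2 scale.
value_range <- range(as.matrix(dat), na.rm = TRUE)
looks_log2 <- value_range[2] < 20  # illustrative cutoff, not a guarantee

# Per-sample quartiles: after between-array normalization these are
# usually very similar from one sample to the next.
q <- sapply(dat, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

# Largest difference between samples at each quartile.
spread <- apply(q, 1, function(x) diff(range(x)))

print(value_range)
print(looks_log2)
print(round(q, 2))
print(round(spread, 3))
```

Small values in `spread` relative to the overall range point the same way as overlapping density curves.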

ADD REPLY • link 10.7 years ago by David Westergaard ★ 1.5k

0

Thank you so much. This explains why I saw so little variation.

ADD REPLY • link 10.7 years ago by Curious Mind ▴ 10

0

FYI, the CEL files are available for those via GEO and you can process them however you like.

ADD REPLY • link 10.7 years ago by Devon Ryan 104k

0

Just to nitpick: not all authors provide the raw data files (most do, though!). Some authors deposit only the normalized data files, which is quite annoying.

ADD REPLY • link 10.7 years ago by David Westergaard ★ 1.5k

0

I did say "...for those" :) Yeah, it's always annoying to have to deal with some randomly processed series matrix file!

ADD REPLY • link 10.7 years ago by Devon Ryan 104k

0

How were the values in the text file processed? BTW, you can always just manually create an AffyBatch object, though it doesn't look completely trivial.

ADD REPLY • link 10.7 years ago by Devon Ryan 104k

0

I am reading the text file as shown below:

dat <- read.table("C:/Data/EstrogenSampleData.txt", header=TRUE, row.names=1)

Thanks

ADD REPLY • link 10.7 years ago by Curious Mind ▴ 10

0

My question isn't how the data was read into R, but rather how it was processed to create the text file. Are these processed intensities or are they pulled directly from the CEL files or what? The way to process them will depend on how you got the numbers you currently have.

ADD REPLY • link 10.7 years ago by Devon Ryan 104k

0

Got it from GEO database as "Dataset SOFT file". The authors don't provide additional data. I want to test some publicly available datasets and understand the workflow. Thanks

ADD REPLY • link 10.7 years ago by Curious Mind ▴ 10

1

The data in the SOFT files has already been processed (there's no standard way). BTW, you can use the GEOquery package and have it fetch the series matrix file for you. The workflow for that is a bit more straightforward.

ADD REPLY • link 10.7 years ago by Devon Ryan 104k

0

library(GEOquery)
# get the ExpressionSet, usually normalized already
eset = getGEO("GSE32394")[[1]]
# get the .CEL files
getGEOSuppFiles("GSE32394")

ADD REPLY • link 10.7 years ago by Sean Davis 26k

0

Cross-posted: http://stackoverflow.com/q/18230978/1274516

ADD REPLY • link 10.7 years ago by Ben ★ 2.0k

score 0 · Answer 1 · 2013-08-14

That is absolutely possible. At least the last time I checked, the normalization functions in affy were wrappers around internal functions that ultimately reduce to functions operating on matrices. You can dig these internal functions out of the affy source code and call them directly, although that may not be recommended. I did this once; if you want, I can try to find the code for you.
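To illustrate what "functions working on matrices" can look like, here is a simplified base-R sketch of quantile normalization. It is not the affy source code: `quantile_normalize` is a hypothetical helper written for this example, it assumes a numeric table without missing values, and library implementations may treat ties and NAs differently.

```r
# Simplified quantile normalization (illustrative helper, not affy's code).
# Assumes `x` is numeric with no missing values.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  # Rank of each value within its column; ties broken by order of appearance.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Sort every column, then average across columns at each rank position.
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)
  # Replace every value by the averaged value belonging to its rank.
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical example data: three samples with different location shifts.
set.seed(3)
dat <- data.frame(a = rnorm(100, mean = 6),
                  b = rnorm(100, mean = 8),
                  c = rnorm(100, mean = 7))

norm <- quantile_normalize(dat)

# After normalization every column has the same set of values,
# so the per-sample quantiles are identical.
print(apply(norm, 2, quantile))
```

Because the function starts with `as.matrix()`, a data.frame read with `read.table()` can be passed in directly, provided all columns are numeric.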

score 0 · Answer 2 · 2013-09-10

0

10.6 years ago

polarise ▴ 380

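The limma package has several normalisation functions that can work on common R data structures. For single-channel data such as this, normalizeBetweenArrays() accepts a plain numeric matrix; normalizeWithinArrays() is intended for two-colour arrays. The normalize.quantiles() function used by affy is not exported by affy itself: it comes from the preprocessCore package, so loading preprocessCore makes it available directly on a matrix.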
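A minimal sketch of the limma route, assuming the data.frame holds only numeric expression columns (probe IDs as row names). The `dat` object below is simulated and hypothetical, and the call is guarded so the snippet still runs when limma is not installed.

```r
# Hypothetical stand-in for a table read with read.table(..., row.names = 1).
set.seed(4)
dat <- data.frame(a = rnorm(200, mean = 6),
                  b = rnorm(200, mean = 8),
                  c = rnorm(200, mean = 7))

# limma's between-array normalization works on a numeric matrix,
# so convert the data.frame first.
mat <- as.matrix(dat)

if (requireNamespace("limma", quietly = TRUE)) {
  # Quantile normalization across samples (columns).
  norm <- limma::normalizeBetweenArrays(mat, method = "quantile")
  print(apply(norm, 2, median))
} else {
  message("limma is not installed; skipping the normalization step")
  norm <- NULL
}
```

Data that has already been normalized (as appears to be the case for the GEO table in this thread) will change very little when run through this step.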
