Normalization Methods To Apply On A Data.Frame Object
11.2 years ago
Curious Mind ▴ 10

Hi,

There are several methods for normalizing data stored as AffyBatch objects, such as threestep, mas5calls, mascallsfilter, justMAS and rma.

However, my data is in data.frame format, because I read my expression data from a .txt file. Can you please let me know which normalization and filtering methods I can use on a data.frame? Or is it possible to convert a data.frame into an AffyBatch object?

When I tried some of the normalization methods, I got the following error:

> dat.eset <- threestep(dat.fp,background.method="RMA.2",normalize.method="quantile",summary.method="median.polish")
Error in threestep(dat, background.method = "RMA.2", normalize.method = "quantile",  : 
  argument is data.frame threestep requires AffyBatch

> dat.mas5 <- mas5calls(dat)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘mas5calls’ for signature ‘"data.frame"’

Thanks

r bioconductor • 9.7k views

Are you absolutely sure your data comes from an Affymetrix platform? Affymetrix files usually come in a binary .CEL format.

Second, if the file IS from an Affymetrix platform and it's in .txt format, there is a good chance that it has already been normalized, because the raw data would normally be in the binary .CEL format mentioned above.

Edit: If you're in doubt, post the first few header lines from your file. You could also make a density plot of the intensities, and it'll show whether the data has been normalized or not.


This is from the GEO database. According to the authors, it is from an Affymetrix platform.

ID GSM801843 GSM801844 GSM801845 GSM801846 GSM801847 GSM801848 GSM801849 GSM801850 GSM801851 GSM801852 GSM801853 GSM801854 GSM801855 GSM801856 GSM801857 GSM801858 GSM801859 GSM801860 GSM801861
NM_014543.2_psr1_at 7.78415 7.63683 7.09851 7.41493 6.87848 7.22712 6.99564 5.86747 5.83964 6.61278 7.52737 7.97955 6.6788 7.50651 7.23592 5.37349 6.28702 6.46063 6.30963


It looks to me like it's already normalized and in log2 values. Try doing a density plot, and you should see all distributions are within very close proximity of each other.
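A minimal sketch of that check in base R. The `dat` object here is simulated placeholder data (an assumption, standing in for the table read from the .txt file), and the "roughly 2 to 16" range for log2 intensities is a rule of thumb rather than a hard limit.

```r
# Simulated stand-in for the expression table (probes x samples).
# Replace with the data.frame returned by read.table().
set.seed(1)
dat <- as.data.frame(matrix(rnorm(1000 * 4, mean = 7, sd = 1.5), ncol = 4,
                            dimnames = list(NULL, paste0("sample", 1:4))))

# Log2-scale microarray intensities typically fall roughly between 2 and 16;
# raw intensities would run into the thousands.
value_range <- range(as.matrix(dat), na.rm = TRUE)
print(value_range)

# One density estimate per sample, overlaid on shared axes.
dens <- lapply(dat, density, na.rm = TRUE)
plot(NA,
     xlim = range(sapply(dens, function(d) range(d$x))),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     xlab = "Expression value", ylab = "Density",
     main = "Per-sample intensity distributions")
for (i in seq_along(dens)) lines(dens[[i]], col = i)
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
```

If the curves lie nearly on top of each other, the samples have most likely been normalized against each other already.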


Thank you so much. This explains why I saw so little variation.


FYI, the CEL files are available for those via GEO and you can process them however you like.


Just to nitpick: not all authors provide the raw data files (most do, though!). Some authors deposit only the normalized data files, which is quite annoying.


I did say "...for those" :) Yeah, it's always annoying to have to deal with some randomly processed series matrix file!


How were the values in the text file processed? BTW, you can always just manually create an AffyBatch object, though it doesn't look completely trivial.


I am reading the text file as shown below:

dat <- read.table("C:/Data/EstrogenSampleData.txt", header=TRUE, row.names=1)

Thanks


My question isn't how the data was read into R, but rather how it was processed to create the text file. Are these processed intensities or are they pulled directly from the CEL files or what? The way to process them will depend on how you got the numbers you currently have.


I got it from the GEO database as a "DataSet SOFT file". The authors don't provide additional data. I want to test some publicly available datasets and understand the workflow. Thanks


The data in the SOFT files has already been processed (there's no standard way of doing that, so it varies by submitter). BTW, you can use the GEOquery package and have it fetch the series matrix file for you. The workflow for that is a bit more straightforward.

library(GEOquery)
# get the ExpressionSet (usually normalized already)
eset <- getGEO("GSE32394")[[1]]
# download the supplementary files, which usually include the raw .CEL files
getGEOSuppFiles("GSE32394")
11.2 years ago
Michael 55k

That is absolutely possible. At least the last time I checked, the normalization functions in affy were wrappers around internal functions that ultimately reduce to functions operating on matrices. You can dig these internal functions out of the affy source code and use them directly, even though that might not be recommended. I did this once; if you want, I can try to find it for you.
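To illustrate what "a function operating on a matrix" can look like, here is a minimal base R sketch of quantile normalization. The helper `quantile_normalize()` is hypothetical and written for illustration only; it is not the affy implementation, and it assumes a numeric matrix with no missing values.

```r
# Quantile-normalize the columns of a numeric matrix (probes x samples).
# Hypothetical helper for illustration; this is not the affy source code.
# Assumes no missing values; ties are broken by order of appearance.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  ranks <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  reference <- rowMeans(sorted)  # mean of each quantile across samples
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy example: three samples, the second shifted upward.
toy <- cbind(a = c(5, 2, 3, 4), b = c(14, 11, 13, 12), c = c(3, 6, 8, 5))
norm <- quantile_normalize(toy)

# After normalization every column contains the same set of values.
apply(norm, 2, sort)
```

A data.frame read with read.table() can be passed in directly, since the helper calls as.matrix() first. This only makes sense for data that has not been normalized yet.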

11.1 years ago
polarise ▴ 380

The limma package has several normalisation functions that can work on common R data structures. You can use either normalizeBetweenArrays() or normalizeWithinArrays(). The affy package does have a function normalize.quantiles() but it seems to be inaccessible directly (booooring!).
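As a rough sketch of the limma route, assuming limma is installed: `normalizeBetweenArrays()` accepts a plain numeric matrix, so a data.frame only needs `as.matrix()` first. The `expr` matrix below is simulated placeholder data. Note that `normalizeWithinArrays()` is, as far as I know, meant for two-colour data, so it is probably not what you want for single-channel Affymetrix values.

```r
# Simulated placeholder matrix (probes x samples) with shifted samples.
# Replace with as.matrix(dat) for a table read via read.table().
set.seed(2)
expr <- cbind(s1 = rnorm(200, mean = 6),
              s2 = rnorm(200, mean = 7),
              s3 = rnorm(200, mean = 8))

if (requireNamespace("limma", quietly = TRUE)) {
  # Quantile normalization between arrays on a plain matrix.
  expr_norm <- limma::normalizeBetweenArrays(expr, method = "quantile")
  print(summary(expr_norm))
} else {
  message("limma is not installed; it is available from Bioconductor.")
}
```

After quantile normalization the per-column summaries should be essentially identical. Only apply this to data that has not already been normalized.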
