Hi,
There are several methods to normalize data stored as AffyBatch objects, for example: threestep, mas5calls, mascallsfilter, justMAS and rma.
However, my data is in data.frame format, since I read my expression values from a .txt file. Could you let me know which normalization and filtering methods I can use on a data.frame? Or is it possible to convert a data.frame into an AffyBatch object?
When I tried some of the normalization methods, I got the following error:
> dat.eset <- threestep(dat.fp,background.method="RMA.2",normalize.method="quantile",summary.method="median.polish")
Error in threestep(dat, background.method = "RMA.2", normalize.method = "quantile", :
argument is data.frame threestep requires AffyBatch
> dat.mas5 <- mas5calls(dat)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘mas5calls’ for signature ‘"data.frame"’
Thanks
Are you absolutely sure your data comes from an Affymetrix platform? Affymetrix raw files usually come in the binary .CEL format.
Second, if the file IS from an Affymetrix platform and it's in .txt, there is a good chance it has already been normalized, since the raw data would otherwise be in the binary format mentioned above.
Edit: If you're in doubt, post the first few header lines from your file. You could also make a density plot of the intensities; it will show whether the data has been normalized or not.
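For instance, a quick base-R sketch of such a density plot (the data.frame dat is simulated here just so the snippet is self-contained; substitute your own data.frame read from the .txt file):

```r
# 'dat' stands in for your expression data.frame, one column per sample.
set.seed(1)
dat <- data.frame(replicate(4, rnorm(1000, mean = 7, sd = 1)))

# Log2-normalized microarray values typically sit in roughly the 0-16 range;
# raw intensities would reach into the tens of thousands.
print(range(dat))

# Overlay per-sample density curves: normalized samples give near-identical curves.
plot(density(dat[[1]]), main = "Per-sample intensity densities",
     xlab = "log2 intensity")
for (i in 2:ncol(dat)) lines(density(dat[[i]]), col = i)
```

If the curves sit almost on top of each other, the data has most likely been normalized already.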
This is from the GEO database. According to the authors, it is from an Affymetrix platform. The first lines look like this:
ID GSM801843 GSM801844 GSM801845 GSM801846 GSM801847 GSM801848 GSM801849 GSM801850 GSM801851 GSM801852 GSM801853 GSM801854 GSM801855 GSM801856 GSM801857 GSM801858 GSM801859 GSM801860 GSM801861
NM_014543.2_psr1_at 7.78415 7.63683 7.09851 7.41493 6.87848 7.22712 6.99564 5.86747 5.83964 6.61278 7.52737 7.97955 6.6788 7.50651 7.23592 5.37349 6.28702 6.46063 6.30963
It looks to me like it's already normalized and in log2 values. Try the density plot, and you should see that all the distributions lie very close to each other.
Thank you so much. This explains why I saw so little variation.
FYI, the CEL files are available for those via GEO, and you can process them however you like.
Just to nitpick: not all authors provide the raw data files (most do, though!). Some deposit only the normalized data files, which is quite annoying.
I did say "...for those" :) Yeah, it's always annoying to have to deal with some randomly processed series matrix file!
How were the values in the text file processed? BTW, you can always create an AffyBatch object manually, though it doesn't look completely trivial.
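For what it's worth, a minimal sketch of wrapping a data.frame in a Bioconductor container (assuming the Biobase package is installed; note this yields an ExpressionSet, not a true AffyBatch, because an AffyBatch needs probe-level CEL intensities plus a CDF, which a probeset-level text file no longer contains):

```r
library(Biobase)  # Bioconductor; install via BiocManager::install("Biobase")

# 'dat' is assumed to be the data.frame read from the .txt file,
# rows = probesets, columns = samples.
mat <- as.matrix(dat)
eset <- ExpressionSet(assayData = mat)
dim(exprs(eset))  # same dimensions as the original data.frame
```

Many downstream functions (limma, genefilter, etc.) accept an ExpressionSet or a plain matrix, so for already-processed data this container is usually all you need.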
I am reading the text file as shown below (using forward slashes, since backslashes in R strings must be doubled): dat <- read.table("C:/Data/EstrogenSampleData.txt", header=TRUE, row.names=1) Thanks
My question isn't how the data was read into R, but rather how it was processed to create the text file. Are these processed intensities or are they pulled directly from the CEL files or what? The way to process them will depend on how you got the numbers you currently have.
I got it from the GEO database as a "Dataset SOFT file". The authors don't provide additional data. I want to test some publicly available datasets and understand the workflow. Thanks
The data in SOFT files has already been processed (there's no standard way). BTW, you can use the GEOquery package and have it fetch the series matrix file for you; that workflow is a bit more straightforward.
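Something along these lines (a sketch assuming GEOquery is installed; "GSExxxxx" is a placeholder, substitute the actual series accession your GSM samples belong to):

```r
library(GEOquery)  # Bioconductor; install via BiocManager::install("GEOquery")

# Fetch the series matrix file and parse it into ExpressionSet objects.
gse <- getGEO("GSExxxxx", GSEMatrix = TRUE)
eset <- gse[[1]]          # first (often only) platform in the series
expr.mat <- exprs(eset)   # values carry whatever processing the submitters applied
```

Keep in mind the values are still submitter-processed; for full control over normalization you would need the raw CEL files (supplementary files on GEO, fetchable with getGEOSuppFiles).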
Cross-posted: http://stackoverflow.com/q/18230978/1274516