Hello all,
I asked this question here a few weeks ago, but because I provided only limited information about my dataset, I got only a semi-useful answer. Here is my question in more detail.
I have the result of an Agilent microarray assay, possibly a two-channel microarray, in Excel format. This is an example of how the dataset looks:
CLID NAME GWEIGHT Pat1A Pat1B Pat2A Pat2B
AGI_HUM1_OLIGO_A_23_P100001 1 0.331 1.144 -1.165 -0.952
AGI_HUM1_OLIGO_A_23_P100011 1 -0.254 -0.068 -0.091 0.511
AGI_HUM1_OLIGO_A_23_P10002 1
AGI_HUM1_OLIGO_A_23_P100022 1 3.503 2.595 3.612 3.776
AGI_HUM1_OLIGO_A_23_P100033 1
AGI_HUM1_OLIGO_A_23_P100056 1 0.565 0.102 1.449 1.718
AGI_HUM1_OLIGO_A_23_P100059 1
AGI_HUM1_OLIGO_A_23_P100065 1
AGI_HUM1_OLIGO_A_23_P100074 1 -0.236 -0.219 0.709 0.792
The experiment is a whole-genome analysis of paired human samples. The only other piece of information I have is that the data have been log-transformed and normalized.
I have been planning to use the Bioconductor package to perform the analysis, but I am at a loss as to how to go about doing this. As far as I can tell, both the LIMMA and the AGILP packages take as input the red and green channel output from the Agilent Feature Extraction software, and I don't have those files.
- I am reading this data as a composite of the output from the red and green channels for each sample. Is that correct? Or is it a one-channel array?
- I am still assuming that using the Bioconductor package is the right way to go, but how do I enter the data into R such that it can be utilized by one of the functions from the package? Or if there is another package to be used, can you suggest that?
- Another problem I have been having is the annotation of the probes. I am not sure where to get the annotation data. I have looked at the annotation package in Bioconductor, but I am again not sure which data to use.
- A fourth problem I am having is that, as a beginning, I have read the data into base R as a text file and been able to view it as a data frame. I have tried to perform a row-wise paired t-test using the multtest command in the genefilter package, but that ended up crashing R every time. I want to use the sapply function from the plyr package. Any ideas how to do it? I have tried to use a do loop to perform the t-test on each row of observations, but I ran into problems when some of the row observations were NAs.
- As an aside, I have been able to identify this kind of data file as a pre-clustered file, which only adds to my confusion. I have no idea what that kind of a file means or what information it conveys, or how it should be analyzed or what packages to use for it.
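To make the row-wise paired t-test question concrete, here is a minimal sketch of what I am trying to do, in base R only. It rests on several assumptions of mine: that the A and B columns for each patient are the two members of a pair, that empty cells are read in as NA, and that a few rows typed in from the example above stand in for the real file. The helper name paired_t_row is made up for illustration.

```r
# A few rows from the example above; empty cells are written as NA here.
txt <- "CLID Pat1A Pat1B Pat2A Pat2B
AGI_HUM1_OLIGO_A_23_P100001 0.331 1.144 -1.165 -0.952
AGI_HUM1_OLIGO_A_23_P10002 NA NA NA NA
AGI_HUM1_OLIGO_A_23_P100022 3.503 2.595 3.612 3.776
AGI_HUM1_OLIGO_A_23_P100074 -0.236 -0.219 0.709 0.792"
dat <- read.table(text = txt, header = TRUE, row.names = 1)

# Assumption: PatNA and PatNB are the paired samples of patient N.
a_cols <- grep("A$", colnames(dat), value = TRUE)
b_cols <- sub("A$", "B", a_cols)

# Paired t-test for one probe (one row); returns NA when fewer than
# two complete pairs are available or when t.test() cannot be computed.
paired_t_row <- function(x, a_cols, b_cols) {
  a <- x[a_cols]
  b <- x[b_cols]
  ok <- !is.na(a) & !is.na(b)
  if (sum(ok) < 2) return(c(t = NA_real_, p = NA_real_))
  tt <- tryCatch(t.test(a[ok], b[ok], paired = TRUE),
                 error = function(e) NULL)
  if (is.null(tt)) return(c(t = NA_real_, p = NA_real_))
  c(t = unname(tt$statistic), p = tt$p.value)
}

# One row of results (t statistic and p-value) per probe.
res <- t(apply(as.matrix(dat), 1, paired_t_row,
               a_cols = a_cols, b_cols = b_cols))
print(res)
```

With only two patients in the example each test has a single degree of freedom, so this only shows the mechanics of skipping the NA rows; the real data would need all of the patient columns.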
I know it is a long post with a lot of questions, but I have tried to be as detailed as possible so that I can get some useful answers. I would really appreciate it if anyone could take the time to answer these questions. If you can suggest some useful literature or reading material, I would be grateful for that too.
Cheers,
Krishna