I am analyzing microarray data using the Bioconductor limma package in R. The experiment is a direct two-color experiment comparing two genotypes. There are two probes for each gene and I would like to analyze the ratio of the probe signals. I am using the standard workflow of first fitting a linear model and then applying the Bayesian hierarchical modeling:
fit <- lmFit(normalizedValues, designMatrix)
fit <- eBayes(fit)
topTable(fit)
So far, I have calculated the ratios of the two probes from the raw data for each replicate and then fed it into the analysis pipeline. However, this does not reflect the variance associated with the raw values that were used to calculate the ratio. My guess is that I am underestimating the variance. Is there a way to accurately perform such an analysis using the limma package? I can calculate an estimate of the variance using Fieller's theorem but I don't know how to feed that into the pipeline without seriously tinkering with the limma functions.
Any suggestions?
Thanks for your help!
Are these duplicate probes (also called "duplicate spots") or do they have completely different underlying sequence? In the latter case, I'd be a bit hesitant to try to combine probes into a single variance estimation, since they're often not actually measuring the same thing (be that due to differences in isoform targeting, differences is cross-hybridization, etc.).
The probes measure different things, in my case one is an intron probe the other a splice junction probe. Good point on the different properties of the two probes and the potential problems with calculating a single variance estimate. Are there any alternatives if I want to make use of both sources of information (intron & junction)? Thanks!
At least to my knowledge there's nothing premade for this sort of task, though I've never really looked. These are somewhat strange probes to try to integrate. I expect that the intronic probes will generally show higher variance, since they're picking up unprocessed mRNA, lariats, circular RNAs, and genomic contamination, all of which are likely more variable than the mature mRNA picked up by the junction probes. I would have thought that one would use an array with this sort of design to try to get at changes in RNA processing due to some sort of treatment, though perhaps you have other goals.
I am using this array to analyze difference in RNA splicing between genotypes. My thinking was that if splicing is compromised, the intron signal would go up (more pre-mRNA) and the junction signal would go down (less mature mRNA). This is where the ratio comes from (or in log space, the difference). I could score all splicing events that individually show significantly increased intron probe signals and significantly decreased junction probe signals.
However, I noticed that there are splicing events that have highly significantly increased intron probe signals but unchanged junction probe signals, so I would have to throw them out. This is why I am looking for a score that integrates the two pieces of information.
Thanks for your help and comments! Greatly appreciated!
It might make sense to just split the the probe types as if they were replicates and then incorporate probe-type into your model matrix with a genotype interaction (so ~genotype*probe_type). That would allow you to find inconsistent changes in the probes.