Hi, I have a time-course microarray dataset (0h, 24h, 48h, 72h, 96h and 144h after sexual stage induction) for Gibberella zeae, a plant pathogen with about 14,000 protein-coding sequences. After analyzing the arrays with SAS PROC MIXED, I find about 5,000 genes differentially expressed in total across the time course. Is this normal? I really hope somebody can give me some suggestions. Thanks.
Last year a paper suggested that nearly all genes are transcriptionally regulated during plant infection.
I think this might actually be the case for all organisms. When something happens, the whole transcriptome is slightly regulated: some genes change dramatically, while the others simply "adjust" to the new "state".
The fact is that you can usually show that only a few genes are regulated, because to pass a statistical test you need either a big shift in mean expression value or many, many replicates. Given the cost of microarrays, the latter is rarely possible, so you end up "seeing" only the genes that have big swings in expression. Furthermore, you need to correct for multiple testing, and to make sure you don't have too many false positives you end up with many false negatives.
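To make the multiple-testing point concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment next to a Bonferroni cut-off. The p-values are invented for illustration and the function name `benjamini_hochberg` is mine; this is not what SAS or any particular package does internally.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for ten hypothetical genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
qvals = benjamini_hochberg(pvals)

raw_hits = sum(p < 0.05 for p in pvals)                # no correction
bh_hits = sum(q < 0.05 for q in qvals)                 # FDR control
bonf_hits = sum(p * len(pvals) < 0.05 for p in pvals)  # family-wise control

print(raw_hits, bh_hits, bonf_hits)
```

The stricter the correction, the fewer genes survive, which is where the false negatives come from.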
The above-mentioned paper had 72 (!) biological replicates because it pooled all the "controls" of a massive experiment, so their statistics are very powerful.
If you have many replicates and/or the biological replicates are very homogeneous, you might find that many genes come out as regulated.
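A rough way to see the effect of replication is to approximate the power of a per-gene two-group comparison. The sketch below uses a normal (z-test) approximation, which is optimistic for very small groups, and every number in it (shift sizes, standard deviation, the Bonferroni-style threshold over 14,000 genes) is an assumption chosen for illustration, not taken from either dataset discussed here.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(shift, sd, n_per_group, alpha):
    """Approximate power of a two-sided, two-group z-test for one gene."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Shift in mean expression divided by the standard error of the difference.
    effect = shift / (sd * sqrt(2 / n_per_group))
    return std_normal.cdf(effect - z_crit) + std_normal.cdf(-effect - z_crit)


N_GENES = 14000
ALPHA = 0.05 / N_GENES  # illustrative Bonferroni-style per-gene threshold

for n in (3, 6, 12, 72):
    # Assumed log2 shifts of 0.5 (small) and 2.0 (large), assumed sd of 0.5.
    small = approx_power(shift=0.5, sd=0.5, n_per_group=n, alpha=ALPHA)
    large = approx_power(shift=2.0, sd=0.5, n_per_group=n, alpha=ALPHA)
    print(f"n={n:>2}  small shift: {small:.4f}  large shift: {large:.4f}")
```

Under these assumptions the small shift is essentially undetectable with 3 replicates per group but is picked up most of the time with 72, while the large shift is already detectable with a handful of replicates.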
There's no metric for a 'normal' number of differentially expressed genes in a microarray experiment; this is going to vary massively depending on your experimental conditions. I've seen very well replicated experiments that have thousands of differentially expressed genes detectable in a very robust fashion, and other very targeted experiments (siRNA knockdowns) in which only a handful of genes are perturbed.
Given that you're reporting the number of genes differentially expressed 'in total of the time course', maybe you should be looking at changes between the timepoints as well as across the whole experiment?
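As a sketch of what looking between timepoints could mean, the snippet below splits genes by the consecutive interval in which their mean log2 expression changes. The gene names, expression values and the cut-off of 1 log2 unit are all hypothetical, and a fold-change cut-off is not a statistical test; in practice each contrast would be tested within the model you already fitted.

```python
TIMEPOINTS = ["0h", "24h", "48h", "72h", "96h", "144h"]

# Hypothetical mean log2 expression per gene, one value per timepoint.
mean_log2 = {
    "geneA": [8.0, 8.1, 10.5, 10.6, 10.4, 10.5],
    "geneB": [6.0, 6.2, 6.1, 6.0, 6.3, 9.0],
    "geneC": [7.0, 7.1, 7.0, 7.2, 7.1, 7.0],
}


def changing_intervals(values, min_abs_lfc=1.0):
    """List consecutive intervals where |log2 fold change| meets the cut-off."""
    hits = []
    for k in range(len(values) - 1):
        lfc = values[k + 1] - values[k]
        if abs(lfc) >= min_abs_lfc:
            label = f"{TIMEPOINTS[k]}->{TIMEPOINTS[k + 1]}"
            hits.append((label, round(lfc, 2)))
    return hits


# Group genes by the interval in which they change.
per_interval = {}
for gene, values in mean_log2.items():
    for interval, lfc in changing_intervals(values):
        per_interval.setdefault(interval, []).append(gene)

for interval, genes in per_interval.items():
    print(interval, genes)
```

Breaking one long list into per-interval lists like this gives several shorter lists that are easier to interpret than a single list for the whole experiment.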
The real issue is that dissecting a gene list 5000 genes long to get any more meaningful information is a bit more of a challenge than dissecting one 500 genes long.