Hi Sean, Frymor and Davy,
I am new to microarray data analysis. I have a microarray data set to work on, and I was able to run RMA normalization and the LIMMA analysis.
Now I have a large table of more than 25K genes. If I understand correctly, most of them are not differentially regulated.
The question, of course, is how to identify the significant genes.
I used the example offered here to plot my data and got the same curve and p-value lines. Am I right in thinking that this plot is the same as a histogram of the p-value distribution, which many examples show how to produce and which looks something like this: [histogram; image taken from http://www.tcrt.org]
But now that I have it, what do I do with it?
I know that the plot shows me the distribution of the p-values in my experiment. The person I am working with told me I need to look for the point where the curve flattens out, but this seems as arbitrary as picking a p-value at random.
His reason for that choice was that everybody does it.
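For reference, the shape of such a p-value histogram can be reproduced with a minimal Python sketch. It assumes NumPy, and the p-values are simulated (9,000 hypothetical null genes with uniform p-values plus 1,000 hypothetical DE genes whose p-values pile up near zero), not taken from a real data set; `pvalue_histogram` is a hypothetical helper name.

```python
import numpy as np

def pvalue_histogram(pvalues, n_bins=20):
    """Return (bin_edges, counts) for p-values binned on [0, 1]."""
    p = np.asarray(pvalues, dtype=float)
    counts, edges = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
    return edges, counts

# Simulated illustration only: null genes give uniform p-values (the flat floor),
# hypothetical DE genes give a spike of small p-values near 0.
rng = np.random.default_rng(0)
null_p = rng.uniform(0.0, 1.0, size=9000)
de_p = rng.beta(0.5, 20.0, size=1000)
edges, counts = pvalue_histogram(np.concatenate([null_p, de_p]))

# Crude text histogram: one '#' per 20 genes in each bin.
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.2f}-{hi:.2f} {'#' * (int(c) // 20)}")
```

The bins to the right are roughly level because null p-values are uniform; the "flat" part of the curve is therefore where the histogram is dominated by genes with no real change.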
This is what I understood after reading some papers:
- In a normal experiment, most of the genes won't be differentially expressed.
- The adjusted p-value is the p-value corrected for multiple hypothesis testing.
- The higher the cutoff I set on this value, the more false positives I get in my list of DE genes.
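limma's `topTable` adjusts p-values with the Benjamini-Hochberg (BH) method by default. A minimal Python sketch of that adjustment (assuming NumPy; `bh_adjust` is a hypothetical name, and the ten p-values are made up for illustration) shows what the adjusted value is computed from:

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (same idea as R's p.adjust(method="BH"))."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    ranked = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Enforce monotonicity: running minimum taken from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)      # put values back in original order
    return adjusted

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = bh_adjust(p)
print(np.round(adj, 4))
```

Keeping every gene with an adjusted value at or below some q controls the expected proportion of false positives in the list (the false discovery rate) at about q, under the method's assumptions (independent or positively dependent tests). Raising the cutoff lets more false positives into the list, which matches the last point above.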
But none of this helps me find the right threshold. I also read the paper "Estimating p-values in small microarray experiments".
The authors explain why permutation is a good idea with small data sets (which I also have: four replicates for each of three different conditions).
But there is still no clue about the "right" value.
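To see why small designs are a problem for permutation tests, here is a minimal Python sketch of an exact per-gene permutation test for one 4-vs-4 comparison. It assumes NumPy and uses a difference in means as the test statistic; this is a generic permutation test, not necessarily the method of the paper cited above, and `permutation_pvalue` and the expression values are hypothetical.

```python
from itertools import combinations

import numpy as np

def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means (one gene)."""
    values = np.concatenate([group_a, group_b]).astype(float)
    n, k = values.size, len(group_a)
    observed = abs(np.mean(group_a) - np.mean(group_b))
    n_extreme = 0
    n_total = 0
    # Enumerate every way of relabelling k of the n arrays as "group A".
    for idx in combinations(range(n), k):
        mask = np.zeros(n, dtype=bool)
        mask[list(idx)] = True
        stat = abs(values[mask].mean() - values[~mask].mean())
        if stat >= observed - 1e-12:   # small tolerance for floating-point ties
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total

# Hypothetical log2 expression values for one gene, perfectly separated groups.
a = [8.1, 8.3, 8.0, 8.2]
b = [9.0, 9.2, 9.1, 8.9]
print(permutation_pvalue(a, b))
```

With four replicates per group there are only 70 relabellings, and the observed split and its mirror image always count as "extreme", so the smallest achievable per-gene p-value is 2/70 (about 0.029), even for perfectly separated groups. That is far too coarse for 25K genes, which is why permutation approaches for small experiments commonly pool permuted statistics across genes instead of testing each gene on its own.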
> You are not going to get an answer from this group on the question of what the right number is because there is no one-size-fits-all number to use. If you are unclear about how to interpret your results, I suggest you find a local collaborator who can work with you on your data.
I understand Sean's point that such a question is difficult to answer, as there is no straightforward answer. Each experiment is a single, unique data set. But I am sure there is some method for choosing the right p-value by looking at the distribution of the data.
One explanation might be that I should choose the point where the curve flattens because, at that point, I get the largest number of significantly differentially regulated genes with the smallest number of false positives in this data.
Does this explanation hold up?
Thanks for the help!
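One way to make the "flat part of the curve" idea quantitative is Storey's approach, a technique swapped in here rather than the one my colleague described: the height of the flat region estimates the fraction of truly null genes (pi0), and from that an estimated FDR can be computed for any p-value cutoff. A minimal Python sketch, assuming NumPy, the common default lambda = 0.5, and simulated p-values (not real data); `estimate_pi0` and `estimated_fdr` are hypothetical names:

```python
import numpy as np

def estimate_pi0(pvalues, lam=0.5):
    """Storey-style estimate of the fraction of truly null genes.
    Null p-values are uniform, so p-values above `lam` come mostly from nulls."""
    p = np.asarray(pvalues, dtype=float)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

def estimated_fdr(pvalues, threshold, lam=0.5):
    """Estimated FDR if every gene with p <= threshold is called DE."""
    p = np.asarray(pvalues, dtype=float)
    n_called = np.sum(p <= threshold)
    if n_called == 0:
        return 0.0
    # Expected number of nulls below the cutoff: pi0 * m * threshold.
    expected_false = estimate_pi0(p, lam) * p.size * threshold
    return min(1.0, expected_false / n_called)

# Simulated illustration: 9,000 null genes plus 1,000 hypothetical DE genes.
rng = np.random.default_rng(0)
p_all = np.concatenate([rng.uniform(0.0, 1.0, size=9000),
                        rng.beta(0.5, 20.0, size=1000)])

print(f"estimated pi0: {estimate_pi0(p_all):.3f}")
for t in (0.001, 0.01, 0.05, 0.2):
    n_called = int(np.sum(p_all <= t))
    print(f"cutoff {t}: {n_called} genes called, estimated FDR {estimated_fdr(p_all, t):.3f}")
```

This does not remove the judgment call: it turns "where the curve flattens" into an estimate of how many false positives a given cutoff is expected to let in, and one still has to decide what false discovery rate is acceptable for the follow-up work.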