If I have a set of Gene Ontology (GO) terms, each with an associated frequency (the number of times the term appears in a fixed corpus of papers), is the following method of significance testing valid?
- calculate the median absolute deviation (MAD) of the GO term frequencies in the given corpus
- use median + MAD as a threshold: GO terms with frequencies above it are deemed significantly associated with the given corpus, and those below it are deemed non-significant.
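To make the two steps above concrete, here is a minimal sketch of what I mean. The function name `mad_threshold`, the dictionary layout, and the counts are all made up for illustration (the GO IDs are just placeholder keys, not real results from my corpus), and the MAD is left unscaled:

```python
from statistics import median


def mad_threshold(freqs):
    """Return median + MAD (unscaled) of the term frequencies."""
    values = list(freqs.values())
    med = median(values)
    # MAD: median of the absolute deviations from the median
    mad = median(abs(v - med) for v in values)
    return med + mad


# Hypothetical counts, invented purely for illustration.
term_freqs = {
    "GO:0006915": 42,
    "GO:0008283": 17,
    "GO:0007165": 9,
    "GO:0006355": 8,
    "GO:0016020": 3,
}

threshold = mad_threshold(term_freqs)
# Terms strictly above the threshold are the ones I would call "significant".
flagged = sorted(t for t, n in term_freqs.items() if n > threshold)

print(threshold)  # 15
print(flagged)    # ['GO:0006915', 'GO:0008283']
```

Here the median is 9 and the MAD is 6, so the cutoff is 15 and two of the five terms are flagged.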
Improvements, alternatives, rebuttals?
Do you have only a single corpus, or several?
Just one, collected using specific MedLine search terms. The problem is that I don't know what search terms I would use to build a null corpus (or corpora).