Question: Testing Significance Of Go Term Frequency In Biomedical Literature
7.6 years ago by
user1409015 wrote:

If I have a set of Gene Ontology terms, each with an associated frequency (the number of times the term appears in a fixed corpus of papers), is the following method of significance testing valid?

  1. Calculate the median absolute deviation (MAD) of the GO term frequencies in the given corpus.
  2. Use median + MAD as a threshold: GO terms above it are deemed significantly associated with the given corpus, and GO terms below it are deemed non-significant.
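
To make the two steps concrete, here is a minimal Python sketch of the proposed threshold. The function name `mad_threshold` and the counts are made up purely for illustration; it shows what the procedure computes and is not an endorsement of it as a significance test.

```python
from statistics import median

def mad_threshold(frequencies):
    """Return (median, MAD, median + MAD) for a mapping of GO term -> count."""
    counts = list(frequencies.values())
    med = median(counts)
    # MAD: the median of the absolute deviations from the median
    mad = median([abs(c - med) for c in counts])
    return med, mad, med + mad

# Hypothetical counts, invented for this example only
go_counts = {
    "GO:0006914": 120,
    "GO:0005764": 45,
    "GO:0016236": 30,
    "GO:0006508": 12,
    "GO:0005515": 10,
    "GO:0008152": 8,
    "GO:0005634": 5,
}

med, mad, cutoff = mad_threshold(go_counts)
# Terms whose count exceeds median + MAD
above = sorted(term for term, c in go_counts.items() if c > cutoff)
print(med, mad, cutoff, above)
```

Note that the cutoff is derived from the same counts it is applied to, so it ranks terms within the corpus rather than against an outside reference.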

Improvements, alternatives, rebuttals?

go • 1.5k views
modified 7.6 years ago by seidel • written 7.6 years ago by user1409015

You have only a single corpus, or several?

written 7.6 years ago by Sean Davis

Just one, collected using specific MedLine search terms. The problem is that I don't know what search terms I would use for my null corpus or corpora.

written 7.6 years ago by user1409015
7.6 years ago by
United States
seidel wrote:

"...which the GO terms are deemed significantly associated with the given corpus" With only one corpus, you're trying to figure out how to hear the sound of one hand clapping. Significance has to be defined relative to some reference. It seems to me, from your comment to Sean, that you are trying to establish a relationship between the medline search terms (used to return your corpus) and some set of GO terms found in that corpus. More information about the search terms, or about the kind of association you're trying to make, would be helpful. For example, what is the point of your particular corpus, relative to a randomly chosen corpus of the same size that also contains some GO terms?


The search term would be something like "autophagy", excluding reviews. So, basically, what would I use as a control corpus for that? How do I randomly select n MedLine articles? And from there, how can I compare the GO term composition of my corpus of interest with the corpus or corpora of randomly selected papers?

modified 7.6 years ago • written 7.6 years ago by user1409015

I think the answer to "what is the control corpus" depends on the purpose for wanting to link autophagy to a pile of GO terms through medline. You might explain more about what you are trying to accomplish through a query search term -> GO term linkage that needs significance attached. How many query terms do you have? (Consider that each one, given the process you described, generates a frequency vector for all GO terms.) What do they have in common? (This may be relevant to what constitutes a control.)

Depending on what you're trying to do, perhaps the problem can be rephrased: given a list of query terms, define a corpus for each one; this will generate x test sets. As a background reference, consider generating random corpora from a universe defined by the set of journals represented across your test corpora. From this journal set, find a way to select articles randomly (e.g. query by journal title, year, or perhaps some other property, then pick articles randomly from the returned list). This would establish a way to generate a background frequency for the GO terms found in your test set. You can then compare the frequencies of terms between your background universe and your test set. One could think of many variations on this theme; it gets back to what you're really trying to accomplish.
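
One possible way to sketch the background comparison in Python follows. Everything here is an assumption made for illustration, not something prescribed above: the helper names `sample_background` and `enrichment_p_value` are hypothetical, "frequency" is taken to mean the number of articles mentioning a term at least once, the test corpus is treated as drawn from the same journal-defined universe, and the hypergeometric upper tail is just one reasonable choice of test.

```python
import random
from math import comb

def sample_background(article_ids, n, seed=0):
    """Draw n distinct article IDs at random from the journal-defined universe."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(list(article_ids), n)

def enrichment_p_value(k, n, K, N):
    """Hypergeometric upper tail P(X >= k).

    N: articles in the universe
    K: universe articles mentioning the GO term
    n: articles in the test corpus
    k: test-corpus articles mentioning the GO term
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

# Hypothetical toy inputs: integer IDs stand in for real article identifiers
background = sample_background(range(1000), 50)

# Toy counts: (3*21 + 1*7) / 210 = 1/3
p = enrichment_p_value(k=2, n=4, K=3, N=10)
print(len(background), p)
```

Because one such test would be run per GO term, the resulting p-values would need a multiple-testing correction before any term is called significant.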

written 7.6 years ago by seidel