First, sorry if my English is not good!
many terms have not enough terms associated
I presume you meant to say that many [genes] have not... There are two things to take into account:
GO uses the True Path Rule: if a gene is annotated with a term, it is also implicitly annotated with all the parents of that term, up to the root. Making this extension is crucial in terms of inference (Seung Yon Rhee, Valerie Wood, Kara Dolinski, and Sorin Draghici. Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7):509-515, 2008, http://bio.lmu.de/~parsch/evogen/GOreview2008.pdf).
Not all species and not all metabolisms are equal in terms of annotation: the more a gene has been studied, the more annotations it gets.
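To make the True Path Rule concrete, here is a minimal Python sketch of how direct annotations can be extended to all ancestor terms. The term names and the `PARENTS` table are a made-up toy ontology (not real GO identifiers), and `ancestors` and `propagate` are hypothetical helpers, not functions from any GO library.

```python
# Hypothetical toy ontology: each term maps to its direct parents.
# A term may have several parents, because GO is a DAG and not a tree.
PARENTS = {
    "T:root": set(),
    "T:a": {"T:root"},
    "T:b": {"T:root"},
    "T:c": {"T:a"},
    "T:d": {"T:b", "T:c"},
}


def ancestors(term, parents):
    """Return every ancestor of `term` (parents, grandparents, ... up to the root)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all implied ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        full[gene] = closure
    return full


# Hypothetical gene annotated directly with a single term.
direct = {"geneX": {"T:d"}}
print(sorted(propagate(direct, PARENTS)["geneX"]))
```

With this extension, a gene that looks poorly annotated (one direct term) actually carries every term on the paths up to the root.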
there are some duplications
I asked GO about this, and they answered that each duplicated annotation has a different Evidence Code. The duplicates reflect different levels of study. So if you use GO for semantic enrichment or inference, remember to remove all duplicates. But if you are interested in Evidence Codes, the duplicates may be useful to you.
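As a rough sketch of both uses, assuming your annotations are already loaded as (gene, term, evidence code) rows: the gene and term names below are invented, and `group_evidence` and `unique_pairs` are hypothetical helpers, not part of any GO tool.

```python
# Hypothetical annotation rows: (gene, term, evidence code).
# The first two rows are "duplicates" that differ only by Evidence Code.
ROWS = [
    ("geneX", "T:a", "IDA"),
    ("geneX", "T:a", "IEA"),
    ("geneY", "T:b", "IMP"),
]


def group_evidence(rows):
    """Map each (gene, term) pair to the list of Evidence Codes seen for it."""
    grouped = {}
    for gene, term, code in rows:
        grouped.setdefault((gene, term), []).append(code)
    return grouped


def unique_pairs(rows):
    """Return each (gene, term) pair once, in order of first appearance."""
    return list(group_evidence(rows))


print(unique_pairs(ROWS))    # deduplicated view, for enrichment or inference
print(group_evidence(ROWS))  # duplicates kept as a list of codes per pair
```

Use the first view when only the gene/term association matters, and the second when you want to look at the Evidence Codes themselves.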
Evidence Codes are a delicate point in the Gene Ontology. To quote the GO documentation:
"Evidence codes are not statements of the quality of the annotation. Within each evidence code classification, some methods produce annotations of higher confidence or greater specificity than other methods, in addition the way in which a technique has been applied or interpreted in a paper will also affect the quality of the resulting annotation. Thus evidence codes cannot be used as a measure of the quality of the annotation."
So an Evidence Code does bring some information, but it cannot be used to quantify the quality of an annotation. It is a matter of higher confidence or greater specificity... The nuance is subtle.
I really recommend reading the article by Rhee et al. that I cited above to make better use of GO.