I have a question about the KOG database from NCBI. The 2003 COG paper gives the total number of KOG clusters as 4852, which matches the kog file on the NCBI FTP site, but the total number of sequences is given as 110,655, which differs from the site. A later paper devoted only to KOG gives a cluster count of 5873, which is odd.
Also, there is a paper from the OrthoMCL team assessing different orthology detection methods. If I understand correctly, they later compare OrthoMCL with KOG directly. But in their description, they give the total number of clusters in KOG as 10,058 (Table 3 of that paper), which is strange. Does anyone know the story or history behind this?
The papers are from 2003, 2004 and 2007. I would imagine that the differences are due to different versions of the resource used in the different papers.
On the NCBI website, only one version of KOG has been published, and it is dated 2003. The 2007 OrthoMCL paper also gives the total number of sequences, which matches the published data, but its cluster count is roughly twice both the count in the published data and the number in the 2003 paper.
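For anyone who wants to check these counts against the downloaded file, here is a minimal sketch. It assumes the kog file lists each cluster as a header line of the form `[J] KOG0001 description`, followed by indented member lines of the form `  ath:  At1g00001`; that layout, the function name `count_kog`, and the sample identifiers are my assumptions, so adjust the patterns if your copy looks different.

```python
import re
from io import StringIO

# Assumed header line shape: "[J] KOG0001 description"
HEADER = re.compile(r"^\[(\w+)\]\s+(\S+)\s")
# Assumed member line shape: "  ath:  At1g00001"
MEMBER = re.compile(r"^\s+(\w+):\s+(\S+)")


def count_kog(handle):
    """Return (number of distinct clusters, number of distinct sequence IDs)."""
    clusters = set()
    sequences = set()
    for line in handle:
        header = HEADER.match(line)
        if header:
            clusters.add(header.group(2))
            continue
        member = MEMBER.match(line)
        if member:
            sequences.add(member.group(2))
    return len(clusters), len(sequences)


# Made-up sample in the assumed layout; one sequence appears in two clusters.
sample = (
    "[J] KOG0001 Example cluster one\n"
    "  ath:  At1g00001\n"
    "  cel:  CE00001\n"
    "\n"
    "[K] KOG0002 Example cluster two\n"
    "  ath:  At1g00001\n"
    "  hsa:  Hs0000001\n"
)
print(count_kog(StringIO(sample)))
```

To run it on the real file, pass an open file handle instead, for example `count_kog(open("kog"))`. Sequences are counted as distinct IDs, so a protein assigned to more than one cluster is counted once; counting member lines instead would give a different total, which may be one source of mismatched numbers.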