Hi everyone, Protein subcellular localization prediction is an active field of research. In previously published papers on this problem, the authors exclude homologous proteins from their data sets. Does anybody know the reason for excluding homologous proteins? Is it to prevent biased precision/recall estimates, or to prevent the predictor from exploiting homology information? Thank you
It depends on the step (training or testing) at which the exclusion procedure was applied. Basically, though, it is for both of the reasons you mention: to avoid biased precision/recall estimates (if test proteins have close homologs in the training set, the measured performance is inflated) and to keep the predictor from simply relying on homology information. In addition, it prevents the predictor from becoming biased toward proteins that belong to a large family, which would otherwise contribute many similar proteins to the training dataset.
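
To make the idea concrete, here is a minimal Python sketch of homology (redundancy) reduction. It is an illustration under loud assumptions, not the procedure used in any particular paper: the similarity function is a crude stand-in built on difflib rather than a real alignment-based sequence identity, the 0.4 threshold is arbitrary, and the function names and toy sequences are hypothetical. In practice people usually run a dedicated clustering tool such as CD-HIT or MMseqs2 for this step.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Crude stand-in for pairwise sequence identity (NOT a real alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(seqs: Dict[str, str], threshold: float = 0.4) -> List[str]:
    """Greedily keep a protein only if its similarity to every protein
    already kept is below `threshold`. Longer sequences are visited first,
    ties are broken by name so the result is deterministic."""
    kept: List[str] = []
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(similarity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: A1 and A2 differ in one residue ("homologs"),
# B1 is unrelated to both.
toy = {
    "A1": "MKTAYIAKQRQISFVKSHFSRQ",
    "A2": "MKTAYIAKQRQISFVKSHFSRE",
    "B1": "GGSPLLWWDDNNCCHHYYTTPP",
}

non_redundant = reduce_redundancy(toy)
print(non_redundant)  # A2 is dropped because it is too similar to A1
```

Applying the filter to the whole data set before splitting it into training and test parts addresses both points above: no test protein has a close homolog in the training set, and a large family is represented by a single member instead of dominating training.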