Question

Reason for excluding homologous proteins in protein subcellular localization prediction


7.0 years ago

ruhi.jamali • 0

Hi everyone, Protein subcellular localization prediction is an active field of research. In previously published papers on this problem, authors exclude homologous proteins from their data sets. Does anybody know the reason for excluding homologous proteins? Is it to prevent bias in precision/recall, or to prevent the predictor from exploiting homology information? Thank you

subcellular prediction homology protein • 1.2k views

updated 7.0 years ago by fishgolden ▴ 510 • written 7.0 years ago by ruhi.jamali • 0

score 1 · Accepted Answer · 2017-05-14

It depends on the step (training or testing) at which the exclusion procedure is applied. Basically, though, it is for both of the reasons you mention: to prevent bias in precision/recall and to prevent the predictor from relying on homology information. In addition, it prevents a predictor from becoming biased toward proteins that belong to a large family (when many similar proteins are included in the training dataset).
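To make the two cases concrete, here is a minimal sketch in Python. It assumes you already have a list of homologous pairs (for example, from a similarity search with an identity cutoff of your choosing); the function names, protein IDs, and pairs below are all hypothetical and only for illustration. It groups proteins into homology clusters, assigns whole clusters to either the training or the test set so that no test protein has a homolog in training, and optionally keeps one representative per cluster so large families do not dominate training.

```python
from collections import defaultdict


def cluster_by_homology(protein_ids, homolog_pairs):
    """Group proteins into clusters (connected components) with union-find.

    homolog_pairs is an assumed, precomputed list of (id_a, id_b) tuples.
    """
    parent = {p: p for p in protein_ids}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in homolog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for p in protein_ids:
        clusters[find(p)].append(p)
    return list(clusters.values())


def split_by_cluster(clusters, test_fraction=0.2):
    """Assign whole clusters to train or test, so homologs never cross the split."""
    total = sum(len(c) for c in clusters)
    train, test = [], []
    # Smallest clusters are tried for the test set first (a simple, deterministic rule).
    for cluster in sorted(clusters, key=len):
        if len(test) + len(cluster) <= test_fraction * total:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


def one_representative_per_cluster(clusters):
    """Keep a single protein per cluster to reduce redundancy from large families."""
    return [sorted(c)[0] for c in clusters]


# Hypothetical toy data: P1-P3 form one family, P4-P5 another, P6 is a singleton.
ids = ["P1", "P2", "P3", "P4", "P5", "P6"]
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]

clusters = cluster_by_homology(ids, pairs)
train_ids, test_ids = split_by_cluster(clusters, test_fraction=0.5)
print("train:", train_ids)
print("test:", test_ids)
print("representatives:", one_representative_per_cluster(clusters))
```

Splitting by cluster addresses the testing-side concern (inflated precision/recall), while keeping one representative per cluster addresses the training-side concern (bias toward large families). In practice the pairs would come from a dedicated clustering or alignment tool rather than a hand-written list.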