I recently tried deFuse, and several questions came up after reading the paper.
If I understand correctly, two necessary conditions on the discordant reads act as hard filters to remove false fusion events. The split reads are then used to estimate the breakpoint and the fragment length. A p-value is calculated from the fragment length, assuming fragment lengths follow a normal distribution. This value is then used along with other parameters as features for an AdaBoost classifier that decides whether an event is true or false.
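To make the p-value step concrete, here is how I picture it. This is only a sketch of my understanding, not deFuse's actual code: the function name, the choice of a two-sided test, and the example mean and standard deviation are all my own assumptions.

```python
import math


def fragment_length_pvalue(length, mean, sd):
    """Two-sided p-value of an inferred fragment length under Normal(mean, sd).

    Hypothetical helper for illustration; the paper may define the test
    differently (for example one-sided, or on a mean over several reads).
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    # Standardized distance of the observed length from the expected mean.
    z = abs(length - mean) / sd
    # P(|Z| >= z) for a standard normal variable Z.
    return math.erfc(z / math.sqrt(2))


# Assumed library parameters: mean 200 bp, standard deviation 30 bp.
for observed in (200, 230, 290):
    print(observed, fragment_length_pvalue(observed, mean=200, sd=30))
```

A fragment length far from the library mean gives a small p-value, which I assume is what makes it useful as one of the classifier features.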
My questions are:
1) The hard filter based on the two discordant-read conditions is used primarily to get rid of false fusion events resulting from homologs (the source of most false fusion events, I think), right?
2) The features calculated later (those used in AdaBoost) reflect factors that may affect the validity of a fusion event, right?
3) Why is AdaBoost used as the classifier instead of, for example, a random forest? Was there any particular consideration?
4) Some of my colleagues think that this method is conservative, favoring specificity, and may miss some true fusion events. Can you share your thoughts on this concern?
I would be really grateful if anyone could share their opinions. Thanks in advance.