**Class:** TreeBagger

Predict response quantile using bag of regression trees

`YFit = quantilePredict(Mdl,X)` returns a vector of medians of the predicted responses at `X`, a table or matrix of predictor data, using the bag of regression trees `Mdl`. `Mdl` must be a `TreeBagger` model object.
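As a minimal sketch of this syntax, the following trains a small bag of regression trees on the `carsmall` sample data and predicts conditional medians. The choice of predictor (`Displacement`), the number of trees, and the query values in `Xnew` are illustrative assumptions, not part of the reference text.

```matlab
% Sketch: predict conditional medians with a bag of regression trees.
load carsmall                          % sample data set (Displacement, MPG, ...)
rng(1);                                % make the bootstrap samples reproducible
Mdl = TreeBagger(100, Displacement, MPG, 'Method', 'regression');

Xnew = [100; 200; 300];                % hypothetical displacement values to query
YFit = quantilePredict(Mdl, Xnew);     % default quantile is the median (0.5)
```

`YFit` contains one predicted median per row of `Xnew`.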

`YFit = quantilePredict(Mdl,X,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments. For example, specify quantile probabilities or which trees to include for quantile estimation.
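The two options mentioned above might be used as in the following sketch. The data, the quantile probabilities in `tau`, and the tree subset `1:50` are illustrative assumptions.

```matlab
% Sketch: request several quantiles, optionally using a subset of trees.
load carsmall
rng(1);
Mdl = TreeBagger(100, Displacement, MPG, 'Method', 'regression');
Xnew = [100; 200; 300];                % hypothetical query points

tau = [0.25 0.5 0.75];                 % quantile probabilities
YQ = quantilePredict(Mdl, Xnew, 'Quantile', tau);

% Same quantiles, but estimated from the first 50 trees only.
YQsub = quantilePredict(Mdl, Xnew, 'Quantile', tau, 'Trees', 1:50);
```

Each row of `YQ` corresponds to a row of `Xnew`, and each column to an entry of `tau`.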

`[YFit,YW] = quantilePredict(___)` also returns a sparse matrix of response weights.
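A short sketch of inspecting the second output follows. It assumes that `YW` has one row per training response and one column per row of `Xnew`; check the shape in your release before relying on it. The data and query points are illustrative.

```matlab
% Sketch: retrieve and inspect the response weights.
load carsmall
rng(1);
Mdl = TreeBagger(100, Displacement, MPG, 'Method', 'regression');
Xnew = [100; 200; 300];                % hypothetical query points

[YFit, YW] = quantilePredict(Mdl, Xnew);

isSparse   = issparse(YW);             % YW is described as a sparse matrix
nActive    = nnz(YW(:, 1));            % training responses with nonzero weight for Xnew(1) (assumed layout)
colSums    = full(sum(YW, 1));         % assumed layout: each column should sum to about 1
```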

`quantilePredict` estimates the conditional distribution of the response using the training data every time you call it. To predict many quantiles, or quantiles for many observations, efficiently, pass `X` as a matrix or table of observations and specify all quantiles in a vector using the `Quantile` name-value pair argument. That is, avoid calling `quantilePredict` within a loop.
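The difference between the two calling patterns can be sketched as follows. The grid `xq` and the probabilities in `tau` are illustrative assumptions; the loop is left commented out because it is the pattern to avoid.

```matlab
% Sketch: one vectorized call instead of a loop over observations and quantiles.
load carsmall
rng(1);
Mdl = TreeBagger(100, Displacement, MPG, 'Method', 'regression');

xq  = linspace(min(Displacement), max(Displacement), 20)';  % 20 query points
tau = [0.1 0.5 0.9];                                        % 3 quantile probabilities

% Preferred: the conditional distribution is estimated once per call.
YQ = quantilePredict(Mdl, xq, 'Quantile', tau);

% Pattern to avoid: 60 separate calls, each re-estimating the distribution.
% for k = 1:numel(xq)
%     for q = 1:numel(tau)
%         YQloop(k, q) = quantilePredict(Mdl, xq(k), 'Quantile', tau(q));
%     end
% end
```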

`TreeBagger` grows a random forest of regression trees using the training data. Then, to implement quantile random forest, `quantilePredict` predicts quantiles using the empirical conditional distribution of the response given an observation from the predictor variables. To obtain the empirical conditional distribution of the response:

1. `quantilePredict` passes all the training observations in `Mdl.X` through all the trees in the ensemble, and stores the leaf nodes of which the training observations are members.

2. `quantilePredict` similarly passes each observation in `X` through all the trees in the ensemble.

3. For each observation in `X`, `quantilePredict`:

   1. Estimates the conditional distribution of the response by computing response weights for each tree.

   2. For observation *k* in `X`, aggregates the conditional distributions for the entire ensemble:

      $$\widehat{F}\left(y \mid X = x_k\right) = \sum_{j=1}^{n} \sum_{t=1}^{T} \frac{1}{T} w_{tj}(x_k) \, I\left\{Y_j \le y\right\}.$$

      *n* is the number of training observations (`size(Y,1)`) and *T* is the number of trees in the ensemble (`Mdl.NumTrees`).

   3. For observation *k* in `X`, the *τ* quantile or, equivalently, the 100*τ*% percentile, is

      $$Q_{\tau}(x_k) = \inf\left\{y : \widehat{F}\left(y \mid X = x_k\right) \ge \tau\right\}.$$
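The quantile definition above can be illustrated with a few lines of base MATLAB. This is a sketch of the formula only, not of the internal implementation of `quantilePredict`, which may differ in detail. The responses `Y` and the aggregated weights `w` for a single query point are invented values.

```matlab
% Sketch: tau-quantile of a weighted empirical distribution.
Y   = [3; 1; 4; 2];            % training responses Y_j (made-up values)
w   = [0.1; 0.4; 0.2; 0.3];    % aggregated weights for one x_k, summing to 1
tau = 0.5;                     % quantile probability

[Ys, idx] = sort(Y);           % order the responses
F = cumsum(w(idx));            % weighted empirical CDF evaluated at each sorted Y
Q = Ys(find(F >= tau, 1));     % smallest y with F(y) >= tau; here Q = 2
```

Here the sorted responses 1, 2, 3, 4 have cumulative weights 0.4, 0.7, 0.8, 1.0, so the first value whose cumulative weight reaches 0.5 is 2.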

This process describes how `quantilePredict` uses all specified weights.

1. For all training observations *j* = 1,...,*n* and all chosen trees *t* = 1,...,*T*, `quantilePredict` attributes the product $v_{tj} = b_{tj} w_{j,\text{obs}}$ to training observation *j* (stored in `Mdl.X(j,:)` and `Mdl.Y(j)`). $b_{tj}$ is the number of times observation *j* is in the bootstrap sample for tree *t*. $w_{j,\text{obs}}$ is the observation weight in `Mdl.W(j)`.

2. For each chosen tree, `quantilePredict` identifies the leaves in which each training observation falls. Let $S_t(x_j)$ be the set of all observations contained in the leaf of tree *t* of which observation *j* is a member.

3. For each chosen tree, `quantilePredict` normalizes all weights within a particular leaf to sum to 1, that is,

   $$v_{tj}^{\ast} = \frac{v_{tj}}{\sum_{i \in S_t(x_j)} v_{ti}}.$$

4. For each training observation and tree, `quantilePredict` incorporates tree weights ($w_{t,\text{tree}}$) specified by `TreeWeights`, that is, $w_{tj,\text{tree}}^{\ast} = w_{t,\text{tree}} v_{tj}^{\ast}$. Trees not chosen for prediction have 0 weight.

5. For all test observations *k* = 1,...,*K* in `X` and all chosen trees *t* = 1,...,*T*, `quantilePredict` predicts the unique leaves in which the observations fall, and then identifies all training observations within the predicted leaves. `quantilePredict` attributes the weight $u_{tj}$ such that

   $$u_{tj} = \begin{cases} w_{tj,\text{tree}}^{\ast}, & \text{if } x_k \in S_t(x_j) \\ 0, & \text{otherwise.} \end{cases}$$

6. `quantilePredict` sums the weights over all chosen trees, that is,

   $$u_j = \sum_{t=1}^{T} u_{tj}.$$

7. `quantilePredict` creates response weights by normalizing the weights so that they sum to 1, that is,

   $$w_j^{\ast} = \frac{u_j}{\sum_{j=1}^{n} u_j}.$$
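The weighting steps above can be traced on a toy example in base MATLAB. This is a sketch under stated assumptions, not the toolbox implementation: the leaf assignments, bootstrap counts, and weights are all invented, the variable names are hypothetical, and every leaf reached by the query point is assumed to contain at least one training observation with positive weight.

```matlab
% Sketch: response weights for one query point x_k, n = 4 observations, T = 2 trees.
leafTrain = [1 2; 1 1; 2 1; 2 2];  % leaf index of training obs j in tree t (n-by-T)
leafQuery = [1 2];                 % leaf index of x_k in each tree
b     = [1 2; 1 0; 0 1; 2 1];      % bootstrap counts b_tj (n-by-T)
wObs  = [1; 1; 1; 1];              % observation weights w_j,obs
wTree = [1 1];                     % tree weights w_t,tree

[n, T] = size(leafTrain);
v = b .* repmat(wObs, 1, T);       % v_tj = b_tj * w_j,obs
u = zeros(n, 1);                   % accumulates u_j over trees
for t = 1:T
    % Training observations that share the leaf of x_k in tree t.
    inLeaf = leafTrain(:, t) == leafQuery(t);
    % Normalize within that leaf (only the leaf of x_k matters for u_tj).
    vStar = v(inLeaf, t) / sum(v(inLeaf, t));
    % Apply the tree weight and add to the running sum.
    u(inLeaf) = u(inLeaf) + wTree(t) * vStar;
end
wStar = u / sum(u);                % response weights; here [7/12; 1/4; 0; 1/6]
```

In tree 1 the query shares a leaf with observations 1 and 2 (weights 1/2 each); in tree 2 it shares a leaf with observations 1 and 4 (weights 2/3 and 1/3). Summing over trees and normalizing gives the final weights, which sum to 1.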

[1] Breiman, L. "Random Forests." *Machine Learning*, Vol. 45, 2001, pp. 5–32.

[2] Meinshausen, N. "Quantile Regression Forests." *Journal of Machine Learning Research*, Vol. 7, 2006, pp. 983–999.