# Random Forests

A random forest is an ensemble of decision trees, each grown independently on a bootstrap sample of the training data and using a random subset of the input variables at each split. The predictions of the individual trees are then combined to give the final prediction: for regression problems this is typically done by averaging, and for classification problems by majority vote.
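The aggregation step can be sketched in a few lines. The per-tree predictions below are made-up placeholder values; in a real forest each would come from an independently grown tree.

```python
from collections import Counter

def aggregate_regression(preds):
    """Regression forests average the individual tree outputs."""
    return sum(preds) / len(preds)

def aggregate_classification(preds):
    """Classification forests take a majority vote over tree outputs."""
    return Counter(preds).most_common(1)[0][0]

# Hypothetical outputs of five trees for a single input sample.
regression_preds = [2.9, 3.1, 3.0, 3.4, 2.6]
classification_preds = ["cat", "dog", "cat", "cat", "dog"]

print(aggregate_regression(regression_preds))       # averages to 3.0 (up to float rounding)
print(aggregate_classification(classification_preds))  # → 'cat'
```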

The variance of the average of B identically distributed trees can be derived as:

**Var = ρσ² + ((1 − ρ)/B) σ²**

where σ² is the variance of a single tree's prediction, ρ is the pairwise correlation between any two trees in the ensemble, and B is the number of trees. The second term vanishes as B grows, but the first does not: the variance is bounded below by ρσ². Adding more trees therefore helps only up to a point, and the remaining error is governed by the correlation between trees, which the random selection of input variables at each split is designed to reduce.
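A quick numeric check of the formula makes the variance floor visible. The values of ρ and σ² below are illustrative assumptions, not from the text.

```python
def forest_variance(rho, sigma2, B):
    # Variance of the average of B identically distributed trees,
    # each with variance sigma2 and pairwise correlation rho.
    return rho * sigma2 + (1 - rho) * sigma2 / B

sigma2 = 1.0  # assumed single-tree variance
rho = 0.3     # assumed pairwise correlation
for B in (1, 10, 100, 1000):
    print(B, forest_variance(rho, sigma2, B))
```

As B grows, the printed values approach ρσ² = 0.3 but never drop below it.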

Beyond random forests, there are more advanced tree-based methods such as **gradient boosting** and **XGBoost**, which grow decision trees sequentially in a boosting framework, each new tree fit to the errors of the ensemble so far, to create powerful predictive models.
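To make the contrast with the forest's parallel averaging concrete, here is a minimal gradient-boosting sketch for squared-error regression on one feature, using depth-1 trees (stumps). The data, learning rate, and number of rounds are illustrative assumptions, not from the text.

```python
def fit_stump(x, residuals):
    """Find the single split that best fits the residuals in least squares."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda xi: lmean if xi <= threshold else rmean

def gradient_boost(x, y, rounds=200, lr=0.1):
    """Each round fits a stump to the current residuals and adds it, shrunk by lr."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

# Toy one-feature dataset (assumed for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2]
model = gradient_boost(x, y)
```

Unlike a forest, the trees here are not independent: each stump is fit to the residuals left by its predecessors, so removing early trees changes what the later ones should be.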
