Machine Learning
Ensemble Methods
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017
Ensemble Methods
• An ensemble method is a combination of multiple and
diverse models.
• Each model in the ensemble makes a prediction.
• A final prediction is determined by a majority vote
among the models.
[Diagram: Models A, B, and C each receive the same input sample; each model outputs its prediction (ŷ1, ŷ2, ŷ3) to a vote accumulator, and the final prediction ŷf is determined by a majority vote of the models' predictions.]
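As a concrete illustration of the vote accumulator in the diagram, here is a minimal Python sketch of hard majority voting. The majority_vote helper and the example labels are hypothetical, not part of the original slides.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the per-model predictions for one input sample
    into a single final prediction by majority vote."""
    votes = Counter(predictions)            # tally each model's prediction
    label, _count = votes.most_common(1)[0] # most frequent label wins
    return label

# Hypothetical outputs of three models (ŷ1, ŷ2, ŷ3) for one input sample:
y_hats = ["apple", "banana", "banana"]
y_final = majority_vote(y_hats)             # ŷf
print(y_final)                              # -> "banana"
```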
Background - Condorcet
• The theory behind ensemble methods is based on a
seminal paper written by the French mathematician, the
Marquis de Condorcet, in 1785.
• In his paper, he gave a mathematical argument for
majority voting in jury systems, analyzing the probability
that a jury will reach the correct decision.
Essay on the Application of Analysis to the Probability of Majority Decisions
https://en.wikipedia.org/wiki/Condorcet%27s_jury_theorem
Condorcet’s Jury Theorem
Principle:
If we assume each voter’s probability of making a correct decision
is better than random (i.e., > 0.50), then the probability that the
majority reaches a correct decision increases with each voter added.
He showed the converse is also true. If we assume each voter’s
probability of making a correct decision is worse than random
(i.e., < 0.50), then the probability of a correct majority decision
decreases with each voter added.
Example
Even if each voter’s probability is only slightly better than random
(e.g., 0.51), the principle holds: a majority vote over many such voters
is correct with probability greater than 0.51, and that probability
grows toward 1 as more voters are added.
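To make the principle concrete, the sketch below (not part of the original slides) computes the probability that a majority of n independent voters is correct, given that each voter is correct with probability p, using the binomial distribution.

```python
from math import comb

def majority_correct_prob(p, n):
    """Probability that a majority of n independent voters is correct,
    when each voter is correct with probability p (n assumed odd)."""
    k_min = n // 2 + 1  # smallest number of correct voters that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

for n in (1, 11, 101, 1001):
    print(n, round(majority_correct_prob(0.51, n), 4))
# With p = 0.51 the majority is correct roughly 0.51, 0.53, 0.58, 0.74
# of the time as n grows, illustrating Condorcet's jury theorem.
```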
Weak Learners
• In an ensemble method, multiple weak learners are combined to
make a strong learning model (see the sketch below).
• A weak learner is any model whose accuracy is better than
random, even if only slightly better (e.g., 0.51).
[Diagram: Weak Learner 1, Weak Learner 2, …, Weak Learner N each make a prediction; a majority vote over these predictions produces the strong learner's output.]
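A minimal sketch of combining weak learners into a strong learner, assuming scikit-learn is available; the particular base learners and the iris dataset are illustrative assumptions, not prescribed by the slides.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three weak learners chosen for illustration.
weak_learners = [
    ("stump", DecisionTreeClassifier(max_depth=1)),  # a decision stump
    ("nb", GaussianNB()),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# "hard" voting = majority vote over the learners' predicted labels.
strong_learner = VotingClassifier(estimators=weak_learners, voting="hard")
strong_learner.fit(X, y)
print(strong_learner.score(X, y))
```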
Ensemble – Decision Stumps
Decision Stumps – Weak Learners
[Diagram: three decision stumps, each a weak learner splitting on a single feature]
• 1st feature (weight): < 4 → banana, >= 4 → apple
• 2nd feature (width): < 2.5 → banana, >= 2.5 → apple
• 3rd feature (height): <= 4 → apple, > 4 → banana
MAJORITY VOTE
Weight: 4.2 = Apple
Width: 2.3 = Banana
Height: 5.5 = Banana
VOTE = Banana
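The stumps and the majority vote above can be written directly in a few lines of Python; this sketch mirrors the slide's fruit example, with the thresholds taken from the diagram.

```python
from collections import Counter

# Three decision stumps, each voting on a single feature.
def stump_weight(x):  return "apple" if x["weight"] >= 4 else "banana"
def stump_width(x):   return "apple" if x["width"] >= 2.5 else "banana"
def stump_height(x):  return "apple" if x["height"] <= 4 else "banana"

def ensemble_predict(x):
    """Majority vote over the three stumps' predictions."""
    votes = [stump_weight(x), stump_width(x), stump_height(x)]
    return Counter(votes).most_common(1)[0][0]

sample = {"weight": 4.2, "width": 2.3, "height": 5.5}
print(ensemble_predict(sample))  # -> "banana" (votes: apple, banana, banana)
```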
Bootstrap Aggregation (Bagging)
• Bagging is a method of deriving multiple models from
the same training data, where each model is trained on a
random subset of the training data (typically sampled with
replacement, hence “bootstrap”).
• A prediction is then made based on a majority vote of
the models (see the sketch below).
[Diagram: the training data is randomly split into subsets; one model is trained on each random subset (the trained weaker models); the models' predictions are combined by a majority vote to form a stronger predictor.]
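A minimal bagging sketch, assuming scikit-learn is available; the decision-tree base learner, the iris dataset, and the parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # weak learner trained on each random subset
    n_estimators=25,           # number of random subsets / models
    max_samples=0.8,           # fraction of the training data per subset
    bootstrap=True,            # sample with replacement (the "bootstrap")
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))  # prediction = majority vote of the 25 trees
```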
Random Forest
• Random Forest is a popular ensemble method.
• Built from decision trees; used for classification (majority vote)
or regression (mean of the predictions).
• Good at reducing the overfitting common in single decision trees.
• Combines bagging with random subsets of the features.
• Split the training data into B randomly selected subsets.
• Split the features into K randomly selected subsets
(e.g., K = sqrt(number of features)).
• Produce K models, one per feature subset, per data subset,
for a total of K*B models (e.g., random decision trees).
• Use majority voting (classification) or the mean (regression) to
predict a result (see the sketch below).
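As a hedged sketch of this idea, the snippet below uses scikit-learn's RandomForestClassifier; the dataset and parameter values are illustrative assumptions. Note that this implementation samples a random subset of features at each split of each tree rather than building one model per fixed feature subset, but the combination of bagging and random feature selection is the same.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # B: number of random decision trees
    max_features="sqrt",  # consider sqrt(#features) at each split
    bootstrap=True,       # each tree sees a random bootstrap sample of the data
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # class decided by majority vote of the trees
```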