Machine Learning
Ensemble Methods
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017
Ensemble Methods
• An ensemble method is a combination of multiple and
diverse models.
• Each model in the ensemble makes a prediction.
• A final prediction is determined by a majority vote
among the models.
[Diagram: Models A, B, and C each receive the same input sample; each model outputs its prediction (ŷ1, ŷ2, ŷ3) to a vote accumulator, and the final prediction ŷf is determined by a majority vote of the models' predictions.]
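As a concrete illustration of the vote accumulator in the diagram, here is a minimal Python sketch of hard majority voting. The majority_vote helper and the example labels are hypothetical, not part of the original slides.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the per-model predictions for one input sample
    into a single final prediction by majority vote."""
    votes = Counter(predictions)            # tally each model's prediction
    label, _count = votes.most_common(1)[0] # most frequent label wins
    return label

# Hypothetical outputs of three models (ŷ1, ŷ2, ŷ3) for one input sample:
y_hats = ["apple", "banana", "banana"]
y_final = majority_vote(y_hats)             # ŷf
print(y_final)                              # -> "banana"
```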
Background - Condorcet
• The theory behind ensemble methods is based on a
seminal paper written by the French mathematician, the
Marquis de Condorcet, in 1785.
• In his paper, he gave a mathematical argument for
majority voting in jury systems, analyzing the probability
that a jury will reach the correct decision.
Essay on the Application of Analysis to the Probability of Majority Decisions
https://en.wikipedia.org/wiki/Condorcet%27s_jury_theorem
Condorcet’s Jury Theorem
Principle:
If we assume each voter’s probability of making a correct decision
is better than random (i.e., > 0.50), then the probability that the
majority reaches a correct decision increases with each voter added.
He showed the converse is also true. If we assume each voter’s
probability of making a correct decision is worse than random
(i.e., < 0.50), then the probability of a correct majority decision
decreases with each voter added.
Example
Even if each voter’s probability is only slightly better than random
(e.g., 0.51), the principle holds: a majority vote over many such voters
is correct with probability greater than 0.51, and that probability
grows toward 1 as more voters are added.
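To make the principle concrete, the sketch below (not part of the original slides) computes the probability that a majority of n independent voters is correct, given that each voter is correct with probability p, using the binomial distribution.

```python
from math import comb

def majority_correct_prob(p, n):
    """Probability that a majority of n independent voters is correct,
    when each voter is correct with probability p (n assumed odd)."""
    k_min = n // 2 + 1  # smallest number of correct voters that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

for n in (1, 11, 101, 1001):
    print(n, round(majority_correct_prob(0.51, n), 4))
# With p = 0.51 the majority is correct roughly 0.51, 0.53, 0.58, 0.74
# of the time as n grows, illustrating Condorcet's jury theorem.
```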
Weak Learners
• In an ensemble method, multiple weak learners are combined to
make a strong learning model (see the sketch below).
• A weak learner is any model whose accuracy is better than
random, even if only slightly better (e.g., 0.51).
[Diagram: Weak Learner 1, Weak Learner 2, …, Weak Learner N each make a prediction; a majority vote over these predictions produces the strong learner's output.]
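A minimal sketch of combining weak learners into a strong learner, assuming scikit-learn is available; the particular base learners and the iris dataset are illustrative assumptions, not prescribed by the slides.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three weak learners chosen for illustration.
weak_learners = [
    ("stump", DecisionTreeClassifier(max_depth=1)),  # a decision stump
    ("nb", GaussianNB()),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# "hard" voting = majority vote over the learners' predicted labels.
strong_learner = VotingClassifier(estimators=weak_learners, voting="hard")
strong_learner.fit(X, y)
print(strong_learner.score(X, y))
```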
Ensemble – Decision Stumps
Decision Stumps – Weak Learners
[Diagram: three decision stumps, each a weak learner splitting on a single feature]
• 1st feature (weight): < 4 → banana, >= 4 → apple
• 2nd feature (width): < 2.5 → banana, >= 2.5 → apple
• 3rd feature (height): <= 4 → apple, > 4 → banana
MAJORITY VOTE
Weight: 4.2 = Apple
Width: 2.3 = Banana
Height: 5.5 = Banana
VOTE = Banana
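The stumps and the majority vote above can be written directly in a few lines of Python; this sketch mirrors the slide's fruit example, with the thresholds taken from the diagram.

```python
from collections import Counter

# Three decision stumps, each voting on a single feature.
def stump_weight(x):  return "apple" if x["weight"] >= 4 else "banana"
def stump_width(x):   return "apple" if x["width"] >= 2.5 else "banana"
def stump_height(x):  return "apple" if x["height"] <= 4 else "banana"

def ensemble_predict(x):
    """Majority vote over the three stumps' predictions."""
    votes = [stump_weight(x), stump_width(x), stump_height(x)]
    return Counter(votes).most_common(1)[0][0]

sample = {"weight": 4.2, "width": 2.3, "height": 5.5}
print(ensemble_predict(sample))  # -> "banana" (votes: apple, banana, banana)
```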
Bootstrap Aggregation (Bagging)
• Bagging is a method of deriving multiple models from
the same training data, where each model is trained on a
random subset of the training data (typically sampled with
replacement, hence “bootstrap”).
• A prediction is then made based on a majority vote of
the models (see the sketch below).
[Diagram: the training data is randomly split into subsets; one model is trained on each random subset (the trained weaker models); the models' predictions are combined by a majority vote to form a stronger predictor.]
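A minimal bagging sketch, assuming scikit-learn is available; the decision-tree base learner, the iris dataset, and the parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # weak learner trained on each random subset
    n_estimators=25,           # number of random subsets / models
    max_samples=0.8,           # fraction of the training data per subset
    bootstrap=True,            # sample with replacement (the "bootstrap")
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))  # prediction = majority vote of the 25 trees
```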
Random Forest
• Random Forest is a popular ensemble method.
• Built from decision trees; used for classification (majority vote)
or regression (mean of the predictions).
• Good at reducing the overfitting common in single decision trees.
• Combines bagging with random subsets of the features.
• Split the training data into B randomly selected subsets.
• Split the features into K randomly selected subsets
(e.g., K = sqrt(number of features)).
• Produce K models, one per feature subset, per data subset,
for a total of K*B models (e.g., random decision trees).
• Use majority voting (classification) or the mean (regression) to
predict a result (see the sketch below).
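As a hedged sketch of this idea, the snippet below uses scikit-learn's RandomForestClassifier; the dataset and parameter values are illustrative assumptions. Note that this implementation samples a random subset of features at each split of each tree rather than building one model per fixed feature subset, but the combination of bagging and random feature selection is the same.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # B: number of random decision trees
    max_features="sqrt",  # consider sqrt(#features) at each split
    bootstrap=True,       # each tree sees a random bootstrap sample of the data
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # class decided by majority vote of the trees
```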