Ben Taylor @bentaylordata
Predictive Analytics / Data Science
Presentation Objectives
• Enable you to be smarter than your prospect (data history / lingo)
• Motivate you to be unstoppable and hyper-confident
• Motivate you to begin looking for data-driven opportunities
• Motivate you to become a data scientist
"What the hell is cloud computing?"
-Larry Ellison, CEO Oracle
What is cloud computing?
?
What is big data?
• Big data includes datasets or problems that exceed the capacity of a single computer and require a distributed data access system.
• The concept of "big" is relative to conventional systems and technology, and is subject to change with advances in memory and storage solutions.
http://www.pcmag.com/article2/0,2817,2453838,00.asp
Big data trends
What is a data scientist?
Engineering, Finance, Economics, Mathematics, Computer Science, Physics
Data Science
6-10 yrs
Python Bootcamp $8,000 (3 months)
$16,000-$4,000 (3 months)
$115K avg
What is a data scientist?
Master Builder
What is a data scientist?
Reality distortion: Hyper-confidence
Data Scientist = Peacock
Humans vs. Algorithms
Smartest pirate
[Plot: treasure chests vs. ships captured]
Humans vs. Algorithms
[Plot: ships captured]
NA
Humans vs. Algorithms
[Plot: ships captured]
German (1795), French (1806)
[Plot: treasure chests vs. ships captured]
Humans vs. Algorithms
1997, IBM Deep Blue
Kasparov
Humans vs. Algorithms
2011, IBM Watson
Ken Jennings & Brad Rutter
Humans vs. Algorithms
2014, HireVue Iris
Hiring Panel
Prediction process
Raw data → Data munging → Training → Model
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
Clean data
Numeric Excel example
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
LSR, SVM, random forest, naïve Bayesian, neural net
Missing values + categorical
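A minimal sketch of the data cleaning step, assuming the two problems named above are handled by mean imputation and one-hot encoding (one common choice, not necessarily the one used in the talk). The column names and records are hypothetical.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"years_experience": 3.0, "department": "Retail"},
    {"years_experience": None, "department": "Engineering"},
    {"years_experience": 7.0, "department": "Retail"},
]

def clean(rows):
    """Mean-impute the numeric column and one-hot encode the categorical one."""
    known = [r["years_experience"] for r in rows if r["years_experience"] is not None]
    fill_value = mean(known)  # replace missing numbers with the column mean
    categories = sorted({r["department"] for r in rows})
    cleaned = []
    for r in rows:
        years = r["years_experience"] if r["years_experience"] is not None else fill_value
        # One 0/1 column per category, so every cell ends up numeric.
        one_hot = [1.0 if r["department"] == c else 0.0 for c in categories]
        cleaned.append([years] + one_hot)
    return categories, cleaned

categories, numeric_rows = clean(raw_rows)
for row in numeric_rows:
    print(row)
```

After this step every row is purely numeric, which is what most training algorithms expect as input.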
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
LSR, SVM, random forest, naïve Bayesian, neural net
Retail > 15, Engineering > 95
> 5.67
Resume model
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
LSR, SVM, random forest, naïve Bayesian, neural net
Retail > 15, Engineering > 95
GPA, Colleges, Hobbies
> 5.67
Text deeper dive
Sentiment example
Sentiment
Given data, find cat? dog?
Talk like a data nerd
Confidence & Over-fitting
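Over-fitting can be illustrated with a small sketch. The data below is hypothetical and deliberately constructed: the true rule is "label 1 when x >= 5", and two training labels are flipped to act as noise. A model that memorizes the training set (1-nearest-neighbor) looks perfect on the data it has seen and does poorly on held-out data, while a simpler threshold model does the opposite.

```python
from statistics import mean

# Hypothetical (x, label) pairs; (3, 1) and (7, 0) are deliberately noisy labels.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
# Held-out data that follows the true rule with no noise.
test = [(1.5, 0), (2.6, 0), (3.4, 0), (6.6, 1), (7.4, 1), (8.5, 1)]

def memorizer(x):
    """1-nearest-neighbor: copy the label of the closest training point, noise included."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_threshold(rows):
    """Simple model: split at the midpoint between the two class means."""
    mean0 = mean(x for x, y in rows if y == 0)
    mean1 = mean(x for x, y in rows if y == 1)
    return (mean0 + mean1) / 2

threshold = fit_threshold(train)

def simple_model(x):
    return 1 if x >= threshold else 0

def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

print("memorizer    train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("simple model train/test:", accuracy(simple_model, train), accuracy(simple_model, test))
```

The gap between training accuracy and held-out accuracy is the warning sign: confidence in a model should come from data it has not seen.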
Data Lingo
• Supervised vs. unsupervised learning
   • Supervised: A labeled training set is provided.
   • Unsupervised: No labeled training set; records are clustered based on similar attributes.
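The difference can be sketched on the same six points. This is an illustration under stated assumptions: the points and labels are hypothetical, the supervised model is a nearest-centroid classifier, and the unsupervised one is a plain k-means with two clusters and a fixed starting guess.

```python
from math import dist  # Python 3.8+

# Hypothetical 2-D points; the labels are used only by the supervised path.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

def centroid(pts):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def fit_supervised(pts, labs):
    """Supervised: learn one centroid per label from the labeled training set."""
    return {lab: centroid([p for p, l in zip(pts, labs) if l == lab]) for lab in set(labs)}

def predict(model, p):
    """Return the label whose centroid is closest to p."""
    return min(model, key=lambda lab: dist(model[lab], p))

def cluster_unsupervised(pts, iterations=10):
    """Unsupervised: two-cluster k-means, grouping points by similarity with no labels."""
    centers = [pts[0], pts[-1]]  # deterministic starting guess
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(len(centers)), key=lambda i: dist(centers[i], p)) for p in pts]
        for i in range(len(centers)):
            members = [p for p, a in zip(pts, assignment) if a == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = centroid(members)
    return assignment

model = fit_supervised(points, labels)
print(predict(model, (2.0, 1.0)))    # a named label learned from the training set
print(cluster_unsupervised(points))  # anonymous cluster ids, no names attached
```

The supervised model answers with a label it was taught; the clustering only says which points belong together, and naming the groups is left to a human.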
Data Lingo
• Analytic layers
   • Descriptive analytics: Telling a data story through plotting or visualization.
   • Predictive analytics: Predicting future outcomes, usually with a model trained on a historical training set.
   • Prescriptive analytics: Using the insight from your predictive model to proactively change something.
   • Interview/interaction analytics: Any analytics surrounding the interview or interaction.
Data Lingo
• Prediction methods
   • Regression: Predicting a continuous output (e.g., a stock price).
   • Classification: Predicting discrete category outputs, e.g., Yes/Maybe/No.
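A short sketch of the two prediction methods, using hypothetical numbers. Regression is shown as an ordinary least-squares line fit (the "LSR" named in the prediction process), and classification as a 1-nearest-neighbor rule that returns Yes/Maybe/No. Both are illustrative choices, not the models used in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def nearest_neighbor_label(training_pairs, x):
    """1-nearest-neighbor classification: label of the closest training example."""
    return min(training_pairs, key=lambda pair: abs(pair[0] - x))[1]

# Regression: continuous output (hypothetical ships captured vs. treasure chests).
ships = [10, 20, 30, 40]
chests = [70, 140, 210, 280]
slope, intercept = fit_line(ships, chests)
print("predicted chests for 50 ships:", slope * 50 + intercept)

# Classification: discrete output (hypothetical candidate score -> decision).
scored = [(2.0, "No"), (3.0, "No"), (5.0, "Maybe"), (6.0, "Maybe"), (8.0, "Yes"), (9.0, "Yes")]
print("decision for score 5.4:", nearest_neighbor_label(scored, 5.4))
```

The regression returns a number on a continuous scale, while the classifier can only ever return one of the categories it saw during training.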
Data Lingo
• Data types
   • Structured: Does it play well in Excel?
   • Unstructured: Raw text (Twitter), audio, video, photos, resumes, etc.
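One common way to make unstructured text "play well in Excel" is a bag-of-words table: one row per document, one column per word, each cell a count. The sketch below assumes that approach and uses hypothetical documents; the tokenizer is a deliberately crude regular expression.

```python
import re
from collections import Counter

# Hypothetical raw text (unstructured).
documents = ["Great service, great price!", "Terrible service."]

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

# Column headers: every distinct word across all documents.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})

# One row of word counts per document (structured).
table = [[Counter(tokenize(doc))[word] for word in vocabulary] for doc in documents]

print(vocabulary)
for row in table:
    print(row)
```

Once the text is in this tabular form it can go through the same cleaning, feature selection, and training steps as any numeric dataset.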
