H2O World - Building a Smarter Application - Tom Kraljevic

BUILDING
A
SMARTER
APPLICATION

Tom
Kraljevic

tomk@h2o.ai

H2O
World
2015

Nov.
9,
2015

(+
help
from
Amy
and
Prithvi)

BUILDING
A
SMARTER
APPLICATION

Q:

What
is
a
Smarter
ApplicaKon?

A:

An
applicaKon
that
learns
from
data

[From
rules-‐based
to
model-‐based]

Topics:

o Combining
applicaKons
with
models

o Deploying
models
into
producKon

Target
audience:

o  Developers
adding
Machine
Learning
to
apps

o  Data
Scien0sts/DevOps
puYng
models
into
producKon

RESOURCES
ON
THE
WEB

•  Slides
and
material
for
this
talk

o GitHub
h2oai/h2o-‐world-‐2015-‐training

•  tutorials/building-‐a-‐smarter-‐applicaKon

•  Source
code

o GitHub
h2oai/app-‐consumer-‐loan

•  H2O
Tibshirani
release
(on
USB)

o h^p://h2o.ai/download

•  Generated
POJO
model
Javadoc

o h^p://h2o-‐release.s3.amazonaws.com/h2o/rel-‐
Kbshirani/3/docs-‐website/h2o-‐genmodel/javadoc/
index.html

A
CONCRETE
USE
CASE

•  We’re
building
a
consumer
loan
app

•  The
end-‐user
is
applying
for
a
loan

•  Imagine
the
website
is
a
lender

•  Should
a
loan
be
oﬀered?

•  Two
predicKve
models

o Is
the
loan
predicted
to
be
bad
(yes/no)

o If
no,
what
is
the
interest
rate
to
be
oﬀered?

STEPS
TO
BUILDING
A
SMARTER
APP

Step
1:

Picking
the
quesKon
your
model
will
answer

Step
2:

Using
your
data
to
build
a
model

Step
3:

ExporKng
the
generated
model
as
a
Java
POJO

Step
4:

Compiling
the
model

Step
5:

HosKng
the
model
in
a
servlet
container

Step
6:

Running
the
JavaScript
app
in
a
browser

Step
7:

Using
a
REST
API
to
make
predicKons

Step
8:

IncorporaKng
the
predicKon
into
your
applicaKon

THE
DATA

•  Lending
club
loans
from
2007
to
June
2015

•  Only
loans
that
have
a
known
good
or
bad

outcome
are
used
to
build
the
model

•  163,987
rows

•  15
columns

DATA
DICTIONARY

Predictor
Variable
DescripBon
Units

loan_amnt
Requested
loan
amount
US
dollars

term
Loan
term
length
months

emp_length
Employment
length
years

home_ownership
Housing
status
categorical

annual_inc
Annual
income
US
dollars

verificaKon_status
Income
verificaKon
status
categorical

purpose
Purpose
for
the
loan
categorical

addr_state
State
of
residence
categorical

dK
Debt
to
income
raKo
%

delinq_2yrs
Number
of
delinquencies
in
the
past
2
years
integer

revol_uKl
Revolving
credit
line
uKlized
%

total_acc
Total
accounts
(number
of
credit
lines)
integer

longest_credit_length
Age
of
oldest
acKve
account
years

Response
Variable
DescripBon
Model
Category

bad_loan
Is
this
loan
likely
to
be
bad?

Binomial
classificaKon

int_rate
What
should
the
interest
rate
be?

Regression

WORKFLOW
FOR
THIS
APP

Edit
form

ﬁeld
value

Get

PredicKons

Bad

Loan?

Visit
web

page

Declined

Accepted:
Show

Interest
Rate

Start

Yes No
New data point
(e.g. loan amount, income)
Is the loan predicted to be bad?
What is the interest rate?

APP
ARCHITECTURE
DIAGRAM

Back-end
/

(webapp)

html,
css,
js

/predict

(servlet)

java

.war file
Jetty servlet container
Front-end
Web browser
Javascript

applicaKon

Port
8080
1. HTTP GET with query parameters (loan_amt, annual_inc, etc.)
1
2
2. JSON response with predictions

MODEL
INFORMATION

Bad
Loan
Model

Algorithm:

GBM

Model
category:
Binary

ClassiﬁcaKon

ntrees:

100

max_depth:

5

learn_rate:

0.05

AUC
on
valid:
.685

max
F1:

0.202

Interest
Rate
Model

Algorithm:

GBM

Model
category:
Regression

ntrees:

100

max_depth:

5

learn_rate:

0.05

MSE:

11.1

R2:

0.424

SOFTWARE
PIECES

•  Oﬄine

o  Gradle
(build)

o  R
+
H2O
(model
building)

•  Front-‐end

o  Web
browser

o  JavaScript
applicaKon
(run
in
the
browser)

•  Back-‐end

o  Je^y
servlet
container

o  H2O-‐generated
model
POJO
(hosted
by
servlet
container)

WHAT
YOU
NEED
FOR
HANDS-‐ON

•  R
and
the
H2O
R
package
(from
USB)

•  gradle
(from
USB)

•  app-‐consumer-‐loan
(from
USB)

o Code
and
data
is
self-‐contained
in
this
directory

I have R
installed
I have H2O
Installed

HANDS-‐ON
DEMONSTRATION

If
you
are
already
running
H2O
on
your
laptop,
please
stop
it
so
the
gradle

script
runs
properly!

STEP
1:

Compile
and
run
(From
the
command
line)

C:>
cd
pathtoH2OWorld2015USBimage

C:>
cd
app-‐consumer-‐loan

C:>
..gradle-‐2.8bingradle
je^yRunWar

STEP
2:

Use
the
app
(In
a
web
browser)

h^p://localhost:8080

(Future)
STEP
3:

Rerun
without
rebuilding
the
models
or
recompiling

C:>
..gradle-‐2.8bingradle
je^yRunWar
-‐x
war

COMMON
HANDS-‐ON
ERRORS

•  Common
R
errors

o R
not
on
PATH

•  Gradle
needs
to
invoke
R

o Another
H2O
is
already
running

•  the
R
script
can’t
ﬁnd
the
data
in
h2o.importFile()

•  Common
Java
errors

o Java
not
installed
at
all

•  Also,
must
install
a
JDK
(Java
Development
Kit)
so
that
the

Java
compiler
is
available
(JRE
is
not
suﬃcient)

o Not
connected
to
the
internet

•  Gradle
needs
to
fetch
some
dependencies
from
the
internet

KEY
FILES

•  Oﬄine

o  build.gradle

o  data/loan.csv

o  script.R

•  Front-‐end

o  src/main/webapp/index.html

o  src/main/webapp/app.js

•  Back-‐end

o  src/main/java/org/gradle/PredictServlet.java

o  lib/h2o-‐genmodel.jar
(downloaded)

o  src/main/java/org/gradle/BadLoanModel.java
(generated)

o  src/main/java/org/gradle/InterestRateModel.java
(generated)

POST-‐DEMO
POINTERS

•  POJO
Javadoc

o h^p://h2o-‐release.s3.amazonaws.com/h2o/rel-‐
Kbshirani/3/docs-‐website/h2o-‐genmodel/
javadoc/index.html

NEXT
STEPS:
CLOSING
THE
FEEDBACK
LOOP

•  Scoring

o Judging
how
good
the
predicKons
really
are

o Need
to
get
the
correct
answers
from
somewhere

•  Storing
predicKons
(and
the
correct
answers)

o Ouen
Hadoop

o This
can
be
a
lot
of
work
to
organize

NEXT
STEPS:
RETRAINING
AND
DEPLOYING

•  Model
update
frequency

o Need
depends
on
the
use
case

o Hourly,
daily,
monthly?

o Time
cost
of
training
the
model
is
a
factor

•  Hot
swapping
the
model

o SeparaKng
front-‐end
and
back-‐end
makes
this
easier

o Java
reﬂecKon
for
in-‐process
hot-‐swap

o Load
balancer
for
servlet
container
hot-‐swap

RELATED
EXAMPLES

•  H2O
Generated
Model
POJO
in
a
Storm
bolt

o GitHub:

h2oai/h2o-‐world-‐2015-‐training

o tutorials/streaming/storm

•  H2O
Generated
Model
POJO
in
Spark
Streaming

o GitHub:
h2oai/sparkling-‐water

o examples/src/main/scala/org/apache/spark/
examples/h2o/CraigslistJobTitlesStreamingApp.scala

BONUS
APP

•  “Ask
Craig”
app

•  h^p://classify.h2o.ai

•  Uses
Spark
word2vec,
H2O
gbm

•  MulKnomial
classiﬁcaKon
problem

o Given
the
words
for
a
new
job
posKng,
ﬁgure
out
the

right
category
for
the
job

•  See
H2O.ai
blog
about
this
Sparkling
Water
app

o Do
a
web
search
for
“h2o
ask
craig”

•  GitHub:
h2oai/app-‐ask-‐craig

Q
&
A

•  Thanks
for
a^ending!

•  Send
follow
up
quesKons
to:

Tom
Kraljevic

tomk@h2o.ai

H2O World - Building a Smarter Application - Tom Kraljevic

Related slideshows

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to H2O World - Building a Smarter Application - Tom Kraljevic

Similar to H2O World - Building a Smarter Application - Tom Kraljevic (20)

More from Sri Ambati

More from Sri Ambati (20)

Recently uploaded

Recently uploaded (20)

H2O World - Building a Smarter Application - Tom Kraljevic