Building and Deploying ML Applications
in production
in a fraction of the time.
A Machine Learning Server in Scala
Available Tools
Processing Framework
• e.g. Apache Spark, Apache Hadoop
Algorithm Libraries
• e.g. MLlib, Mahout
Data Storage
• e.g. HBase, Cassandra
Integrate everything together nicely
and move from prototyping to production.
What is Missing?
You have a mobile app
A Classic Recommender Example…
[Diagram labels: App; Predict products; Predictive model]
You need a Recommendation Engine
Predict products that a customer will like, and show them.
• Algorithm: you don't need to write your own. Spark MLlib provides the ALS algorithm.
• Predictive model: based on users' previous behaviors.
def pseudocode() {
  // Read training data
  val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })
  // Build a predictive model with an algorithm (rank = 10, iterations = 20, lambda = 0.01)
  val model = ALS.train(trainingData, 10, 20, 0.01)
  // Make predictions: the top 5 products for each user
  allUsers.foreach { user =>
    model.recommendProducts(user, 5)
  }
}
A Classic Recommender Example
prototyping…
• How do you deploy a scalable service that responds to dynamic prediction queries?
• How do you persist the predictive model in a distributed environment?
• How do you make HBase, Spark and the algorithms talk to each other?
• How should you prepare, or transform, the data for model training?
• How do you update the model with new data without downtime?
• Where should you add business logic?
• How do you make the code configurable, reusable and maintainable?
• How do you build all of this with separation of concerns (SoC)?
Beyond Prototyping
[Diagram: the Mobile App sends Data (User Actions) to the Event Server (data storage) and a Query via REST (User ID) to the Engine, which returns a Predicted Result (a list of Product IDs).]
A Classic Recommender Example
in production…
• PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.
• Built on Apache Spark, MLlib and HBase.
PredictionIO
[Architecture diagram repeated: Mobile App, Event Server (data storage), Engine.]
Event Server
• $ pio eventserver
• Event-based
client.create_event(
    event="rate",
    entity_type="user",
    entity_id="user_123",
    target_entity_type="item",
    target_entity_id="item_100",
    properties={ "rating" : 5.0 }
)
Event Server Collecting Data
[Architecture diagram repeated: Mobile App, Event Server (data storage), Engine.]
Engine
• DASE - the “MVC” for Machine Learning
• Data: Data Source and Data Preparator
• Algorithm(s)
• Serving
• Evaluator
Engine Building an Engine with
Separation of Concerns (SoC)
A. Train deployable predictive model(s)
B. Respond to dynamic query
C. Evaluation
Engine Functions of an Engine
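The DASE roles and the first two engine functions can be sketched in plain Scala, without Spark or PredictionIO. Everything below is an assumption made for illustration: the trait shapes, the in-memory data and the toy `MeanRating` algorithm are hypothetical stand-ins, not the PredictionIO API and not ALS.

```scala
// Hypothetical, simplified stand-ins for the DASE roles; this is not the PredictionIO API.
object DaseSketch {
  final case class Rating(user: String, item: String, value: Double)
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // D: the Data Source reads raw training data, the Data Preparator transforms it
  trait DataSource { def readTraining(): Seq[Rating] }
  trait Preparator { def prepare(training: Seq[Rating]): Seq[Rating] }

  // A: an Algorithm trains a model and answers queries with it
  trait Algorithm[M] {
    def train(prepared: Seq[Rating]): M
    def predict(model: M, query: Query): PredictedResult
  }

  // S: Serving turns the algorithms' results into the final answer
  trait Serving {
    def serve(query: Query, results: Seq[PredictedResult]): PredictedResult
  }

  object InMemorySource extends DataSource {
    def readTraining(): Seq[Rating] = Seq(
      Rating("u1", "i1", 5.0), Rating("u2", "i1", 4.0),
      Rating("u1", "i2", 3.0), Rating("u2", "i3", 1.0))
  }

  object IdentityPreparator extends Preparator {
    def prepare(training: Seq[Rating]): Seq[Rating] = training
  }

  // Toy algorithm: rank items by mean rating (a placeholder, not ALS; ignores query.user)
  object MeanRating extends Algorithm[Map[String, Double]] {
    def train(prepared: Seq[Rating]): Map[String, Double] =
      prepared.groupBy(_.item).map { case (item, rs) =>
        item -> rs.map(_.value).sum / rs.size
      }
    def predict(model: Map[String, Double], query: Query): PredictedResult =
      PredictedResult(
        model.toSeq.sortWith(_._2 > _._2).take(query.num)
          .map { case (item, score) => ItemScore(item, score) })
  }

  // Simplest possible serving: pass through the first algorithm's result
  object FirstServing extends Serving {
    def serve(query: Query, results: Seq[PredictedResult]): PredictedResult = results.head
  }

  // A. train a model, then B. respond to a query
  def demo(): PredictedResult = {
    val model = MeanRating.train(IdentityPreparator.prepare(InMemorySource.readTraining()))
    val query = Query(user = "u1", num = 2)
    FirstServing.serve(query, Seq(MeanRating.predict(model, query)))
  }
}
```

Because each role sits behind its own trait, swapping the data source or the algorithm does not touch the other parts, which is the separation of concerns the slide refers to.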
Engine A. Train predictive model(s)
class DataSource(…) extends PDataSource
  def readTraining(sc: SparkContext)
  ==> trainingData
class Preparator(…) extends PPreparator
  def prepare(sc: SparkContext, trainingData: TrainingData)
  ==> preparedData
class Algorithm1(…) extends PAlgorithm
  def train(preparedData: PreparedData)
  ==> Model
$ pio train
Engine A. Train predictive model(s)
class DataSource(…) extends PDataSource {
  override def readTraining(sc: SparkContext): TrainingData = {
    // Read the events collected by the Event Server
    val eventsDb = Storage.getPEvents()
    val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)
    // Convert each event into a rating
    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      val rating = try {
        val ratingValue: Double = event.event match {….}
        Rating(event.entityId, event.targetEntityId.get, ratingValue)
      } catch {…}
      rating
    }
    new TrainingData(ratingsRDD)
  }
}
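The event-to-rating step above can be tried out on ordinary Scala collections. This is a minimal sketch under stated assumptions: `Event` and `Rating` here are hypothetical in-memory case classes, only `"rate"` events are treated as carrying a rating, and malformed events are skipped rather than handled the way the elided `catch` block on the slide does.

```scala
import scala.util.Try

object EventsToRatings {
  // Hypothetical in-memory stand-in for a record stored by the Event Server
  final case class Event(
    event: String,
    entityId: String,
    targetEntityId: Option[String],
    properties: Map[String, Double])

  final case class Rating(user: String, item: String, rating: Double)

  // Assumption: only "rate" events carry a usable rating.
  // Any failure (unknown event, missing target, missing property) yields None.
  def toRating(e: Event): Option[Rating] = Try {
    val value = e.event match {
      case "rate" => e.properties("rating")
      case other  => throw new IllegalArgumentException(s"Unexpected event: $other")
    }
    Rating(e.entityId, e.targetEntityId.get, value)
  }.toOption

  // Keep the events that convert cleanly, drop the rest
  def toRatings(events: Seq[Event]): Seq[Rating] = events.flatMap(toRating)
}
```

Dropping bad events silently keeps training running but hides data problems; logging or rethrowing is an equally reasonable choice depending on how much the input is trusted.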
Engine A. Train predictive model(s)
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm {
  def train(preparedData: PreparedData): Model1 = {
    val mllibRatings = data….
    // Train with MLlib ALS using the configured parameters
    val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
    new Model1(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures
    )
  }
}
Engine A. Train predictive model(s)
[Diagram: Event Server → Data Source → TrainingData → Data Preparator → PreparedData → Algorithm 1, Algorithm 2, Algorithm 3 → Model 1, Model 2, Model 3. The Data Source, Data Preparator and Algorithms sit inside the Engine.]
Engine B. Respond to dynamic query
• Query (Input):
$ curl -H "Content-Type: application/json" \
    -d '{ "user": "1", "num": 4 }' \
    http://localhost:8000/queries.json
case class Query(
  val user: String,
  val num: Int
) extends Serializable
Engine B. Respond to dynamic query
• Predicted Result (Output):
{"itemScores":[{"item":"22","score":4.072304374729956},
 {"item":"62","score":4.058482414005789},
 {"item":"75","score":4.046063009943821}]}
case class PredictedResult(
  val itemScores: Array[ItemScore]
) extends Serializable
case class ItemScore(
  item: String,
  score: Double
) extends Serializable
class Algorithm1(…) extends PAlgorithm
  def predict(model: ALSModel, query: Query)
  ==> predictedResult
class Serving extends LServing
  def serve(query: Query, predictedResults: Seq[PredictedResult])
  ==> predictedResult
Engine B. Respond to dynamic query
Query via REST
Engine B. Respond to dynamic query
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm {
  def predict(model: ALSModel, query: Query): PredictedResult = {
    model….{ userInt =>
      val itemScores = model.recommendProducts(…).map(….)
      new PredictedResult(itemScores)
    }.getOrElse{….}
  }
}
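Engine B. Respond to dynamic query
[Diagram: the Mobile App sends a Query (input) to the Engine. Algorithm 1, Algorithm 2 and Algorithm 3 each use their own model (Model 1, Model 2, Model 3) to produce Predicted Results, which go to Serving. Serving returns the Predicted Result (output) to the Mobile App.]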
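When several algorithms answer the same query, Serving has to reduce their results to one. A minimal sketch of one possible rule follows, assuming plain Scala case classes in place of the engine's types; averaging the scores per item is a hypothetical choice made for illustration, not something the deck prescribes.

```scala
object ServingSketch {
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // Hypothetical combination rule: average each item's score over the
  // algorithms that returned it, then keep the query.num best items.
  def serve(query: Query, predictedResults: Seq[PredictedResult]): PredictedResult = {
    val combined = predictedResults
      .flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) =>
        ItemScore(item, scores.map(_.score).sum / scores.size)
      }
      .toSeq
      .sortWith(_.score > _.score)
      .take(query.num)
    PredictedResult(combined)
  }
}
```

Other rules fit the same signature, for example returning only the first algorithm's result or applying business filters before the final cut.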
Engine DASE Factory
object RecEngine extends IEngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],
      classOf[Preparator],
      Map("algo1" -> classOf[Algorithm1]),
      classOf[Serving])
  }
}
Running in Production
• Install PredictionIO
  $ bash -c "$(curl -s http://install.prediction.io/install.sh)"
• Start the Event Server
  $ pio eventserver
• Deploy an Engine
  $ pio build; pio train; pio deploy
• Update Engine Model with New Data
  $ pio train; pio deploy
Deploy in Production
[Diagram: a Website, a Mobile App and an Email Campaign send Data to the Event Server (data storage). Engine 1, Engine 2, Engine 3 and Engine 4 each answer a Query via REST with a Predicted Result.]
The Next Step
• Quickstart with an Engine Template!
• Follow on GitHub: github.com/predictionio/
• Learn PredictionIO: prediction.io/
• Learn Scala! Scala for the Impatient
• Contribute!
Thanks.
Simon Chan
simon@prediction.io
@PredictionIO
prediction.io (Newsletters)
github.com/predictionio


PredictionIO – A Machine Learning Server in Scala – SF Scala
