PredictionIO – A Machine Learning Server in Scala – SF Scala
- 1. Building and Deploying ML Applications
in production
in a fraction of the time.
A Machine Learning Server in Scala
- 4. A Classic Recommender Example…
You have a mobile app
You need a Recommendation Engine
Predict products that a customer will like, and show them.
[Diagram labels: App; Predict products; Predictive model]
Algorithm - You don't need to write your own:
Spark MLlib - ALS algorithm
Predictive model - based on users’ previous behaviors
- 5. A Classic Recommender Example: prototyping…
def pseudocode() {
  // Read training data
  val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })
  // Build a predictive model with an algorithm (rank = 10, iterations = 20, lambda = 0.01)
  val model = ALS.train(trainingData, 10, 20, 0.01)
  // Make predictions: top 5 products for each user
  allUsers.foreach { user =>
    model.recommendProducts(user, 5)
  }
}
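The `match { …. }` step of the pseudocode is elided on the slide. As a rough illustration only, and assuming a hypothetical `user,product,rating` CSV layout (the slide does not state the file format), the per-line parsing could look like this in plain Scala, without Spark. `RatingLine` and `parseLine` are made-up names.

```scala
object RatingParsing {
  // Hypothetical record type; the slide does not define one.
  final case class RatingLine(user: Int, product: Int, rating: Double)

  // Parse one "user,product,rating" line.
  // Returns None when the field count or a number format is wrong.
  def parseLine(line: String): Option[RatingLine] =
    line.split(',').map(_.trim) match {
      case Array(u, p, r) =>
        scala.util.Try(RatingLine(u.toInt, p.toInt, r.toDouble)).toOption
      case _ => None
    }
}
```

Returning an `Option` lets the caller decide whether to drop or report bad lines, for example with `lines.flatMap(RatingParsing.parseLine)`.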
- 6. Beyond Prototyping
• How do you deploy a scalable service that responds to dynamic prediction queries?
• How do you persist the predictive model in a distributed environment?
• How do you make HBase, Spark and the algorithms talk to each other?
• How should you prepare, or transform, the data for model training?
• How do you update the model with new data without downtime?
• Where should you add business logic?
• How do you make the code configurable, reusable and maintainable?
• How do you build all of this with separation of concerns (SoC)?
- 8. PredictionIO
• PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.
• Built on Apache Spark, MLlib and HBase.
- 9. Event Server
Mobile App → Event Server (data storage)
Data: User Actions
Mobile App → Engine
Query via REST: User ID
Engine → Mobile App
Predicted Result: A list of Product IDs
- 10. Event Server: Collecting Data
• $ pio eventserver
• Event-based
client.create_event(
    event="rate",
    entity_type="user",
    entity_id="user_123",
    target_entity_type="item",
    target_entity_id="item_100",
    properties={ "rating" : 5.0 }
)
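The same "rate" event can be pictured as the JSON document a client would send to the Event Server. The sketch below builds that body in plain Scala. The camelCase field names (`entityType`, `targetEntityId` and so on) are an assumption about the Event Server's REST format and should be checked against the PredictionIO Event API documentation for your version. `RateEvent` and `toJson` are hypothetical names.

```scala
object EventJson {
  // Mirrors the arguments passed to create_event on the slide.
  final case class RateEvent(userId: String, itemId: String, rating: Double)

  // Wrap a string in double quotes. IDs are assumed to need no JSON escaping.
  private def q(s: String): String = "\"" + s + "\""

  // Build the request body by hand to stay dependency-free.
  // Field names are assumed, not taken from the slide.
  def toJson(e: RateEvent): String = Seq(
    q("event") + ":" + q("rate"),
    q("entityType") + ":" + q("user"),
    q("entityId") + ":" + q(e.userId),
    q("targetEntityType") + ":" + q("item"),
    q("targetEntityId") + ":" + q(e.itemId),
    q("properties") + ":{" + q("rating") + ":" + e.rating + "}"
  ).mkString("{", ",", "}")
}
```

In real code a JSON library would handle escaping; the hand-built string only keeps the sketch free of dependencies.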
- 11. Engine
Mobile App → Event Server (data storage)
Data: User Actions
Mobile App → Engine
Query via REST: User ID
Engine → Mobile App
Predicted Result: A list of Product IDs
- 12. Engine: Building an Engine with Separation of Concerns (SoC)
• DASE - the “MVC” for Machine Learning
• Data: Data Source and Data Preparator
• Algorithm(s)
• Serving
• Evaluator
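To make the DASE split concrete, here is a minimal plain-Scala sketch of how the pieces could fit together. The traits are simplified, hypothetical stand-ins and are not PredictionIO's real base classes; `deploy` is a made-up helper, and the Evaluator is left out.

```scala
object DaseSketch {
  // Hypothetical, simplified stand-ins for the DASE roles.
  // These are NOT PredictionIO's real base classes.
  trait DataSource[TD] { def readTraining(): TD }
  trait Preparator[TD, PD] { def prepare(td: TD): PD }
  trait Algorithm[PD, M, Q, P] {
    def train(pd: PD): M
    def predict(model: M, query: Q): P
  }
  trait Serving[Q, P] { def serve(query: Q, predictions: Seq[P]): P }

  // Train every algorithm on the same prepared data, then return a function
  // that answers a query by asking each model and letting Serving decide.
  def deploy[TD, PD, M, Q, P](
      ds: DataSource[TD],
      prep: Preparator[TD, PD],
      algos: Seq[Algorithm[PD, M, Q, P]],
      serving: Serving[Q, P]): Q => P = {
    val prepared = prep.prepare(ds.readTraining())
    val trained = algos.map(a => (a, a.train(prepared)))
    query => serving.serve(query, trained.map { case (a, m) => a.predict(m, query) })
  }
}
```

Each role only sees the output type of the previous one, which is the separation of concerns the slide refers to: swapping the data source or adding an algorithm does not touch the serving code.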
- 13. Engine: Functions of an Engine
A. Train deployable predictive model(s)
B. Respond to dynamic query
C. Evaluation
- 14. Engine A. Train predictive model(s)
class DataSource(…) extends PDataSource
def readTraining(sc: SparkContext)
==> trainingData
class Preparator(…) extends PPreparator
def prepare(sc: SparkContext, trainingData: TrainingData)
==> preparedData
class Algorithm1(…) extends PAlgorithm
def train(preparedData: PreparedData)
==> Model
$ pio train
- 15. Engine A. Train predictive model(s)
class DataSource(…) extends PDataSource
  override def readTraining(sc: SparkContext): TrainingData = {
    // Read the events collected by the Event Server
    val eventsDb = Storage.getPEvents()
    val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)
    // Convert each event into a Rating
    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      val rating = try {
        val ratingValue: Double = event.event match {….}
        Rating(event.entityId, event.targetEntityId.get, ratingValue)
      } catch {…}
      rating
    }
    new TrainingData(ratingsRDD)
  }
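The event-to-rating conversion is elided on the slide. The sketch below shows one possible shape of it on plain Scala collections instead of RDDs. The `Event` and `Rating` types are simplified stand-ins, and the mapping ("rate" carries an explicit rating, "buy" counts as 4.0) is an assumption for illustration, not the slide's actual logic.

```scala
import scala.util.Try

object EventsToRatings {
  // Simplified, hypothetical stand-ins for the Event and Rating types.
  final case class Event(event: String, entityId: String,
                         targetEntityId: Option[String],
                         properties: Map[String, Double])
  final case class Rating(user: String, item: String, rating: Double)

  // Assumed mapping: "rate" carries an explicit rating, "buy" counts as 4.0.
  // Any other event name, a missing rating or a missing target is a failure.
  def toRating(e: Event): Try[Rating] = Try {
    val value = e.event match {
      case "rate" => e.properties("rating")
      case "buy"  => 4.0
      case other  => throw new IllegalArgumentException(s"Unexpected event: $other")
    }
    Rating(e.entityId, e.targetEntityId.get, value)
  }

  // Keep only the events that convert cleanly.
  def readTraining(events: Seq[Event]): Seq[Rating] =
    events.flatMap(e => toRating(e).toOption)
}
```

Dropping bad events silently is a design choice made for brevity here; the slide's `try { … } catch {…}` may well log or rethrow instead.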
- 16. Engine A. Train predictive model(s)
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm
  def train(preparedData: PreparedData): Model1 = {
    val mllibRatings = data….
    // Train ALS with the configured rank, iteration count and regularization
    val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
    new Model1(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures
    )
  }
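- 17. Engine A. Train predictive model(s)
Event Server → Data Source → TrainingData → Data Preparator → PreparedData
PreparedData → Algorithm 1 → Model 1
PreparedData → Algorithm 2 → Model 2
PreparedData → Algorithm 3 → Model 3
(Data Source, Data Preparator and the Algorithms are part of the Engine)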
- 18. Engine B. Respond to dynamic query
• Query (Input):
$ curl -H "Content-Type: application/json" -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json
case class Query(
  val user: String,
  val num: Int
) extends Serializable
- 19. Engine B. Respond to dynamic query
• Predicted Result (Output):
{"itemScores":[{"item":"22","score":4.072304374729956},
{"item":"62","score":4.058482414005789},
{"item":"75","score":4.046063009943821}]}
case class PredictedResult(
  val itemScores: Array[ItemScore]
) extends Serializable
case class ItemScore(
  item: String,
  score: Double
) extends Serializable
- 20. Engine B. Respond to dynamic query
Query via REST
class Algorithm1(…) extends PAlgorithm
  def predict(model: ALSModel, query: Query)
  ==> predictedResult
class Serving extends LServing
  def serve(query: Query, predictedResults: Seq[PredictedResult])
  ==> predictedResult
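Since `serve` receives one `PredictedResult` per algorithm, the Serving component is where results get combined. A minimal plain-Scala sketch of two possible policies follows. The averaging policy and the names `serveFirst` and `serveAverage` are assumptions for illustration, and `PredictedResult` holds a `Seq` here (rather than the slide's `Array`) only so that results compare by value.

```scala
object ServingSketch {
  final case class ItemScore(item: String, score: Double)
  final case class Query(user: String, num: Int)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // Simplest policy: pass through the first algorithm's result.
  def serveFirst(query: Query, results: Seq[PredictedResult]): PredictedResult =
    results.head

  // Alternative policy (an assumption): average each item's score over the
  // algorithms that returned it, then keep the top `num` items.
  def serveAverage(query: Query, results: Seq[PredictedResult]): PredictedResult = {
    val averaged = results.flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) => ItemScore(item, scores.map(_.score).sum / scores.size) }
      .toSeq
      .sortBy(s => -s.score)
      .take(query.num)
    PredictedResult(averaged)
  }
}
```

With a single algorithm both policies return the same items; the difference only matters once several algorithms are registered in the engine.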
- 21. Engine B. Respond to dynamic query
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm
  def predict(model: ALSModel, query: Query): PredictedResult = {
    model….{ userInt =>
      val itemScores = model.recommendProducts(…).map(….)
      new PredictedResult(itemScores)
    }.getOrElse{….}
  }
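A matrix factorization model such as the one ALS produces scores a user-product pair by the dot product of their feature vectors. The sketch below illustrates that idea, plus the "unknown user" fallback suggested by `getOrElse`, on plain in-memory maps. It does not reproduce MLlib's `recommendProducts`; `Model` and its string keys are hypothetical simplifications.

```scala
object FactorModelPredict {
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Array[ItemScore])
  // Hypothetical in-memory model: latent feature vectors keyed by ID.
  final case class Model(userFeatures: Map[String, Array[Double]],
                         productFeatures: Map[String, Array[Double]])

  // Dot product of two feature vectors of equal length.
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Score every product for the user and keep the `num` highest.
  // An unknown user yields an empty result instead of an error.
  def predict(model: Model, user: String, num: Int): PredictedResult =
    model.userFeatures.get(user).map { uf =>
      val scores = model.productFeatures.toSeq
        .map { case (item, pf) => ItemScore(item, dot(uf, pf)) }
        .sortBy(s => -s.score)
        .take(num)
      PredictedResult(scores.toArray)
    }.getOrElse(PredictedResult(Array.empty[ItemScore]))
}
```

Scoring every product is fine for a toy model; at production scale the ranking step is where most of the serving cost would go.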
- 22. Engine B. Respond to dynamic query
Mobile App → Query (input) → Engine
Query → Algorithm 1 + Model 1, Algorithm 2 + Model 2, Algorithm 3 + Model 3 → Predicted Results
Predicted Results → Serving → Predicted Result (output) → Mobile App
- 23. Engine DASE Factory
object RecEngine extends IEngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],
      classOf[Preparator],
      Map("algo1" -> classOf[Algorithm1]),
      classOf[Serving])
  }
}
- 24. Running in Production
• Install PredictionIO
$ bash -c "$(curl -s http://install.prediction.io/install.sh)"
• Start the Event Server
$ pio eventserver
• Deploy an Engine
$ pio build; pio train; pio deploy
• Update Engine Model with New Data
$ pio train; pio deploy
- 26. The Next Step
• Quickstart with an Engine Template!
• Follow on GitHub: github.com/predictionio/
• Learn PredictionIO: prediction.io/
• Learn Scala! Scala for the Impatient
• Contribute!