A Practical Guidance to the Enterprise
Machine Learning Platform Ecosystem
About Us
• Helping great companies become great software companies
• Building software solutions powered by disruptive enterprise software trends
-Machine learning and data science
-Cyber-security
-Enterprise IOT
-Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award winning agencies: Inc 500, American Business Awards, International Business Awards
About This Webinar
• Research that brings together big enterprise software trends,
exciting startups and academic research
• Best practices based on real world implementation experience
• No sales pitches
• Cloud vs. on-premise machine learning
• Cloud machine learning platforms
-Azure Machine Learning
-AWS Machine Learning
-Databricks
-Watson Developer Cloud
-Others…
• On-premise machine learning platforms
-Revolution Analytics
-Dato
-Spark MLlib
-TensorFlow
-Others…
Agenda
Enterprise Data Science
“data science”
Modern Machine Learning
• Advances in storage, compute and data science research are
making machine learning part of mainstream technology
platforms
• Big data movement
• Machine learning platforms are optimized with developer-friendly
interfaces
• Platform as a service providers have drastically lowered the
entry point for machine learning applications
• R and Python are leading the charge
Cloud vs. On-Premise
machine learning platforms
Cloud Machine Learning Platforms: Benefits
• Service abstraction layer over the machine learning infrastructure
• Rich visual modeling tools
• Rich monitoring and tracking interfaces
• Combine multiple platforms: R, Python, etc
• Enable programmatic access to ML models
Cloud Machine Learning Platforms: Challenges
• Integration with on-premise data stores
• Extensibility
• Security and privacy
On-Premise Machine Learning Platforms: Benefits
• Control
• Security
• Integration with on-premise data stores
• Integrated with R and Python machine learning frameworks
On-Premise Machine Learning Platforms: Challenges
• Code-based modeling interfaces
• Scalability
• Tightly coupled with Hadoop distributions
• Monitoring and management
• Data quality and curation
Cloud Machine Learning Platforms
• Azure Machine Learning
• AWS machine learning
• Databricks
• Watson developer cloud
The Leaders
Azure Machine Learning
Azure Machine Learning
• Native machine learning capabilities as part of the Azure cloud
• Elastic infrastructure that scales based on the model requirements
• Supports over 30 supervised and unsupervised machine learning
algorithms
• Integration with R and Python machine learning libraries
• Expose machine learning models via programmable interfaces
• Integrated with the Cortana Analytics suite
• Integrated with PowerBI
• Supports both supervised and
unsupervised models
• Integrated with Azure HDInsight
• Large library of models and sample
gallery
• Support for R and Python code
Visual Model Creation
• Visual dashboard to track the
execution of ML models
• Track execution of different steps
within an ML model
• Integrated monitoring experience
with other Azure services
Rich Monitoring and Management Interface
• Expose machine learning models as
Web Services APIs
• Integrate ML Models with Azure API
Gateway
• Retrain and extend models via ML
APIs
Programmatic Access to ML Models
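As a rough illustration of what programmatic access to a published model can look like from a client, the sketch below builds (but does not send) a JSON scoring request using only the Python standard library. The endpoint URL, the API key, the bearer-token header and the `rows` payload shape are all hypothetical placeholders, not the Azure Machine Learning request format; a real service documents its own schema.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, rows):
    """Build (but do not send) a JSON POST request for a model web service."""
    body = json.dumps({"rows": rows}).encode("utf-8")  # hypothetical payload shape
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # hypothetical endpoint
    "my-api-key",                     # hypothetical key
    [{"age": 42, "income": 55000}],
)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any call reaches a live service.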
AWS Machine Learning
AWS Machine Learning
• Native machine learning service in AWS
• Provide data exploration and visualization tools
• Supports supervised and unsupervised algorithms
• Integrated data transformation models
• APIs for dynamically creating machine learning models
• Programmatic creation of machine
learning models
• Large number of algorithms and recipes
• Data transformation models included in
the language
Sophisticated ML Model Authoring
• Sophisticated monitoring for
evaluating ML models
• Integrated with Amazon CloudWatch
• KPIs that evaluate the efficiency of
ML models
Monitoring ML Model Execution
• Optimized DSL for data
transformation
• Recipes that abstract common
transformations
• Reuse transformation recipes
across ML models
Embedded Data Transformation
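The idea of reusable transformation recipes can be sketched in plain Python. This is not the AWS recipe DSL; it only illustrates the pattern of defining a list of transformation steps once and applying it to the data of more than one model. The helper names `lowercase`, `bucket` and `apply_recipe` are invented for this example.

```python
def lowercase(field):
    """Return a step that lowercases a text field."""
    def step(row):
        out = dict(row)  # copy so the input row is left untouched
        out[field] = out[field].lower()
        return out
    return step


def bucket(field, edges):
    """Return a step that replaces a number with the count of edges it reaches."""
    def step(row):
        out = dict(row)
        out[field] = sum(1 for edge in edges if row[field] >= edge)
        return out
    return step


def apply_recipe(recipe, rows):
    """Apply every step of the recipe, in order, to every row."""
    result = []
    for row in rows:
        for step in recipe:
            row = step(row)
        result.append(row)
    return result


# One recipe, defined once, can be reused for several models.
recipe = [lowercase("city"), bucket("age", [18, 40, 65])]
rows = [{"city": "Austin", "age": 31}, {"city": "MIAMI", "age": 70}]
transformed = apply_recipe(recipe, rows)
```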
Databricks
Databricks Machine Learning
• Scaling Spark machine learning pipelines
• Integrated data visualization tools
• Sophisticated ML monitoring tools
• Combine Python, Scala and R in a single platform
• Implementing machine learning
models using Notebooks
• Publishing notebooks to a
centralized catalog
• Leverage Python, Scala or R to
implement machine learning models
Notebooks Based Authoring
• Integrate data visualization into
machine learning pipelines
• Reuse data visualization
notebooks across applications
• Evaluate the efficiency of
machine learning pipelines using
visualizations
Machine Learning Data Visualization
• Monitor the execution of machine
learning pipelines
• Run machine learning pipelines
manually
• Rapidly modify and deploy machine
learning pipelines
Monitoring and Management
Watson Developer Cloud
• Personality Insights
• Tradeoff Analytics
• Relationship Extraction
• Concept Insights
• Speech to Text
• Text to Speech
• Visual Recognition
• Natural Language Classifier
• Language Identification
• Language Translation
• Question and Answer
• Concept Expansion
• Message Resonance
• AlchemyAPI Services
Large Variety of Cognitive Services
• Access services via REST APIs
• SDKs available for different
languages
• Integration with different
services in the Bluemix
platform
Rich Developer Interfaces
• Relationship Extraction
• Concept Expansion
• Message Resonance
• User Modeling
Complex Algorithms – Simple Interfaces
Other Interesting Platforms
• Microsoft’s Project Oxford https://www.projectoxford.ai/
• BigML https://bigml.com/
On-premise machine
learning platforms
The Leaders
• Revolution Analytics (Microsoft)
• Spark MLlib + SparkR
• Dato
• TensorFlow
• Others: PredictionIO, Scikit-learn…
Revolution Analytics
All of Open Source R plus:
• Big Data scalability
• High-performance analytics
• Development and deployment tools
• Data source connectivity
• Application integration framework
• Multi-platform architecture
• Support, Training and Services
Revolution Analytics (Microsoft)
• Components: DistributedR, ScaleR, ConnectR, DeployR
• In the cloud: Amazon AWS
• Workstations and servers: Windows, Red Hat and SUSE Linux
• Clustered systems: IBM Platform LSF, Microsoft HPC
• EDW: IBM Netezza, Teradata
• Hadoop: Hortonworks, Cloudera
Write Once, Deploy Anywhere
• DeployR does not provide any application UI
• 3 integration modes embed real-time R results into existing interfaces (web app, mobile app, desktop app, BI tool, Excel, …)
-RBroker Framework: simple, high-performance API for Java, .NET and JavaScript apps; supports transactional, on-demand analytics on a stateless R session
-Client Libraries: flexible control of R services from Java, .NET and JavaScript apps; also supports stateful R integrations (e.g. complex GUIs)
-DeployR Web Services API: integrate R using almost any client language
Integrate R Scripts Into Third Party Applications
Spark MLlib + SparkR
• It is built on Apache Spark, a fast and
general engine for large-scale data
processing
• Run programs up to 100x faster than Hadoop
MapReduce in memory, or 10x faster on disk.
• Write applications quickly in Java, Scala,
or Python.
Spark MLlib
• Integrated with Spark SQL for data
queries and transformations
• Integrated with Spark GraphX for
graph processing
• Integrated with Spark Streaming for
real time data processing
Beyond Machine Learning
• Run R and machine learning models
using the same infrastructure
• Leverage R scripts from Spark MLlib
models
• Scale R models as part of a Spark
cluster
• Execute R models programmatically
using Java APIs
Spark MLlib + SparkR
Dato
• Makes Python machine learning
enterprise-ready
• GraphLab Create
• Dato Distributed
• Dato Predictive Services
Dato
Principles:
• Get started fast
• Rapidly iterate
• Combine for new apps
import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
recommendations = model.recommend(k=5)
Toolkits: Recommender, Image Search, Sentiment Analysis, Data Matching, Auto Tagging, Churn Predictor, Click Prediction, Product Sentiment, Object Detector, Search Ranking, Summarization, …
Sophisticated ML made easy - Toolkits
TensorFlow
• Powers deep learning capabilities on dozens
of Google’s products
• Interfaces for modeling machine and deep
learning algorithms
• Platform for executing those algorithms
• Scales from mobile devices to a cluster with
thousands of nodes
• Became one of the most popular projects
on GitHub in less than a week
Google’s TensorFlow
• Based on the principle of a dataflow
graph
• Nodes can perform data operations
but also send or receive data
• Python and C++ libraries. NodeJS, Go
and others in the pipeline
TensorFlow Programming Model
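import tensorflow as tf

# Excerpt using the early TensorFlow API shown on the slide. It assumes that
# x, y_, keep_prob, y_conv, mnist and an interactive session `sess` were
# defined earlier, as in the MNIST tutorial this code follows.
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # Report accuracy on the current batch with dropout disabled.
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    # One optimization step with 50% dropout.
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))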
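The dataflow-graph principle behind this programming model can be shown without TensorFlow itself. The toy sketch below is an assumption-laden illustration, not TensorFlow code: building the graph only records nodes, and values are computed when the graph is run with fed inputs. The names `Node`, `placeholder`, `constant`, `add`, `mul` and `run` are invented for the example.

```python
class Node:
    """A graph node: an operation plus the nodes that feed into it."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)


def placeholder(name):
    return Node(("placeholder", name))


def constant(value):
    return Node(("constant", value))


def add(a, b):
    return Node(("add",), (a, b))


def mul(a, b):
    return Node(("mul",), (a, b))


def run(node, feed):
    """Evaluate a node by first evaluating the nodes it depends on."""
    kind = node.op[0]
    if kind == "placeholder":
        return feed[node.op[1]]  # value supplied at run time
    if kind == "constant":
        return node.op[1]
    values = [run(parent, feed) for parent in node.inputs]
    if kind == "add":
        return values[0] + values[1]
    if kind == "mul":
        return values[0] * values[1]
    raise ValueError("unknown op: %r" % (kind,))


# Building the graph computes nothing yet.
x = placeholder("x")
y = add(mul(x, constant(3.0)), constant(1.0))

# Running the graph feeds a value for x and evaluates y = 3x + 1.
result = run(y, {"x": 2.0})
```

Separating graph construction from execution is what lets a runtime decide later where and in what order each node is computed.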
• Scales from a single device to a large
cluster of nodes
• TensorFlow uses a placement algorithm
based on heuristics to place tasks on
the different nodes in a graph
• The execution engine assigns tasks for
fault tolerance
• Linear scalability model
TensorFlow Implementation
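A heuristic placement step can be pictured with a small greedy sketch. This is not TensorFlow's actual placement algorithm: it ignores data dependencies and communication costs, and the operation costs and device speeds are made-up numbers. It only illustrates the general idea of assigning each task to the device where it is estimated to finish soonest.

```python
def place_greedily(op_costs, device_speeds):
    """Assign each op to the device with the earliest estimated finish time."""
    busy_until = {device: 0.0 for device in device_speeds}
    placement = {}
    for op, cost in op_costs:
        # Estimated finish time = time the device frees up + run time on it.
        best = min(
            device_speeds,
            key=lambda d: busy_until[d] + cost / device_speeds[d],
        )
        busy_until[best] += cost / device_speeds[best]
        placement[op] = best
    return placement, busy_until


# Hypothetical costs (work units) and device speeds (work units per second).
ops = [("matmul", 8.0), ("relu", 1.0), ("softmax", 2.0)]
devices = {"gpu:0": 4.0, "cpu:0": 1.0}
placement, busy_until = place_greedily(ops, devices)
```

With these numbers the expensive `matmul` goes to the faster device, while the cheap `relu` runs on the otherwise idle slower one.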
• TensorFlow includes an engine that
enables the visual representation of
the execution graph
• Visualizations include summary
statistics of the different states of
the model
• The visualization engine is included
in the current open source release
TensorFlow Graph Visualization
Other Interesting Projects
• H2O.ai
• PredictionIO
• Scikit-Learn
• Microsoft’s DMTK
Machine Learning in the Enterprise
• Enable foundational building blocks
-Data quality
-Data discovery
-Functional and integration testing
• Predictions are tempting but classification and clustering are
easier
• Run multiple models at once
• Enable programmatic interfaces to interact with ML models
• Start small, deliver quickly, iterate…
Machine Learning in the Enterprise
• Machine learning is becoming one of the most important elements of
modern enterprise solutions
• Innovation in machine learning is happening in both the on-premise
and cloud space
• Cloud machine learning innovators include: Azure ML, AWS ML,
Databricks and IBM Watson
• On-premise machine learning innovators include: Spark MLlib,
Microsoft’s Revolution R, Dato, TensorFlow
• Enterprise machine learning solutions should include elements such
as data quality, data governance, etc.
• Start small and use real use cases
Summary
Thanks
jesus.rodriguez@tellago.com
https://twitter.com/jrdothoughts
http://jrodthoughts.com/
https://medium.com/@jrodthoughts
Appendix A: Scikit-Learn
• Extensions to SciPy (Scientific Python) are called SciKits. SciKit-Learn
provides machine learning algorithms.
• Algorithms for supervised & unsupervised learning
• Built on SciPy and NumPy
• Standard Python API
• Sits on top of C libraries (LAPACK, LibSVM) and Cython
• Open source: BSD license (packaged by major Linux distributions)
• Probably the best general ML framework out there.
Scikit-Learn
[Diagram stages: Raw Data, Load & Transform Data, Feature Extraction, Build Model, Feature Evaluation, Evaluate Model]
Very Simple Prediction Model
Assess how the model will generalize to an independent data set (e.g.
data not in the training set).
1. Divide data into training and test splits
2. Fit model on training, predict on test
3. Determine accuracy, precision and recall
4. Repeat k times with different splits, then average the scores (e.g. F1)
            Predicted Class A   Predicted Class B
Actual A    True A              False B             #A
Actual B    False A             True B              #B
            #P(A)               #P(B)               total
Simple Programming Model: Cross-Validation (Classification)
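The cross-validation steps above can be sketched in plain Python. This sketch deliberately avoids scikit-learn so that it stays self-contained; the one-feature threshold "model", the toy data and all function names are assumptions made for illustration. It splits the data into k folds, fits on the training part, predicts on the test part, derives precision, recall and F1 from the confusion counts, and averages F1 over the folds.

```python
def confusion_counts(actual, predicted, positive):
    """Return (true positives, false positives, false negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests every k-th item from i."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def fit_threshold(xs, ys):
    """Toy model: the midpoint between the mean of class A and class B."""
    a = [x for x, y in zip(xs, ys) if y == "A"]
    b = [x for x, y in zip(xs, ys) if y == "B"]
    return (sum(a) / len(a) + sum(b) / len(b)) / 2


def predict(threshold, xs):
    return ["A" if x < threshold else "B" for x in xs]


# Made-up, well-separated data: small values are class A, large are class B.
xs = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
ys = ["A", "A", "A", "A", "B", "B", "B", "B"]

f1_scores = []
for train, test in k_fold_indices(len(xs), k=4):
    threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
    predicted = predict(threshold, [xs[i] for i in test])
    tp, fp, fn = confusion_counts([ys[i] for i in test], predicted, positive="A")
    f1_scores.append(precision_recall_f1(tp, fp, fn)[2])

mean_f1 = sum(f1_scores) / len(f1_scores)
```

In practice a library would handle shuffling and stratified splits; the interleaved folds here are only a simple stand-in.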
How to evaluate clusters? Visualization (but only in 2D)
Data Visualization
Appendix B: PredictionIO
• Developer friendly machine learning platform
• Completely open source
• Based on Apache Spark
PredictionIO
• PredictionIO platform
A machine learning stack for building, evaluating
and deploying engines with machine learning
algorithms.
• Event Server
An open source machine learning analytics layer for
unifying events from multiple platforms
• Template Gallery
Engine templates for different types of machine
learning applications
A Simple Architecture
• Execute models asynchronously via event
interface
• Query data programmatically via REST
interface
• Various SDKs provided as part of the platform
Model Execution
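To make the split between the event interface and the query interface concrete, the sketch below builds the two JSON bodies a client might send: one event recording that a user rated an item, and one query asking for recommendations. The field names reflect my understanding of the PredictionIO event format and recommendation template and should be treated as assumptions to verify against the platform documentation; the IDs are invented, and nothing is sent over the network.

```python
import json


def make_event(event, user_id, item_id, properties):
    """Body for an event sent to the event server (field names assumed)."""
    return {
        "event": event,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": properties,
    }


def make_query(user_id, num):
    """Body for a recommendation query sent to a deployed engine (assumed shape)."""
    return {"user": user_id, "num": num}


# Hypothetical user and item identifiers.
event_body = json.dumps(make_event("rate", "u1", "i42", {"rating": 4}))
query_body = json.dumps(make_query("u1", 5))
```

Events are collected asynchronously and used when engines are trained, while queries are answered synchronously by a deployed engine.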
• Visual model for model creation
• Integrated with a template gallery
• Ability to test and validate engines
Rich Model Creation Interface
