 
WSO2 Machine Learner 1.1.0
  
WSO2 Analytics Platform

WSO2 Analytics Platform uniquely combines simultaneous real-time and batch analysis with predictive analytics to turn data from IoT, mobile and Web apps into actionable insights.
  
WSO2 Analytics Platform

WSO2 Advantages

Highly Pluggable Architecture
  
Toolboxes for Extensibility

+ Toolboxes = Industry or domain specific analytics

Toolboxes:
•  Fraud and Anomaly Detection - Supports fraud and anomaly detection through static rules, Markov chains, and scoring.
•  GIS Data Monitoring - Can take any data stream tagged with geographical locations and support visualizations of that data in a map.
•  Activity Monitoring - Lets users correlate events related to the same transaction in order to visualize, analyze, and write queries on top of those activities.
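
The fraud toolbox bullet mentions Markov chains and scoring without showing how they fit together. The following is a minimal sketch of one common approach, not the toolbox's actual implementation: transition probabilities are estimated from historical event sequences, and a new sequence is scored by its average negative log-likelihood. The function names, the smoothing constant, and the event labels are all hypothetical.

```python
from collections import defaultdict
import math


def fit_transitions(sequences, smoothing=1.0):
    """Estimate P(next | current) from event sequences with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for seq in sequences:
        states.update(seq)
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1.0
    probs = {}
    for cur in states:
        total = sum(counts[cur].values()) + smoothing * len(states)
        probs[cur] = {
            nxt: (counts[cur][nxt] + smoothing) / total for nxt in states
        }
    return probs


def anomaly_score(sequence, probs, floor=1e-6):
    """Average negative log-likelihood per transition; higher means more unusual."""
    steps = list(zip(sequence, sequence[1:]))
    if not steps:
        return 0.0
    nll = 0.0
    for cur, nxt in steps:
        # Transitions never seen for an unknown state fall back to a small floor.
        p = probs.get(cur, {}).get(nxt, floor)
        nll -= math.log(p)
    return nll / len(steps)


# Hypothetical history: most sessions go login -> browse -> pay.
history = [["login", "browse", "pay"]] * 20 + [["login", "pay"]]
model = fit_transitions(history)
print(anomaly_score(["login", "browse", "pay"], model))
print(anomaly_score(["pay", "pay", "pay"], model))
```

A static rule could then flag any sequence whose score exceeds a chosen threshold; the threshold itself would have to be tuned per data set.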
  
Edge Analytics - Mobile and IoT Streams

Event correlation/filtering available at the edge
  
High Level Languages

•  For both batch and real-time, we provide structured, SQL-like query languages.
•  No Java programming is required
•  Lowers the adoption entry point.
•  Batch analytics relies on SparkSQL.
•  Real-time analytics implemented through WSO2 owned solution Siddhi
  
Real-time analytics with Siddhi

•  Throttling & Blacklisting users

define stream RequestStream (correlationID string, serviceID string, userID string, tier string, requestTime long, ...);

define table BlacklistedUserTable (userID string, time long, requestCount long);

from RequestStream[tier == 'BRONZE']#window.time(1 min)
select userID, requestTime as time, count(correlationID) as requestCount
group by userID
having requestCount > 5
insert into BlacklistedUserTable;
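
To make the semantics of the throttling query easier to follow, here is a rough Python sketch of the same idea: count each BRONZE user's requests inside a sliding one-minute window and record the user once the count exceeds five. It is an illustration under simplifying assumptions, not how Siddhi executes the query. In particular, expiry here happens only when a new event arrives, and the class and method names are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_MS = 60_000   # mirrors the 1 min time window in the query
LIMIT = 5            # mirrors the "requestCount > 5" condition


class Throttler:
    def __init__(self, window_ms=WINDOW_MS, limit=LIMIT):
        self.window_ms = window_ms
        self.limit = limit
        self.recent = defaultdict(deque)   # userID -> request times in window
        self.blacklist = {}                # userID -> (time, requestCount)

    def on_request(self, user_id, tier, request_time):
        """Process one event; return True if the user is blacklisted."""
        if tier != "BRONZE":
            # The filter only lets BRONZE requests into the window.
            return user_id in self.blacklist
        times = self.recent[user_id]
        times.append(request_time)
        # Expire events that have fallen out of the sliding window.
        while times and times[0] <= request_time - self.window_ms:
            times.popleft()
        if len(times) > self.limit:
            self.blacklist[user_id] = (request_time, len(times))
        return user_id in self.blacklist


throttler = Throttler()
for i in range(6):
    # Six requests one second apart: the sixth pushes the count past the limit.
    print(i, throttler.on_request("u1", "BRONZE", 1000 * i))
```

Timestamps are assumed to be milliseconds, matching the `long` type of `requestTime` in the stream definition.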
  
Batch Analytics with Spark SQL
  
create temporary table product_data using carbonanalytics
options (schema …)
create temporary table products using carbonanalytics
options (schema …)
insert into products select product_name from product_data
group by …
  
Case Studies
Smart Home

•  DEBS (Distributed Event Based Systems) is a premier academic conference, which posts a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)
•  Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events
•  We posted the fastest single node solution measured (400K events/sec) and close to one million distributed throughput.
•  WSO2 CEP based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
•  Only generic solution to become a finalist
  
Healthcare Data Monitoring

•  Allows users to search/visualize/analyze healthcare records (HL7) across 20 hospitals in Italy
•  Used in combination with WSO2 ESB
•  Custom toolbox tailored to customer's requirement (to replace existing system)
  
Cloud IDE Analytics

•  Custom solution created in partnership with Codenvy to bring analytics to Codenvy management team and its customers
•  Developed in less than a month, with a custom plug-in to MongoDB.
•  Deployed in the codenvy.com platform.
  
Additional Customers Use Cases

•  Cisco (BAM + CEP) - OEM, Healthcare, Parking Monitoring (see "Solution patterns based approach to rapidly create IoE solutions across industries")
•  http://us14.wso2con.com/videos/#Coumara-Radja
•  Used by a Large Scale IoT System Provider for use cases including Vehicle tracking, Smart City, Building Monitoring (CEP)
•  See "Internet of Big Things: The Story of Pacific Controls", http://us14.wso2con.com/videos/#Sajaad-Chaudry
•  Transaction Monitoring in a Large Bank (CEP)
•  Knowledge Mining and tracking Prospective Customers through Natural Language data sources (CEP)
•  CEP Embedded in edge Devices
•  See WSO2Con 2013 - Keynote: Emerging Foundations of Next-Generation Business Systems, https://www.youtube.com/watch?v=7CyG3JKUxWw
•  Throttling and Anomaly Detection by Group of Telecom Companies
  
WSO2 Machine Learner (Technical Overview)

WSO2 Machine Learner
  
Overview
  
o  Open source Machine Learning (ML) tool
o  Scalable way to perform machine learning
o  Visually explore uploaded data sets
o  Support for various machine learning algorithms
o  Metrics to evaluate and compare built ML models.
o  Ability to export ML models
o  Extensions for real-time predictions
o  REST API to expose all features i.e. ML jobs are scriptable

Functionality
  
o  Manage and explore your data
o  Analyze the data using machine learning algorithms
o  Build machine learning models
o  Compare and manage generated machine learning models
o  Predict using the built models

Manage Data Set

o  Supported data sources
   o CSV/TSV files from local file systems.
   o Files from HDFS.
   o Tables from WSO2 Data Analytics Server
o  Supports data set versioning.
   o Version data collected over time from the same data set
o  Generate models from the different versions.
o  Manage datasets based on projects, users.

Pre-process & Explore Data

o  Find key details from feature set
o  Scatter plots to understand relationship between feature set
o  Supported graphs:
   o Scatter plots, Parallel sets, Trellis charts, Cluster diagram, Histogram
o  Missing value handling with mean imputation and discard
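
The two missing-value strategies named on this slide, mean imputation and discard, can be sketched in a few lines. This is only an illustration of the techniques, assuming rows are dictionaries and a missing value is represented as `None`; the function names are hypothetical and do not come from the product.

```python
def impute_mean(rows, column):
    """Replace missing (None) values in `column` with the mean of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        # Nothing to average, so return unchanged copies.
        return [dict(r) for r in rows]
    mean = sum(observed) / len(observed)
    return [
        {**r, column: mean if r[column] is None else r[column]} for r in rows
    ]


def discard_missing(rows, column):
    """Drop every row whose `column` value is missing."""
    return [dict(r) for r in rows if r[column] is not None]


rows = [{"age": 20.0}, {"age": None}, {"age": 40.0}]
print(impute_mean(rows, "age"))      # the gap is filled with 30.0
print(discard_missing(rows, "age"))  # the incomplete row is dropped
```

Imputation keeps every row at the cost of flattening variance in the affected feature, while discarding keeps only fully observed rows, so the choice depends on how much data can be spared.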

Analysis with ML Algorithm

o  Supports deep learning
o  Supports supervised and unsupervised learning.
o  Includes algorithms for numerical prediction, classification and clustering.
o  Supports anomaly detection algorithm.
o  Supports recommendation with Collaborative Filtering Recommendation Algorithm

Analysis with ML Algorithm

o  Includes algorithms for numerical prediction, classification and clustering.

Numerical prediction: Linear Regression, Ridge Regression, Lasso Regression
Classification: Logistic Regression, Naive Bayes, Decision Tree, Random Forest and Support Vector Machines
Clustering: K-Means
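
As a small illustration of the clustering entry, the sketch below runs K-Means (Lloyd's algorithm) on 2-D points in plain Python. It is not the product's implementation, which the deployment slides suggest runs on Spark. Starting centroids are passed in by the caller to keep the example deterministic, and the function name and sample points are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assignment


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```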

Model Evaluation & Comparison

o  Evaluate generated models based on metrics
   o Accuracy
   o Area under ROC curve
   o Confusion Matrix
   o Predicted vs. Actual graphs
   o Feature importance
o  Compare models generated from different analyses.
o  Set fractions for training data
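
Several of the metrics listed here are simple to compute by hand, which helps when sanity-checking a model comparison. The sketch below, assuming binary labels encoded as 1 and 0, shows a training-fraction split, a confusion matrix, accuracy, and area under the ROC curve computed as the probability that a positive example outscores a negative one. All function names are hypothetical and unrelated to the product's API.

```python
import random


def split(rows, train_fraction, seed=0):
    """Shuffle rows and split them into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted):
    """Return counts (tp, fp, fn, tn) for binary labels 1 and 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    tp, fp, fn, tn = confusion_matrix(actual, predicted)
    return (tp + tn) / (tp + fp + fn + tn)


def auc(actual, scores):
    """Area under ROC as P(positive score > negative score); ties count half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))
print(accuracy(actual, predicted))
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a spot check but not for large data sets.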

Integration of ML Models

o  Models can be used via main transaction flow (WSO2 ESB) or data analysis flow (WSO2 CEP)
o  Supports PMML for interoperability.

Deployment Options

o  Stand alone mode
o  With external Spark Cluster
o  With WSO2 DAS as external Spark Cluster

Run Yourself or let WSO2 Run it for you

Self-Hosted
•  Your operations team maintains the deployment with production support from WSO2

WSO2 Managed Cloud
•  WSO2 Operations team runs the deployment in a dedicated environment in AWS datacenter of your choice
•  Includes monitoring, backups, patches, updates
•  Financially backed SLA on uptime and response time

Thank You!

Download WSO2 Machine Learner at: http://wso2.com/products/machine-learner/
  
