Analytics Trends and Directions: Big Data to AI
Cedrine Madera, PhD
Executive Information Architect
Member of IBM Academy Of Technology
Unleashing your data and making the shift to a
Data-Driven Organization
Value
Uses of Data
Efficiency, Modernization, Data Decision, Monetization
Operations Reporting & Data Warehousing
Self-Service Analytics
New Business Models
Data Science
Analytics maturity level
From information driven to data driven
BIG DATA, MACHINE LEARNING AND COGNITIVE/AI
Cognitive
BUSINESS
VALUE
1990’s
DATA WAREHOUSE
2012
BIG DATA
2014
Data Lake
Store and analyse growing volumes of data to meet analytics requirements: information-driven systems
Integrate unstructured data (Apache Hadoop experimentation): hybrid information- and data-driven systems
Support digital transformation: data-driven model
Strong analytics foundations on the way to AI
Information systems
Velocity / Variety / Volume of Data
2017
Cognitive Information System
2018
Infuse AI
Semantics
• Artificial Intelligence (AI)
  • Intelligence exhibited by machines or software
• Machine Learning (ML)
  • Type of AI that enables computers to learn without being explicitly programmed
• Deep Learning (DL)
  • Type of ML, based on neural networks loosely modeled after the brain
  • Learns features and representations of data
• Training
  • Neural “inspired”, fed by millions of data points
  • Repetition drives weighting and connections
Cognitive Systems : A category of technologies that uses natural language
processing and machine learning to enable people and machines to interact more
naturally to extend and magnify human expertise and cognition.
These systems will learn and interact to provide expert assistance to scientists,
engineers, lawyers, and other professionals in a fraction of the time it now takes.
Cognitive / AI: Human intelligence exhibited by machines
Machine Learning: “Trained” using large amounts of data, with the ability to learn how to perform the task
Deep Learning: Break tasks into artificial neural networks
Advanced Analytics: NoSQL, Hadoop & Analytics
What the market is saying…
Ovum : 2017 Trends to Watch: Analytics
Machine learning and automation is the enterprise reality of AI science fiction
“A market for algorithms will emerge..”
Upgrading data architectures must balance new capabilities with existing investments
IDC
Forbes: Crawl With Analytics Before Running With Artificial Intelligence
https://www.forbes.com/sites/brentdykes/2017/01/11/crawl-with-analytics-before-running-with-artificial-intelligence/#61efd2f8299c
No Artificial Intelligence without Information Architecture
The descriptive Analytics challenges
Functional
• Regulation & compliance (GDPR)
• Silos
• All data types
Non functional
• Scalability
• Reliability
• Security
• Data governance
• Data Gravity
Descriptive analytics can be classified into three areas that answer certain kinds of questions:
• Standard reporting and dashboards: What happened? How does it compare to our plan? What is happening now?
• Ad-hoc reporting: How many? How often? Where?
• Analysis/query/drill-down: What exactly is the problem? Why is it happening?
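The three kinds of descriptive questions can be illustrated with a few lines of Python. This is only a minimal sketch: the incident log, its regions, and its categories are invented for the example and do not come from the presentation.

```python
from collections import Counter

# Hypothetical incident log: (region, category) pairs, invented for illustration.
incidents = [("EMEA", "outage"), ("EMEA", "latency"), ("APAC", "outage"),
             ("EMEA", "outage"), ("AMER", "latency")]

# Ad-hoc reporting: how many, where, how often.
how_many = len(incidents)
where = Counter(region for region, _ in incidents)
how_often = Counter(category for _, category in incidents)

# Drill-down: what exactly is the problem in the busiest region?
busiest_region, _ = where.most_common(1)[0]
drill_down = Counter(category for region, category in incidents
                     if region == busiest_region)

print(how_many, dict(where), dict(how_often))
print(busiest_region, dict(drill_down))
```

Standard reporting would run the same aggregations on a schedule and compare them to a plan; the drill-down step narrows the same data to one slice to look for the cause.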
The Predictive Analytics challenges
Functional
• Information system coverage extension
• Skills: open technologies
• Machine Learning
Non functional
• Volume
• Security
• Transparency
Predictive analytics can be classified into six categories:
• Data mining: What data is correlated with other data?
• Pattern recognition and alerts: When should I take action to correct or adjust a process or piece of equipment?
• Monte-Carlo simulation: What could happen?
• Forecasting: What if these trends continue?
• Root cause analysis: Why did something happen?
• Predictive modeling: What will happen next if?
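To make the Monte-Carlo category concrete, the sketch below asks "what could happen?" to a demand figure by replaying many random growth paths. Everything in it is an assumption made for illustration: the function name `simulate_demand`, the starting value, and the choice of normally distributed growth with a 5% mean and 10% standard deviation.

```python
import random
import statistics

def simulate_demand(start, mean_growth, sd_growth, periods, runs, seed=0):
    """Return the simulated final demand of each Monte-Carlo run."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = []
    for _ in range(runs):
        demand = start
        for _ in range(periods):
            # Each period grows by a randomly drawn rate (assumed normal).
            demand *= 1 + rng.gauss(mean_growth, sd_growth)
        outcomes.append(demand)
    return outcomes

# Hypothetical inputs: 1000 units today, four periods ahead, 10,000 runs.
outcomes = sorted(simulate_demand(start=1000.0, mean_growth=0.05,
                                  sd_growth=0.10, periods=4, runs=10_000))
low = outcomes[500]                    # 5th percentile
median = statistics.median(outcomes)
high = outcomes[9500]                  # 95th percentile
print(f"5th pct: {low:.0f}  median: {median:.0f}  95th pct: {high:.0f}")
```

The output is a range of plausible outcomes rather than a single forecast, which is the difference between this category and plain trend extrapolation.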
The Prescriptive Analytics challenges
Functional
• Business rules automation
Non functional
• Real time
• Historical data volume
Prescriptive analytics, which is part of “advanced analytics,” is based on the concept of optimization, which can be
divided into two areas:
• Optimization: How can we achieve the best outcome?
• Stochastic optimization: How can we achieve the best outcome and address uncertainty in the data to make better
decisions?
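The difference between the two areas can be sketched with a toy ordering problem. The example is hypothetical throughout: the price, the cost, the demand distribution, and the helper names `profit` and `best_order` are assumptions chosen to keep the code short. Plain optimization plans against a single demand forecast; stochastic optimization picks the quantity with the best average profit over many sampled demand scenarios.

```python
import random

PRICE, COST = 10.0, 6.0  # hypothetical unit selling price and unit cost

def profit(order_qty, demand):
    """Profit when unsold units are worthless and unmet demand is lost."""
    return PRICE * min(order_qty, demand) - COST * order_qty

def best_order(scenarios, candidates):
    """Pick the candidate quantity with the highest average profit."""
    def expected_profit(qty):
        return sum(profit(qty, d) for d in scenarios) / len(scenarios)
    return max(candidates, key=expected_profit)

rng = random.Random(42)
# Assumed demand uncertainty: normal around 100 units, truncated at zero.
scenarios = [max(0.0, rng.gauss(100, 30)) for _ in range(5000)]
candidates = range(0, 201, 5)

# Optimization: plan against one forecast value (the expected demand).
deterministic_qty = best_order([100.0], candidates)
# Stochastic optimization: plan against the whole set of scenarios.
stochastic_qty = best_order(scenarios, candidates)
print(deterministic_qty, stochastic_qty)
```

With these assumed numbers an unsold unit costs more than a missed sale earns, so the stochastic answer is more cautious than the forecast-only answer.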
The Data governance challenges
Functional
• CDO, CPO
• Ethics & Analytics
• Regulations
Non functional
• Data life cycle
• Data security
• Data quality
Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of
the data employed in an enterprise.
The Data Architecture challenges
Functional
HTAP*
Data Lake
IoT
Non functional
• Volume
• Cost
• Data Security
• Data quality
• Real time
Data architecture is a set of rules, policies, standards and models that govern and define the type of data
collected and how it is used, stored, managed and integrated within an organization and its database systems.
It provides a formal approach to creating and managing the flow of data and how it is processed across an
organization’s IT systems and applications.
*Hybrid Transactional Analytical Processing
How can z Systems help to solve those challenges?
Analytics, Machine Learning, Data governance, Data architecture
The descriptive Analytics challenges
Accelerators
IBM DB2 Analytics Accelerator
DB2 BLU
DASHDB
SIMD
SMT
• Data movement – ETL
• INZA-predictive modelling
• Queries
• Open languages: R, Scala (Spark)
• Archives
• Federation
• DB2 for z/OS, IMS, VSAM, Oracle
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
The Predictive Analytics challenges
Open Framework
Machine Learning
IBM SPSS
Apache Spark
IBM Machine Learning on z/OS
R
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
Machine Learning Basics
• Identifies patterns in historical data
• Builds/trains behavioral models from patterns
• Makes recommendations
Machine learning is everywhere, influencing nearly everything we do…
• Netflix: personalized movie recommendations
• Waze: personalized driving experience
• 7 out of 10 financial customers would take recommendations from a robot advisor
Machine Learning - Process
• Data Ingestion
• Data Cleaning and Transformation
• Model Training
• Testing and Validation
• Deployment
• Model Selection
From experimentation to production… the real data science challenge
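The process steps can be walked through end to end in a small, standard-library-only sketch. The data is synthetic, the two candidate models (a mean baseline and a one-variable least-squares line) are stand-ins chosen for brevity, and "deployment" is reduced to exposing a callable; none of this reflects a specific IBM product.

```python
import random

# 1. Data ingestion: synthetic (x, y) records; None marks a missing reading.
rng = random.Random(1)
raw = [(x, 3.0 * x + 2.0 + rng.gauss(0, 1.0)) for x in range(100)]
raw[10] = (10, None)

# 2. Data cleaning and transformation: drop incomplete records.
clean = [(x, y) for x, y in raw if y is not None]

# Hold out 20% of the records for testing and validation.
rng.shuffle(clean)
split = int(0.8 * len(clean))
train, test = clean[:split], clean[split:]

# 3. Model training: fit two candidate models on the training set.
def fit_mean(rows):
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def fit_line(rows):
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x, _ in rows))
    return lambda x: my + slope * (x - mx)

models = {"mean baseline": fit_mean(train), "linear regression": fit_line(train)}

# 4. Testing and validation: mean squared error on the held-out records.
def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

scores = {name: mse(model, test) for name, model in models.items()}

# 5. Model selection, then "deployment" as a plain callable.
best_name = min(scores, key=scores.get)
predict = models[best_name]
print(best_name, round(predict(50), 1))
```

In a production setting each step would typically be a separate, monitored stage, and the selected model would be retrained as new data arrives; the sketch only shows how the steps feed into one another.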
Machine Learning can be applied to a variety of use cases across problem types and industries
Machine learning can help IT departments (batch optimization, predictive maintenance/failure, …) and can be embedded into any expert system.
The Data governance challenges
Move analytics power & security to the data
Ethics framework in Analytics projects
HW accelerator
Memory extended
zIIP eligibility
Zero cost – Zero latency for IDAA
Apache Spark
Pervasive encryption
MDM
Machine Learning
Privacy by design and by default
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
The Analytics ethics dilemma with personal data: how GDPR could slow down Analytics projects
New Analytics or Machine Learning projects will require ethical policies by design and by default.
The importance of the ethical dimension in Analytics and Machine Learning projects
Recommendations for GDPR readiness with Analytics and Machine Learning projects
• Check whether personal data is processed in big data analytics and, if so, consider using
appropriate techniques to anonymize the personal data in the dataset(s) before
analysis...
• Become transparent about their processing of personal data by using a combination of
innovative approaches in order to provide meaningful privacy notices at appropriate stages
throughout a big data project.
• Embed a privacy impact assessment framework into their big data processing activities to
help identify privacy risks and assess the necessity and proportionality of a given project.
• Adopt a privacy by design approach in the development and application of their big data
analytics. This should include implementing technical and organizational measures to
address matters including data security, data minimization and data segregation...
• Develop ethical principles to help reinforce key data protection principles. Organizations
should create ethics boards to help scrutinize projects and assess complex issues arising
from big data analytics...
• Implement innovative techniques to develop auditable machine learning algorithms.
Internal and external audits should be undertaken with a view to explaining the rationale
behind algorithmic decisions and checking for bias, discrimination and errors...
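One way to act on the anonymization, data minimization, and privacy-by-design recommendations above is to strip or transform identifiers before records reach the analytics environment. The sketch below is an assumption-laden illustration: the field names, the key, and the helpers `pseudonymize` and `prepare_for_analysis` are hypothetical. Note that a keyed hash is pseudonymization rather than anonymization, so the output would generally still count as personal data under GDPR and needs the same governance.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def prepare_for_analysis(record):
    """Keep only the fields the analysis needs (data minimization)."""
    return {
        "customer": pseudonymize(record["email"]),    # stable join key, no email
        "age_band": f"{record['age'] // 10 * 10}s",   # generalize the exact age
        "basket_total": record["basket_total"],
    }

record = {"email": "jane@example.com", "name": "Jane Doe", "age": 37,
          "basket_total": 82.5}
safe = prepare_for_analysis(record)
print(safe)
```

The same customer always maps to the same pseudonym, so aggregations and joins still work, while the name and email never leave the source system.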
The Data Architecture challenges
Federated data lake
Hybrid cloud integration
IDAA
Apache Spark
DashDB
Linux on z
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
Reasons to limit data movement when building a physical data lake
• Data gravity: analytic processing moves to where the data resides
• Data sensitivity: encrypt data in case of a data breach
• Real-time analytics requirements
• High data governance requirements:
  • Data quality: reduce data copies
  • Data security: regulations (such as GDPR)
  • Data life cycle management: alleviate and optimize data management
The hybrid data lake federated approach
• To alleviate data movement
• To use a federated data approach
• To respect data gravity
• To leverage existing data sets
• To limit data discrepancies
Use z Systems as one of the physical repositories
Leave z Systems data in place
Show your data scientists how easy it is to access z data
Imperatives for implementing a hybrid data lake scenario
Reduce the complexity of the information supply chain, e.g.
• Avoid data movement
• Simplify data transformation
• Use in-DB transformation
• Use temporary table structures
Adhere to innovative and novel analytics concepts, e.g.
• Limit the number of data marts and data cubes
• Use aggregation on the fly
• Allow for agile usage patterns
• Leverage HTAP* architecture
Technologies to use for the hybrid data lake approach
Leverage state-of-the-art technology, e.g.
• HW accelerators
• Special-purpose appliances
• In-memory processing
Use federation techniques whenever possible, e.g.
• Federated SQL queries, leaving data in place
• Federated analytical processing, leaving data in place
• Open framework (e.g., Apache Spark)
*Hybrid Transactional Analytical Processing
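The idea of a federated SQL query that leaves data in place can be shown in miniature with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: two attached SQLite databases stand in for two separate repositories, the table and column names are invented, and nothing here represents how IDAA, DB2, or Spark implement federation.

```python
import sqlite3

# Two independent stores; in-memory SQLite databases stand in for them here.
conn = sqlite3.connect(":memory:")                   # plays the data lake side
conn.execute("ATTACH DATABASE ':memory:' AS core")   # plays the system of record

# Hypothetical system-of-record table.
conn.execute("CREATE TABLE core.accounts (id INTEGER PRIMARY KEY, segment TEXT)")
conn.executemany("INSERT INTO core.accounts VALUES (?, ?)",
                 [(1, "retail"), (2, "corporate")])

# Hypothetical data lake table.
conn.execute("CREATE TABLE main.clicks (account_id INTEGER, pages INTEGER)")
conn.executemany("INSERT INTO main.clicks VALUES (?, ?)",
                 [(1, 5), (1, 7), (2, 3)])

# One SQL statement joins both stores; no table is copied between them,
# and the aggregation is computed on the fly rather than stored in a data mart.
rows = conn.execute("""
    SELECT a.segment, SUM(c.pages)
    FROM core.accounts AS a
    JOIN main.clicks AS c ON c.account_id = a.id
    GROUP BY a.segment
    ORDER BY a.segment
""").fetchall()
print(rows)
conn.close()
```

The point of the pattern is that the analyst writes one query against a logical view, while each data set stays in the repository that already governs and secures it.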
Data in IBM DB2 Analytics Accelerator
• An extension of a DB2 for z/OS system
• ETL process acceleration and alleviation
• Accelerates SQL access to z/OS data, including IMS, VSAM ... loaded by IDAA Loader
• Manages huge volumes of historical data (HPSS)
• R queries accelerator
• Apache Spark on z/OS queries accelerator
Transparent and easy access for data scientists
• Through JDBC or API from Spark on distributed platforms, including Linux on z
• With Spark on z/OS as well as Machine Learning on z/OS
z Systems as a data lake repository in a hybrid approach: make z data simple
Descriptive
Predictive
Prescriptive
Data architecture
Data governance
Technology breadth with IBM Z
Ask your Information Architect to leverage them!
Wrap-up of the presentation
Analytics
From information driven to data driven: IBM Z can help you meet the challenge!
Thank you
Cedrine Madera, PhD
Executive Information Architect
Member of IBM Academy Of Technology
