Gse uk-cedrinemadera-2018-shared
- 1. Big Data to AI
Analytics Trends and Directions:
Cedrine Madera, PhD
Executive Information Architect
Member of IBM Academy Of Technology
- 2. Unleashing your data and making the shift to a
Data-Driven Organization
Value
Uses of Data
Efficiency Modernization Data Decision Monetization
Operations Reporting &
Data
Warehousing
Self-Service
Analytics
New
Business
Models
Data Science
Analytics maturity level
From information driven to data driven
- 3. BIG DATA, MACHINE LEARNING AND COGNITIVE/AI
- 4. Cognitive
BUSINESS
VALUE
1990’s
DATA WAREHOUSE
2012
BIG DATA
2014
Data Lake
Store and analyse growing volumes of data to meet analytics requirements: information-driven systems
Integrate unstructured data (Apache Hadoop experimentation):
hybrid information- and data-driven systems
Support digital transformation with a data-driven model
Strong analytics foundations on the way to AI
Information
systems
Velocity/ Variety / Volume
of Data
2017
Cognitive Information
System
2018
Infuse AI
- 5. Semantic
• Artificial Intelligence (AI)
• Intelligence exhibited by machines or software
• Machine Learning (ML)
• Type of AI that enables computers to learn without
being explicitly programmed
• Deep Learning (DL)
• Type of ML, based on neural networks loosely
modeled after the brain
• learns features and representations of data
• Training
• neural “inspired”, fed by millions of data points
• repetition drives weighting and connections
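As a minimal sketch of "repetition drives weighting", the example below trains a single artificial neuron (a perceptron) on the logical AND function. The toy data set, the learning rate and the epoch count are illustrative assumptions, not taken from the slides; real deep learning systems stack many such units and train on far more data.

```python
# Hypothetical toy example: one neuron learning logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, x):
    """Fire (1) when the weighted sum of the inputs exceeds zero."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Repeat over the data, nudging the weights after every wrong answer."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights[0] += lr * error * x[0]
            weights[1] += lr * error * x[1]
            bias += lr * error
    return weights, bias


weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
print(predictions)
```

Each pass over the samples is one "repetition"; the weights only change when the neuron answers incorrectly, which is why training settles once every sample is classified correctly.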
Cognitive Systems : A category of technologies that uses natural language
processing and machine learning to enable people and machines to interact more
naturally to extend and magnify human expertise and cognition.
These systems will learn and interact to provide expert assistance to scientists,
engineers, lawyers, and other professionals in a fraction of the time it now takes.
Machine Learning
Deep Learning
Break tasks into Artificial
Neural Networks
Advanced
Analytics:
NoSQL,
Hadoop &
Analytics
Human Intelligence Exhibited by Machines
Cognitive / AI
“Trained” using large amounts of data &
ability to learn how to perform the task
- 6. What the market is
saying…
https://www.forbes.com/sites/brentdykes/2017/01/11/crawl-with-analytics-before-running-with-artificial-intelligence/#61efd2f8299c
Ovum : 2017 Trends to Watch: Analytics
Machine learning and automation
is the enterprise reality of AI science fiction
“A market for algorithms will emerge..”
Upgrading data architectures must balance
new capabilities with existing investments
IDC
Crawl With Analytics Before Running With Artificial Intelligence
- 8. The descriptive Analytics challenges
Functional
• Regulation & compliance (GDPR)
• Silos
• All data types
Non functional
• Scalability
• Reliability
• Security
• Data governance
• Data Gravity
Descriptive analytics can be classified into three areas that answer certain kinds of questions:
• Standard reporting and dashboards: What happened? How does it compare to our plan? What is happening now?
• Ad-hoc reporting: How many? How often? Where?
• Analysis/query/drill-down: What exactly is the problem? Why is it happening?
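The three kinds of descriptive question can be sketched in a few lines of Python. The sales records and the helper names (`report_by`, `count_where`, `drill_down`) are hypothetical and only illustrate the distinction; in practice these queries would run against a data warehouse or an accelerator.

```python
from collections import defaultdict

# Hypothetical sales records: actual units sold versus planned units.
SALES = [
    {"region": "North", "product": "A", "units": 120, "plan": 100},
    {"region": "North", "product": "B", "units": 40, "plan": 80},
    {"region": "South", "product": "A", "units": 90, "plan": 100},
    {"region": "South", "product": "B", "units": 85, "plan": 80},
]


def report_by(records, key):
    """Standard report: what happened, and how does it compare to plan?"""
    totals = defaultdict(lambda: {"units": 0, "plan": 0})
    for r in records:
        totals[r[key]]["units"] += r["units"]
        totals[r[key]]["plan"] += r["plan"]
    return dict(totals)


def count_where(records, predicate):
    """Ad-hoc question: how many records match a condition?"""
    return sum(1 for r in records if predicate(r))


def drill_down(records, **filters):
    """Drill-down: return the detail rows behind an aggregate."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


by_region = report_by(SALES, "region")
below_plan = count_where(SALES, lambda r: r["units"] < r["plan"])
north_detail = drill_down(SALES, region="North")
print(by_region, below_plan, north_detail)
```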
- 9. The Predictive Analytics challenges
Functional
• Information system
coverage extension
• Skills: open technologies
• Machine Learning
Non functional
• Volume
• Security
• Transparency
Predictive analytics can be classified into six categories:
•Data mining: What data is correlated with other data?
•Pattern recognition and alerts: When should I take action to correct or adjust a process or piece of equipment?
•Monte-Carlo simulation: What could happen?
•Forecasting: What if these trends continue?
•Root cause analysis: Why did something happen?
•Predictive modeling: What will happen next if?
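To make the Monte-Carlo category concrete, here is a minimal sketch that estimates how likely demand is to exceed capacity after twelve months of uncertain growth. The starting demand, growth rate, volatility and capacity are all assumed values chosen for illustration.

```python
import random


def simulate_final_demand(start, months, mean_growth, growth_sd, rng):
    """One possible future: compound a random monthly growth rate."""
    demand = start
    for _ in range(months):
        demand *= 1 + rng.gauss(mean_growth, growth_sd)
    return demand


def probability_over_capacity(capacity, runs=2000, seed=42):
    """Monte-Carlo estimate: share of simulated futures above capacity."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(
        simulate_final_demand(1000.0, 12, 0.02, 0.05, rng) > capacity
        for _ in range(runs)
    )
    return hits / runs


p_over = probability_over_capacity(capacity=1500.0)
print(f"Estimated probability of exceeding capacity: {p_over:.2%}")
```

Simple forecasting would project only the average trend; the simulation instead answers "what could happen?" by sampling many possible trajectories.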
- 10. The Prescriptive Analytics challenges
Functional
•Business rules
automation
Non functional
•Real time
•Historical data volume
Prescriptive analytics, which is part of “advanced analytics,” is based on the concept of optimization, which can be
divided into two areas:
•Optimization: How can we achieve the best outcome?
•Stochastic optimization: How can we achieve the best outcome and address uncertainty in the data to make better
decisions?
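The difference between the two areas can be shown with a small stocking decision. This is a sketch under assumed numbers: the unit price, unit cost, candidate stock levels and demand scenarios are hypothetical. Plain optimization plans for the average demand; stochastic optimization maximizes expected profit across the scenarios.

```python
PRICE, COST = 10, 6  # hypothetical unit selling price and unit cost


def profit(stock, demand):
    """Revenue on what actually sells, minus the cost of everything stocked."""
    return PRICE * min(stock, demand) - COST * stock


def best_stock(candidates, scenarios):
    """Pick the stock level with the highest expected profit.

    scenarios is a list of (probability, demand) pairs.
    """
    def expected(stock):
        return sum(p * profit(stock, d) for p, d in scenarios)

    return max(candidates, key=expected)


CANDIDATES = range(0, 201, 10)
SCENARIOS = [(0.5, 60), (0.25, 100), (0.25, 140)]  # assumed demand distribution
mean_demand = sum(p * d for p, d in SCENARIOS)

deterministic = best_stock(CANDIDATES, [(1.0, mean_demand)])
stochastic = best_stock(CANDIDATES, SCENARIOS)
print(deterministic, stochastic)
```

With these numbers the deterministic plan stocks for the mean demand, while the stochastic plan stocks less because unsold units are costly in the most likely scenario.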
- 11. The Data governance challenges
Functional
CDO- CPO
Ethics & Analytics
Regulations
Non functional
•Data Life cycle
•Data Security
•Data quality
Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of
the data employed in an enterprise.
- 12. The Data Architecture challenges
Functional
HTAP*
Data Lake
IoT
Non functional
• Volume
• Cost
• Data Security
• Data quality
• Real time
Data architecture is a set of rules, policies, standards and models that govern and define the type of data
collected and how it is used, stored, managed and integrated within an organization and its database systems.
It provides a formal approach to creating and managing the flow of data and how it is processed across an
organization’s IT systems and applications.
*Hybrid Transactional Analytical Processing
- 13. How can z Systems help to solve
those challenges?
Analytics- Machine Learning-Data governance-Data architecture
- 14. The descriptive Analytics challenges
Accelerators
IBM DB2 Analytics Accelerator
DB2 BLU
DASHDB
SIMD
SMT
• Data movement – ETL
• INZA-predictive modelling
• Queries
• Open language R-Scala(Spark)
• Archives
• Federation
• DB2 z/OS- IMS-VSAM-Oracle
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
- 15. The Predictive Analytics challenges
Open Framework
Machine Learning
IBM SPSS
Apache Spark
IBM Machine Learning on
z/OS
R
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
- 16. Machine Learning Basics
Identifies patterns in
historical data
Builds/trains
behavioral models
from patterns
Makes
recommendations
Machine learning is everywhere, influencing nearly
everything we do…
Netflix personalized movie
recommendations
Waze personalized
driving experience
7 out of 10 financial customers would take
recommendations from a robot advisor
- 17. Machine Learning - Process
Data
Ingestion
Data Cleaning
and
Transformation
Model
Training
Testing and
Validation
Deployment
Model Selection
From experimentation to production… the real data science challenge
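The stages on this slide can be sketched end to end in plain Python. Everything here is an assumption for illustration: the synthetic data (noisy observations of y = 3x + 5 with some missing values), the two candidate models and the 80/20 split. A production pipeline would use a framework such as Apache Spark or IBM Machine Learning on z/OS rather than hand-written code.

```python
import random


def ingest(seed=0, n=200):
    """Ingestion: hypothetical noisy observations, about 5% with a missing target."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3 * x + 5 + rng.gauss(0, 1)
        rows.append((x, None if rng.random() < 0.05 else y))
    return rows


def clean(rows):
    """Cleaning: drop rows whose target value is missing."""
    return [(x, y) for x, y in rows if y is not None]


def train_mean(train):
    """Candidate 1: always predict the average target (baseline)."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def train_line(train):
    """Candidate 2: least-squares straight line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def mse(model, data):
    """Validation metric: mean squared error on unseen data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


data = clean(ingest())
split = int(0.8 * len(data))
train, holdout = data[:split], data[split:]

candidates = {"mean": train_mean(train), "line": train_line(train)}
scores = {name: mse(model, holdout) for name, model in candidates.items()}
selected = min(scores, key=scores.get)   # model selection
deployed_model = candidates[selected]    # "deployment": the callable used for scoring
print(selected, scores)
```

The sketch stops where the slide's real challenge begins: moving the selected model from an experiment into a monitored production service.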
- 18. Machine
Learning can be
applied to a
Variety of Use
Cases
Across Problem
Types and
Industries
Machine learning can help the IT department (batch optimization, predictive maintenance/failure, …)
and can be embedded into any expert system.
- 19. The Data governance challenges
Move analytics power &
security to data
Ethics framework into
Analytics project
HW accelerator
Memory extended
zIIP eligibility
Zero cost – Zero latency for IDAA
Apache Spark
Pervasive encryption
MDM
Machine Learning
Privacy by design and by default
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
- 20. The Analytics
Ethics dilemma
with personal
data : how GDPR
could slow down
Analytics project
New Analytics or Machine Learning projects will require
ethical policies by design and by default.
- 22. Recommendations for GDPR readiness with
Analytics and Machine learning projects
• Check whether personal data is processed in big data analytics, and consider using
appropriate techniques to anonymize the personal data in the dataset(s) before
analysis...
• Become transparent about their processing of personal data by using a combination of
innovative approaches in order to provide meaningful privacy notices at appropriate stages
throughout a big data project.
• Embed a privacy impact assessment framework into their big data processing activities to
help identify privacy risks and assess the necessity and proportionality of a given project.
• Adopt a privacy by design approach in the development and application of their big data
analytics. This should include implementing technical and organizational measures to
address matters including data security, data minimization and data segregation...
• Develop ethical principles to help reinforce key data protection principles. Organizations
should create ethics boards to help scrutinize projects and assess complex issues arising
from big data analytics...
• Implement innovative techniques to develop auditable machine learning algorithms.
Internal and external audits should be undertaken with a view to explaining the rationale
behind algorithmic decisions and checking for bias, discrimination and errors...
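Two of the recommendations above, data minimization and protecting identifiers before analysis, can be sketched as follows. The field names, the sample record and the key are hypothetical. Note the assumption being made: keyed hashing is pseudonymization rather than full anonymization, so the output is still personal data under GDPR and the key must be managed outside the analytics environment.

```python
import hashlib
import hmac

# Assumption: in practice this key comes from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a keyed SHA-256 digest (pseudonymization)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record, needed_fields, id_field="customer_id"):
    """Keep only the fields the analysis needs, plus a pseudonymized identifier."""
    out = {field: record[field] for field in needed_fields}
    out[id_field] = pseudonymize(record[id_field])
    return out


raw = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "basket_total": 42.5,
}
safe = minimize(raw, needed_fields=["basket_total"])
print(safe)
```

The same customer always maps to the same digest, so records can still be joined for analysis without exposing the original identifier.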
- 23. The Data Architecture challenges
Federated data lake
Hybrid cloud integration
IDAA
Apache Spark
DashDB
Linux on z
Technology breadth: to simplify, to alleviate, to secure
Data gravity: volume, sensitivity, cost
HTAP enablement
- 24. Reasons to limit data movement to build a
physical data lake
Data gravity – analytic
processing moves to where the
data resides
Data sensitivity – encrypt data
to limit exposure in case of a data breach
Real time analytics
requirements
Data governance high
requirements :
•Data quality : reduce data copy
•Data security : regulations ( such as
GDPR)
•Data life cycle management : alleviate
and optimize data management
- 25. The hybrid data lake federated approach
To alleviate data
movement
To use
federated data
approach
To respect
data gravity
To leverage
existing data
set
To limit data
discrepancy
Use z Systems as
one of the physical repositories
Leave z Systems data
in place
Show your data scientists
how easy it is to access z data
- 26. Imperatives to implement Data Lake hybrid
scenario
Reduce complexity of
information supply chain, e.g.
• Avoid data movement
• Simplify data transformation
• Use in-DB transformation
• Use temporary tables structures
Adhere to innovative and
novel Analytics concepts, e.g.
• Limit number of data marts and data
cubes
• Use aggregation on the fly
• Allow for agile usage patterns
• Leverage HTAP* architecture
- 27. Technologies to use for hybrid data lake
approach
Leverage state-of-the-art technology,
e.g.
HW accelerators
Special-purpose appliances
In-memory processing
Use federation technique whenever
possible, e.g.
Federated SQL queries, leaving data in
place
Federated analytical processing,
leaving data in place
Open Framework (e.g. Apache Spark)
*Hybrid Transactional Analytical Processing
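The idea of a federated SQL query that leaves data in place can be illustrated with a small stand-in. This sketch uses SQLite's `ATTACH DATABASE` to join two separate in-memory databases in one statement; it is only an analogy, since federation across DB2 for z/OS, IMS or VSAM would rely on a dedicated federation engine that is not shown here. Table and column names are hypothetical.

```python
import sqlite3

# Two separate databases stand in for two repositories of a hybrid data lake.
conn = sqlite3.connect(":memory:")                   # "operational" store
conn.execute("ATTACH DATABASE ':memory:' AS lake")   # second repository

conn.execute("CREATE TABLE main.accounts (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("CREATE TABLE lake.clicks (account_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO main.accounts VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
conn.executemany(
    "INSERT INTO lake.clicks VALUES (?, ?)",
    [(1, "home"), (1, "loans"), (2, "home")],
)

# One query spans both stores and aggregates on the fly; no rows are copied
# from one database into the other.
rows = conn.execute(
    """
    SELECT a.owner, COUNT(*) AS clicks
    FROM main.accounts AS a
    JOIN lake.clicks AS c ON c.account_id = a.id
    GROUP BY a.owner
    ORDER BY a.owner
    """
).fetchall()
conn.close()
print(rows)
```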
- 28. Data in IBM DB2 Analytics Accelerator
• An extension of a DB2 for z/OS system
• ETL process acceleration and alleviation
• Accelerating SQL access to z/OS data, including
IMS, VSAM ... loaded by IDAA Loader
• Managing huge volumes of historical data (HPSS)
• R queries accelerator
• Apache Spark on z/OS queries accelerator
Transparent and easy access for data
scientists
• Through JDBC or API from Spark on distributed platforms,
including Linux on z
• With Spark on z/OS as well as Machine
Learning on z/OS
z Systems as a Data Lake repository in a
hybrid approach: make z data simple