Creating your Center of Excellence (CoE) for data-driven use cases
- 1. 1© Cloudera, Inc. All rights reserved.
Creating your center of excellence
Becoming data-driven through cultural change
Frank Vullers
Business Value Strategist Cloudera
Imagine a world where we…
use sensors to understand how air quality
triggers asthma events
in infants.
Imagine a world where we…
track weather and crowds to
reduce environmental impact
while improving service.
Imagine a world where we…
use social media data to fight
child sexual exploitation.
Imagine a world where we…
use data for early detection
to save lives.
Imagine a world where we…
use data to simulate human
travel to deep space.
We live in that world today because our relationship with
data is changing
Data is now a strategic asset
Instrumentation: Today, everything that can be measured will be measured.
Consumerization: Today, data is the application.
Experimentation: Today, becoming data-driven is an imperative.
Yet the journey requires organizational change
By 2017, 50% or fewer organizations will have made the cultural or business model adjustments to benefit from big data.
By 2018, 50% of business ethics violations will be from improper use of big data analytics.
Gartner, “Predicts 2015: Big Data Challenges Move From Technology to the Organization”, November 2014
How do you ensure success?
Our most successful customers
do these five things.
Our most successful customers do these five things
1. Build a data-driven culture
2. Develop the right team and skills
3. Adopt an agile/lean approach
4. Efficiently operationalize your insights
5. Right-size data governance
The important role of the executive sponsor
The executive sponsor is key to success for the overall data-driven mission, including advocacy for creating and collecting data and for individual use cases.
Profile § Focused on change and willing to take risks.
Education § Use every opportunity to brief sponsors and stakeholders.
Advocacy § Build Big Data success stories from within the business.
Communications content
• Make communications more programmatic.
• Enable many in the organization to become evangelists.
Vision § How being data-driven will deliver business results. Align to strategic initiatives.
Data § Valuing data as (eventually) a balance sheet asset.
§ Size and utilization of the asset, specific data sources ingested.
§ Governance and maturity.
Insights § The value from the data and use cases delivered to the business.
Platform & tooling § Updates on releases and capabilities in the platform and the ecosystem of user tools.
Communications and collaboration
Use different vehicles and forms to enable collaboration.
Meetups § Team-led gatherings that bring together the larger community to share interests, learnings, and wins.
Big Data days § Transfer of information through executive-led thought leadership.
§ Include experts from across the business units, vendors, and partners.
§ Cross-domain focussed.
Hackathons § Allow developers to build new applications designed to boost the business.
The power of visualizations
The hardest part of any analytic project is enabling action. Visualizations are a powerful tool to help.
Visualizations § A powerful way to express the importance of insights on the road to action.
Movie trailer § Compelling visualizations can serve as a ‘trailer’ for your movie.
How and when § Make them colourful and make them move.
Example visualizations: Telco roaming crime ring (Argyle Data); Point of Interest (POI) density by city; Tweets by GPS coordinate near the M&S shopping area.
The power of visualizations
Visualizations about the data asset itself help your user community understand the size, growth and value of your data.
Scorecard category | KPI | Q1 target | Q1 actuals
Reach | # data sources under management | 15 | 25
Reach | # business KPIs reported | 10 | 20
Reach | Amount of data under management | 0.5 PB | 1 PB
Acquisition | # of platform users | 200 | 400
Acquisition | # of jobs per day | 2000 | 5000
Conversion | # of jobs moved from other clusters | 50 | 30
Churn | Churn of jobs and data | 0 | 0
Capacity/Utilization | Amount of storage vs. amount of data under management | 75% | 75%
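A scorecard like this can also be checked automatically each quarter. The following is a minimal Python sketch, assuming that a KPI is met when the actual reaches the target and, for churn, when it stays at or below the target; the `Kpi` class and `missed` helper are hypothetical names, and the capacity/utilization row is left out because the slide does not say which direction is better.

```python
# Minimal sketch: compare the Q1 actuals in the scorecard against their targets.
# Assumptions (not from the slide): a KPI is met when actual >= target, except for
# churn, where lower is better; the capacity/utilization row is left out.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Kpi:
    category: str
    name: str
    target: float
    actual: float
    lower_is_better: bool = False

    def met(self) -> bool:
        if self.lower_is_better:
            return self.actual <= self.target
        return self.actual >= self.target

    def attainment(self) -> Optional[float]:
        # Actual as a percentage of target; undefined when the target is zero.
        if self.target == 0:
            return None
        return 100.0 * self.actual / self.target


# Q1 targets and actuals copied from the scorecard above.
SCORECARD: List[Kpi] = [
    Kpi("Reach", "# data sources under management", 15, 25),
    Kpi("Reach", "# business KPIs reported", 10, 20),
    Kpi("Reach", "Data under management (PB)", 0.5, 1.0),
    Kpi("Acquisition", "# of platform users", 200, 400),
    Kpi("Acquisition", "# of jobs per day", 2000, 5000),
    Kpi("Conversion", "# of jobs moved from other clusters", 50, 30),
    Kpi("Churn", "Churn of jobs and data", 0, 0, lower_is_better=True),
]


def missed(kpis: List[Kpi]) -> List[str]:
    """Names of the KPIs that did not meet their target."""
    return [k.name for k in kpis if not k.met()]


if __name__ == "__main__":
    for k in SCORECARD:
        pct = k.attainment()
        shown = "n/a" if pct is None else f"{pct:.0f}%"
        status = "met" if k.met() else "MISSED"
        print(f"{k.category:<12} {k.name:<38} {shown:>6} {status}")
    print("Missed:", missed(SCORECARD))
```

Keeping the scorecard as data rather than only as a picture makes it easy to feed the same numbers into whichever visualization tool the user community already works with.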
Staff for success: the past and the present
A traditional BI and analytics organization consists of three main components.
Analytics § Use data to develop reports and find insights.
Data management § Satisfy requests, answer users' questions, load models.
Infrastructure § Hardware and software specialists, and software components.
Staff for success: the future
The data engineering team becomes strategic: data can be transformed and used in many different ways.
Analytics → Data scientists
Data management → Big Data management → Data engineers
Infrastructure → Architects
It is critical that these three roles be tightly aligned.
Your infrastructure team & architect
Ecosystems change rapidly. Architects need to balance tactical and strategic needs.
Communication § Collaborate across software and hardware infrastructure.
Education § Training is essential: admin, developer.
Leadership § Be the infrastructure expert and advise on new projects and requirements.
Infrastructure architecture & operations team: enterprise architect, Hadoop admin, network admin, systems admin.
Your data engineering team
Be committed to making data the utmost strategic asset, and employ it in a meaningful way.
Communication § Advocate for new data and for improved data.
Education § Get trained and certified.
Leadership § Promote and evangelize the value of data and data governance.
Data engineering team: data engineer, data stewards, data ingest (ETL), data dev operations, information security.
Your data scientist team(s)
The hybrid data scientist
• Subject matter expertise lies in the business.
• Hacking skills come from existing IT staff or new hires.
• Staff at least one true Ph.D. statistician for model oversight across all teams.
Data science sits at the intersection of math and statistical knowledge, hacking skills, and subject matter expertise. Finding one or more data scientists who cross all these disciplines is a luxury.
Important character trait: curiosity.
Staff for success: data science-as-a-service
A (centralized) data science team partners with the business to identify data and explore use cases to solve.
Agility § The team must be able to learn quickly and adapt.
Skills § Computer science and domain expertise, and at least one true statistician.
Teams § Build domain expertise in-house, add MS/Ph.D. talent, and hire that one true statistician.
Experts § This team must be the “data experts” for the entire company.
Analytical development team: data scientist, application developer, SQL developer.
Organizing and sizing for success
Centralized
• Data science-as-a-service: scale based on # of use cases.
• Data engineers: scale based on # of data sets and amount of data under management.
• Architecture: scale based on cluster size and # of components used.
De-centralized
• Business-focused SQL and app developers, analysts and data scientists: scale based on # of use cases.
Leverage agile methodology to reduce risk (1/3)
Agile provides actionable results more rapidly and measures the value gained at each step, in small iterations.
Lower risk § The risk of funding long-running projects with limited business value is small.
Lower costs § Infrastructure, data and insights workstreams can run in parallel.
Communication § Clear short-term results and a continuous communication stream (results and failures).
Team § Start with a small team and add scrum teams as needed.
Leverage agile methodology to reduce risk (2/3)
A lightweight agile process using 2-3 week sprints. Transparency and ‘fail-fast’ capability are essential.
Epics § Documentation for broad concepts and requirements (ingestion/use case).
Stories § Business description of the work to be done (within a sprint).
Tasks § Manageable units of work with success criteria and clear requirements.
Teams § Small co-located (virtual is possible) teams deliver quickly, and sprint exits offer opportunities for demos and transparency into the work.
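One way to make the epic, story and task hierarchy concrete is as a small data model. This is a hypothetical Python sketch, not a feature of any particular agile tool: `Epic`, `Story`, `Task` and `Sprint` are illustrative names, and the 2-3 week check simply encodes the sprint length recommended above.

```python
# Hypothetical data model for the epic -> story -> task hierarchy and 2-3 week sprints.
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Task:
    title: str
    success_criteria: str  # every task carries explicit success criteria
    done: bool = False


@dataclass
class Story:
    description: str  # business description of work that fits within one sprint
    tasks: List[Task] = field(default_factory=list)

    def is_done(self) -> bool:
        # A story counts as done only if it has tasks and all of them are done.
        return bool(self.tasks) and all(t.done for t in self.tasks)


@dataclass
class Epic:
    name: str
    kind: str  # "ingestion" or "use case"
    stories: List[Story] = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of stories done; something concrete to show at a sprint-exit demo."""
        if not self.stories:
            return 0.0
        return sum(s.is_done() for s in self.stories) / len(self.stories)


@dataclass
class Sprint:
    start: date
    weeks: int = 2

    def __post_init__(self) -> None:
        if not 2 <= self.weeks <= 3:
            raise ValueError("sprints are expected to last 2-3 weeks")

    @property
    def exit_date(self) -> date:
        return self.start + timedelta(weeks=self.weeks)


if __name__ == "__main__":
    epic = Epic("Churn analytics", "use case", [
        Story("Load call-center data", [Task("Ingest CRM extract", "row counts match source", done=True)]),
        Story("Build churn report", [Task("Define churn KPI", "signed off by product owner")]),
    ])
    print(epic.progress(), Sprint(date(2024, 1, 1), weeks=3).exit_date)
```

Rolling progress up from tasks to stories to epics is what gives each sprint exit the transparency the slide asks for.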
Leverage agile methodology to reduce risk (3/3)
Agile applies to parallel workstreams: data asset creation and management, and insights / data science / analysis.
Product Owner § Identify the person who owns the lean development and ongoing agile management of each workstream.
Backlog § Use agile and lean methods to document the work that needs to get done.
Move § Agile means don’t wait until you know everything. Get moving quickly and be able to change as new information becomes available.
Roadblocks § The purpose of the scrum master and product owners is to remove roadblocks so the team can continue to move and make progress.
Data and use case work prioritization process
The list of Big Data needs will be longer than the list of resources. A transparent process for prioritizing the work is essential.
Quarterly § Review data processing and use case epics. Prioritize the backlog.
Data epics § Prioritize by: value of data across the business, conformed dimensions, single use case.
Use case epics § Prioritize by: value in the business, availability of the data, ability to dimensions.
Agile methodology enables iterative workstreams
Figure: three parallel workstreams (Agile Use Case Development, Agile Data Ingestion/Management, and Agile Data Governance & Common Profile Development), each run by two scrum teams delivering successive releases (Release 1 to Release 4) until production-ready, on top of the EDH buildout.
Your Big Data business roadmap
After prioritization, communicate the work roadmap together with the business, describing these areas:
Business roadmap § Data sets processed, infrastructure installed, use cases/insights expected.
Technical roadmap § Use the agile backlog (epics, stories, etc.) to manage the technical deliverables.
Iterative § Ability to “fail-fast”. Roadmaps change more often than in a waterfall world.
Prioritization
Use cases
Availability § Is the data accessible? Of sufficient quality? Is there a steward? Can we generate or buy the data? Is it shareable?
Complexity § From simple report, through machine learning and advanced analytics, to automated decision making.
Value § Does it align to an objective? Enable other use case(s)? Increase revenue or cost savings? What effect will the insights have? Executive override.
Data*
Criteria § Is it urgently needed? High value? Used by multiple teams? Short-lived or streaming?
High § Augments existing data. Reuses existing data processing code. Easy to pull down. An API allows bringing in historical data.
Medium § Some data access/workaround. Low-quality data.
Low § Data has to be screen-scraped. Low likelihood of the data being used.
*Source: Carl Anderson, “Creating a Data-Driven Culture”
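The availability, complexity and value criteria can be turned into a transparent ranking for the quarterly backlog review. The sketch below is an illustration under stated assumptions, not Carl Anderson's method: it rates each criterion on a hypothetical 1-3 scale, weights them equally, treats complexity as a cost, and lets an executive override jump the queue. `Level`, `UseCase` and `prioritize` are illustrative names.

```python
# Hypothetical scoring sketch for the use-case prioritization grid above.
# The 1-3 scale and equal weights are assumptions, not part of the source.
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class UseCase:
    name: str
    availability: Level  # accessible, good quality, has a steward, shareable?
    complexity: Level    # LOW ~ simple report ... HIGH ~ automated decision making
    value: Level         # aligns to objectives, enables other use cases, revenue/cost impact?
    executive_override: bool = False

    def score(self) -> int:
        # Higher availability and value raise the score; higher complexity lowers it.
        # With a 1-3 scale the score ranges from 3 to 9.
        return int(self.availability) + int(self.value) + (4 - int(self.complexity))


def prioritize(backlog: List[UseCase]) -> List[UseCase]:
    """Executive overrides first, then by descending score.

    sorted() is stable, so use cases with equal keys keep their backlog order.
    """
    return sorted(backlog, key=lambda u: (not u.executive_override, -u.score()))


if __name__ == "__main__":
    backlog = [
        UseCase("Churn scoring", Level.HIGH, Level.MEDIUM, Level.HIGH),
        UseCase("Weekly sales report", Level.HIGH, Level.LOW, Level.MEDIUM),
        UseCase("Sponsor pilot", Level.LOW, Level.HIGH, Level.LOW, executive_override=True),
    ]
    for uc in prioritize(backlog):
        print(uc.name, uc.score(), "(override)" if uc.executive_override else "")
```

Publishing the scores alongside the ranked backlog is one way to keep the process transparent: anyone can see why a use case landed where it did, and an executive override is visible as such.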
Cloudera customers: strategic initiatives
• Drive customer insights
• Connect products & services
• Protect business
There are three axes to support these initiatives:
1. Data available
2. Analytical methods
3. Integration of analytical results into reports, apps, web apps, etc.
The first axis of analytics: the data
Data categories: Digital/mobile, Transaction, CRM/call center, Demographics, Network/product, Social
Properties: batch, stream, real-time
Digital media
• Teradata Aprimo
• IBM Unica
• Oracle Eloqua
• X+1
Web logs
• Microsoft IIS
• Apache
• nginx
• Google GWS
Clickstream/UX
• Adobe Omniture
• IBM Coremetrics
• IBM Tealeaf
• Google Analytics
Premium
• Webtrends
Mobile application
• SMS
Retail
Mobile
Web
Channel
Distributor
Bot
Call center
Indirect
Kiosk
Embedded commerce
service
Billing
Customer lifecycle
• Acquisition
• Churn
• Cross-Sell
• Upsell
CRM
• MS Dynamics
• Oracle/Siebel
• Salesforce
• SAP
Online chat
• Oracle RightNow
• Moxie Live Chat
• LivePerson
• Instant Service
• Oracle Live Help
• BoldChat
• Zendesk Zopim
• Kana Live Chat
IVR
• Avaya
• Cisco
• Nortel
• Nuance
Data broker / syndicate
• Acxiom
• CoreLogic
• Datalogix
• eBureau
• ID Analytics
• Intelius
• PeekYou
• Rapleaf
• Recorded Future
• IHS Polk
• Nielsen
• InfoScout
• Symphony IRI
• GfK
Behavior
Loyalty
• Aimia
• Brierley+Partners
• Comarch
• Epsilon
• Kobie
• ICF Olson 1to1
• Merkle
• Clutch
• CrowdTwist
• DataCandy
• Deluxe
• Inte Q
• ICLP
Survey
• ABA
• Medallia
• ForeSee
• Allegiance
• Walker Information
Direct
• Twitter
• Facebook
• Bazaarvoice
Listening/management
• Sprinklr
• Crimson Hexagon
• Radian6
• Lithium
• Simply Measured
• Curalate
• Datasift
Voice of the community
• CSAT
• NPS
The 2nd axis of analytics: analytical processing
Simple aggregation
Unsupervised learning: clustering, topic modeling, time series analysis
Classification: gradient boosted trees, SVMs, logistic regression, etc.
Deep learning ("neural nets") and natural language processing
Customer profile
→ Detect anomalous events (e.g., predictive maintenance)
→ Score entities by behavior (e.g., churn analytics)
→ Classify or cluster unstructured data (e.g., images or text for cyber threats)
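As a concrete, deliberately simple instance of “detect anomalous events”, the Python sketch below flags sensor readings with a large z-score. It is a stand-in for a real predictive-maintenance model, not a recommendation; the threshold of 3.0, the `zscore_anomalies` name and the sample readings are assumptions.

```python
# A simple stand-in for "detect anomalous events": flag readings whose z-score
# exceeds a threshold. The 3.0 threshold is a common rule of thumb, not a value
# from the source.
import statistics
from typing import List, Sequence


def zscore_anomalies(readings: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of readings more than `threshold` standard deviations from the mean."""
    if len(readings) < 2:
        return []
    mean = statistics.fmean(readings)
    stdev = statistics.pstdev(readings)  # population standard deviation
    if stdev == 0:
        return []  # all readings identical: nothing stands out
    return [i for i, x in enumerate(readings) if abs(x - mean) / stdev > threshold]


if __name__ == "__main__":
    # Hypothetical vibration readings from one sensor; the last one spikes.
    vibration = [10.0] * 20 + [50.0]
    print(zscore_anomalies(vibration))
```

Because the outlier itself inflates the standard deviation, a single spike can only exceed a threshold of 3 when there are at least 11 readings (the largest attainable z-score is the square root of n - 1). Robust variants based on the median and the median absolute deviation avoid this masking effect.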
The 3rd axis of analytics: serving insights
• Integration with web applications via Spark or HBase
• Integration with mobile apps via Spark or HBase
• Integration with enterprise applications, e.g., CRM, sales
• Search applications via Solr
• Serving to standard BI tools (e.g., Tableau, Qlik)
Project execution methodologies: the change*
Waterfall (80s-90s): Design → Code → Test → Deploy
Agile (late 90s): Design → repeated Code → Test cycles → Deploy
DevOps (00s): Design → continuous, repeated Code → Test → Deploy (C T D) cycles
*https://blog.spec-india.com/from-waterfall-to-agile-to-devops-a-cultural-and-technological-shift/
Key role: DevOps for continuous deployment
DevOps links highly fluid, continuous development with a more structured infrastructure by ensuring collaboration across the insights, Big Data management and infrastructure layers.
§ Existing IT support handles infrastructure for L1, L2 and L3.
§ Existing IT support passes data and insights issues through L1 and L2 to a DevOps team for Level 3 support; the DevOps team is the key contact with Cloudera support.
§ Data and insights teams work with DevOps for production, and DevOps works with L2 to debug and deploy.
**Cloudera administrator, developer, Navigator and security training
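The escalation path above can be summarized as a routing rule. The following Python function is a hypothetical sketch: the area names and the `support_owner` function are illustrative, and a real ticketing system would express this in its own routing configuration.

```python
# Hypothetical routing sketch for the support model described above.
# Areas and levels mirror the slide; the function itself is illustrative only.
def support_owner(area: str, level: int) -> str:
    """Return which team owns a support ticket at a given level (1-3)."""
    if level not in (1, 2, 3):
        raise ValueError("support levels are L1, L2 and L3")
    if area == "infrastructure":
        # Existing IT support handles infrastructure at every level.
        return "IT support"
    if area in ("data", "insights"):
        # IT support triages L1 and L2, then escalates to the DevOps team for L3;
        # DevOps is also the key contact with Cloudera support.
        return "IT support" if level < 3 else "DevOps"
    raise ValueError(f"unknown area: {area!r}")


if __name__ == "__main__":
    for area in ("infrastructure", "data", "insights"):
        print(area, [support_owner(area, lvl) for lvl in (1, 2, 3)])
```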
Governance: the foundation of data management
Compliance: Track, understand and protect access to data
• Am I prepared for an audit?
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
Stewardship: Manage and organize data assets at Hadoop scale
• How do I efficiently manage the data lifecycle, from ingest to purge?
• How do I classify data efficiently?
• How do I make data available to my end users efficiently?
Data Science: Effortlessly find and trust the data that matters most
• How can I explore data on my own?
• Can I trust what I find?
• How do I use what I find?
• How do I find and use related data sets?
Administration: Boost user productivity and cluster performance
• How is data being used today?
• How can I optimize for future workloads?
• How can I quickly take advantage of Hadoop risk-free?
Right-size your Big Data governance
Data stewards: owners and/or creators of the data
Responsibilities
§ Knowledge of the data
§ Documenting
Data engineers: implement the data governance policies
Responsibilities
§ Defining the governance
§ Organizing the council
§ Utilizing tools
Data governance council: business owners of the data governance
Responsibilities
§ Communicating governance
§ Assigning data steward roles
§ Improving the link-ability
Right-size your Big Data governance
Data governance program: responsible for governing the company’s Big Data asset
Data governance council: cross-org, has the authority for governing data
Data stewards: retention management, profile management, quality management, data privacy
Step 1: The data engineering team proposes policies, data steward roles, a data dictionary, and master data (for profiles).
Step 2: Employ technology, e.g., Navigator, to implement the policies.
Step 3: Data governance exec council with cross-company participation.
Our most successful customers do these five things
1. Build a data-driven culture
2. Develop the right team and skills
3. Adopt an agile/lean approach
4. Efficiently operationalize your insights
5. Right-size data governance
Thank you
Frank Vullers
Business Value Strategist Cloudera
fvullers@cloudera.com
@FrankVullers
Govern: Integrity, complete audit and policy rules
• Encrypt sensitive data
  • Personal data is moved to an encrypted zone
  • Policy rule in Navigator
• Audit on all data
  • All actions fully audited
  • Denied access
  • All access to personal data audited
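The “audit on all data” rule is easiest to picture as a gate that records every access attempt, allowed or denied. The sketch below is hypothetical and in-memory: `PolicyGate` and `AuditEvent` are illustrative names, it is not the Cloudera Navigator API, and it does not model the encrypted zone itself.

```python
# Hypothetical in-memory policy gate: every access attempt is audited, and
# access to personal data requires an explicit grant. Not the Navigator API.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class AuditEvent:
    timestamp: datetime
    user: str
    dataset: str
    action: str
    allowed: bool


@dataclass
class PolicyGate:
    personal_datasets: Set[str]   # datasets tagged as containing personal data
    grants: Dict[str, Set[str]]   # user -> personal datasets they may access
    audit_log: List[AuditEvent] = field(default_factory=list)

    def access(self, user: str, dataset: str, action: str = "read") -> bool:
        # Non-personal data is open; personal data needs a grant.
        allowed = (dataset not in self.personal_datasets
                   or dataset in self.grants.get(user, set()))
        # Every action is audited, including denied ones.
        self.audit_log.append(
            AuditEvent(datetime.now(timezone.utc), user, dataset, action, allowed))
        return allowed

    def denied(self) -> List[AuditEvent]:
        """Audit events for denied access attempts."""
        return [e for e in self.audit_log if not e.allowed]


if __name__ == "__main__":
    gate = PolicyGate({"customers_pii"}, {"alice": {"customers_pii"}})
    gate.access("alice", "customers_pii")
    gate.access("bob", "customers_pii")
    for e in gate.denied():
        print(e.user, e.dataset, e.action, "denied")
```

Logging before returning, rather than only on success, is the design choice that makes denied access show up in the audit trail.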
Operate: Right to be Forgotten, lineage back to source (Cloudera Navigator Lineage)
• Lineage back to source
  • GDPR data tagged
  • Source system tagged
  • Source table and column
• Delete back to source
  • Trigger overnight delete process
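Deleting “back to source” amounts to walking the lineage graph upstream from every GDPR-tagged column. The Python sketch below assumes lineage is available as a simple mapping from each column to the columns it was derived from; that representation, and the `trace_to_sources` and `plan_overnight_deletes` names, are hypothetical and are not how Cloudera Navigator exposes lineage. The output is a list of delete requests that an overnight batch job could execute.

```python
# Hypothetical sketch: trace GDPR-tagged columns back to their source columns
# and plan delete requests for an overnight job. Not the Navigator lineage API.
from collections import deque
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (table, column)


def trace_to_sources(lineage: Dict[Column, List[Column]], start: Column) -> Set[Column]:
    """Walk upstream lineage edges from `start` and return the source columns,
    i.e. the columns reached that have no recorded parents."""
    sources: Set[Column] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        parents = lineage.get(node, [])
        if not parents:
            sources.add(node)
        for parent in parents:
            if parent not in seen:  # guard against cycles and shared ancestors
                seen.add(parent)
                queue.append(parent)
    return sources


def plan_overnight_deletes(
    lineage: Dict[Column, List[Column]],
    gdpr_tagged: Set[Column],
    subject_id: str,
) -> List[Tuple[str, str, str]]:
    """Build (source table, source column, subject id) delete requests for every
    source column that feeds a GDPR-tagged column."""
    targets: Set[Column] = set()
    for col in gdpr_tagged:
        targets |= trace_to_sources(lineage, col)
    return sorted((table, column, subject_id) for table, column in targets)


if __name__ == "__main__":
    lineage = {
        ("report", "email"): [("crm_clean", "email")],
        ("crm_clean", "email"): [("crm_raw", "email_addr")],
        ("mart", "email"): [("crm_clean", "email"), ("web_raw", "login")],
    }
    print(plan_overnight_deletes(lineage, {("report", "email"), ("mart", "email")}, "u42"))
```

Using a set of visited columns keeps the walk linear in the size of the lineage graph even when many derived tables share the same upstream source.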