© Cloudera, Inc. All rights reserved. 1
@EvaAndreasson, Dir. Product @Cloudera
Intuitive Real-Time Analytics with Search
Agenda
• Why Integrated Search Matters
• Trending Real-Time Use Cases
• Key Takeaways
Why Integrated Search Matters
A Multi-Step, Multi-System, Multi-Challenge Process
Diagram labels (data sources): ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources
Diagram labels (systems): EDWs; Marts; Search Servers; Document Stores; Storage
Enterprise Data Hub Architecture
Diagram labels (data sources): ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources
Diagram labels (platform): Shared Scalable Storage (Data Lake); Security & Governance; Operation management; Workload Management; Indexing; ETL & Batch; BI Analytics & SQL; Graph & Machine Learning; Stream Processing; 3rd Party Tools
Diagram labels (surrounding systems): EDWs; Marts; Servers; Applications
Diagram labels: SolrCloud; Scalable Shared Storage; Twitter Feed; Flume; Log / IoT; Flume; Files; Files; Kafka; Real Time & Batch: Spark & MapReduce; Real Time & Batch; HBase; HUE; Enterprise; Custom
Index & Prepare
• Continuous and ad-hoc
• Integrated
• Scalable
• Flexible
• At ingest and at rest
Secure
• Authentication
• Granular Authorization
• Encryption
• Audit & Lineage
Serve
• Custom apps
• Enterprise tools: Zoomdata, ISS Knowtify
• Simple dashboarding out of the box via HUE
• Integrated search is key for real-time applications:
  • Streamline indexing and data preparation pipelines
  • Offload indexing
  • Faster result sets
  • Find and Do – without switching between systems!
  • Flexible architecture
Trending Use Cases
Real-time Micro Segmentation Service
• Use cases
  • What characterizes our current targets?
  • What is their real-time behavior?
• Data challenges
  • Larger volumes & new types of data
  • Audience is not SQL-savvy
  • Real-time insight needs to impact feedback loop to targets
• Business value
  • Microsegments identified in real time
  • Higher success rate due to immediate adjustment of program
Diagram labels: Mobile events; Web events; Transaction events; Real Time Event Processing; Processed event data; Immediately searchable; Rules Engine; Interactive micro segment exploration service and dashboard; Immediate action; Adjustment
Ref.: girlscoutcookies.org
Flexible Event Insight Service
• Use cases
  • What module needs to be withdrawn?
  • How is my data pipe X trending right now?
• Data challenges
  • Gain insight over growing event volumes
  • Hard to establish naming standards
• Business value
  • Pre-empt costly mistakes
  • Optimize pipelines and logistics faster
  • Quicker time to insight for independent units
Diagram labels: Aggregated Pre-generated Reports; Component or client data; Unit-defined error data; Real-time dashboards and free-text lookup; Immediately searchable
Ref.: barcode.ro/tutorials/biometrics/fingerprint.html
Streamlined Match Engine
• Use case
  • Match incoming data against master data
  • Alert on similar data in real time
• Data challenge
  • Manual pattern recognition costly over growing volumes
  • New data sets needed more matching flexibility
• Business value
  • Streamlined pattern recognition
Diagram labels: Data; Pre-processing; Match; Master data table; Indexed known data lookup service; Immediately searchable; Recognized and categorized data
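The "match incoming data against master data" step can be illustrated with a small sketch. This is not the engine from the slide: it uses Python's standard-library `difflib` for approximate string similarity, and the `MASTER` list, the `match` helper, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical master data; the slide's "indexed known data lookup service"
# would play this role in a real deployment.
MASTER = ["Acme Industrial Supply", "Globex Corporation", "Initech LLC"]


def similarity(a, b):
    """Return a 0..1 similarity ratio between two case-folded strings."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def match(record, master=MASTER, threshold=0.8):
    """Return (best_entry, score) if the best score meets the threshold, else None."""
    best = max(master, key=lambda entry: similarity(record, entry))
    score = similarity(record, best)
    return (best, score) if score >= threshold else None
```

A misspelled incoming record such as `match("acme industrial suply")` resolves to the "Acme Industrial Supply" entry, while an unrelated string returns `None` and could be routed to manual review or an alert.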
Ref.: stepbystep.com/
Fraud-Detection Service
• Use case
  • Find all relevant emails, out of 100M, in which a provider and consumer agreement was broken
• Data challenges
  • Growing data volumes & variety
  • Text analysis requiring lots of manual processing
• Business value
  • 10-50x process time improvement
  • More accurate result sets with less manual involvement
Diagram labels: Emails; Vendor info; Emails with multi-type attachments; Real-time ingest; Batch ingest; Pattern-based scoring process; Immediately searchable; Custom Interactive UI to find relevant emails
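The slide does not say how the pattern-based scoring process works. As one possible reading, the sketch below scores each email by weighted regular-expression hits and ranks the results; the patterns, weights, and function names are hypothetical and chosen only to illustrate the idea.

```python
import re

# Hypothetical patterns and weights; a real service would tune these
# against labelled examples.
PATTERNS = {
    r"\bbreach(ed|es)?\b": 3.0,
    r"\bterminat(e|ed|ion)\b": 2.0,
    r"\bagreement\b": 1.0,
}
COMPILED = [(re.compile(p, re.IGNORECASE), w) for p, w in PATTERNS.items()]


def score(text):
    """Sum weight * number of matches over all patterns."""
    return sum(weight * len(rx.findall(text)) for rx, weight in COMPILED)


def rank(emails, min_score=1.0):
    """Return (email, score) pairs at or above min_score, highest score first."""
    scored = [(email, score(email)) for email in emails]
    kept = [pair for pair in scored if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Scoring at ingest and storing the score alongside the indexed email would let an interactive UI sort or filter on it, which is the kind of flow the diagram labels suggest.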
My natural language query….
Diagram labels (data sources): ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources
Faster Access to All Data
• Use case
  • Filter structured data over text element
  • Match word in 10 years of manually entered data
• Data challenges
  • Complex queries to use text matching via SQL
  • Multi-spelling of items, misspelling in queries
  • Accessibility of historical data silos
• Business value
  • 10x fewer lookups per item
  • 8x improved lookup time
  • 30% less impact on high-end systems
Diagram labels: Tables; Giant 10 year transaction table; Immediately searchable; Interactive, faceted text-based search service over archive; Tables; Item registry from multiple sources; Immediately searchable; Spelling-tolerant item lookup service
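To make "filter structured data over a text element" and "faceted" concrete, here is a toy in-memory sketch. It is not how a search engine implements this (an engine would use an inverted index), and the `ROWS` sample data and the `search` and `facet` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical slice of a transaction table with a free-text item column.
ROWS = [
    {"item": "Steel bolt M8", "region": "EMEA", "year": 2012},
    {"item": "steel bolts m8", "region": "APAC", "year": 2013},
    {"item": "Copper wire 2mm", "region": "EMEA", "year": 2013},
]


def search(rows, term, **filters):
    """Case-insensitive substring match on 'item', plus exact-match field filters."""
    needle = term.casefold()
    return [
        row for row in rows
        if needle in row["item"].casefold()
        and all(row.get(field) == value for field, value in filters.items())
    ]


def facet(rows, field):
    """Count how many matching rows fall under each value of a field."""
    return Counter(row[field] for row in rows)
```

For example, `facet(search(ROWS, "bolt"), "region")` counts the matching rows per region, which is the kind of breakdown a faceted UI shows next to the result list so users can narrow down without writing SQL.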
Key Takeaways
Key Takeaways
Business Value | Search Provides | Integrated Search Adds
Gain deeper insight by exploring structured and unstructured data together | Interactive, multi-type data correlation | Multi-type, scalable data storage – one truth
Serve more users while accessing data securely | Multi-tenant discovery service and active archive | Secure and granular access control – same model across workloads
Make more accurate decisions over growing and dynamic data sets | Relevance and accuracy | Flexible, scalable, and streamlined data pipelines
Key Takeaways
Business Value | Search Provides | Integrated Search Adds
Find and learn from patterns faster | Free-text, faceted real-time interactive segmentation | Automatic feedback loops
Eliminate time-consuming manual tasks and costly mistakes | Multi-language text and full document matching | Scalable and automatic matching on growing real-time data ingest volumes
Build more interesting and valuable business applications | Rich query language: fuzzy matching, term-distance, shape queries | Search AND analytics in the same platform
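As a rough illustration of the "rich query language" row, the sketch below builds query strings in the Lucene/Solr standard query syntax, where `term~N` is a fuzzy match and `"phrase"~N` is a proximity (term-distance) match, and assembles a request URL for a Solr select handler. The base URL, the collection name `emails`, and the field names are assumptions; the helpers are hypothetical and do not escape special characters in user input.

```python
from urllib.parse import urlencode


def fuzzy(field, term, max_edits=1):
    """Fuzzy term query, e.g. item:bolt~1 (Lucene allows 0 to 2 edits)."""
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1, or 2")
    return f"{field}:{term}~{max_edits}"


def proximity(field, phrase, slop):
    """Proximity query: the phrase's terms must occur within `slop` positions."""
    return f'{field}:"{phrase}"~{slop}'


def select_url(base, collection, q, facet_field=None, rows=10):
    """Build a select-handler URL, optionally requesting facet counts on one field."""
    params = [("q", q), ("rows", rows), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base}/{collection}/select?{urlencode(params)}"
```

For instance, `select_url("http://localhost:8983/solr", "emails", proximity("body", "provider agreement", 5), facet_field="vendor")` would request emails where the two terms appear close together, with a per-vendor facet count returned alongside the hits.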
Thank you!
@EvaAndreasson, @Cloudera
