Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Information Fabric
- 1. Anzo Smart Data Lake® 4.0
A Data Lake Platform for the Enterprise Information Fabric
Ben Szekely – Vice President of Solution Engineering
Austin Meyer – Solutions Engineer, Pre-Sales
- 2. “The complex process of bringing together
different data sources throughout an
organization is now being automated
creating a single, semantic layer of an
organization’s data.”
WIRED News
The Semantic Layer
- 3. “Semantic approaches are the future of
the enterprise information fabric”
Michele Goetz - Principal Analyst - Forrester Research
The Information Fabric
- 5. Anzo Smart Data Lake® 4.0
The industry-leading platform for building a Semantic Layer
Open Standards | End-To-End | Enterprise Scale
- 8. The Drive for Insight
Ask more questions - faster
Stay ahead of
the competition
Uncover revenue
growth opportunities
What is the market
landscape for lung cancer
therapies in 2020?
How do I design my
clinical trial to benefit
the most patients?
- 10. The data-driven business must
minimize costs to deliver on
business commitments
Shrink Execution Costs
Which is increasingly difficult to
achieve in complex data and
regulatory environments
In Complex Environments
The Urgency of Requirements
“I need to deliver on-time Adverse Event
reports to the FDA”
“The complexity of
clinical data standards
inhibits the time it takes
to design trials”
Cost-effective flexibility
- 11. Executing in dynamic and complex data
environments is costly
requiring brittle one-off solutions and
manual efforts to combine data
REQUIREMENTS
Enterprise Data
- 12. Executing in dynamic and complex data
environments is costly
that simply can't keep up…
REQUIREMENTS
Enterprise Data
- 18. ©2017 Cambridge Semantics Inc. All rights reserved.
First Gen
The Semantic Layer requires more than data
alone.
Where Data Lakes Fall Short
High value Data Lakes must tie information together, in
the language of the business.
“Last Mile”
Analytics
Enterprise
Data Sources
Cloud
Storage
...which requires custom coding
and tool integration
Data Ingestion
Structured Data
Ingest and ETL
Storage
Infrastructure
Data Cataloging
Basic Metadata
Cataloging
Self Service
Access to Raw
Data Sets (SQL)
Second Gen
- 20. DATA WAREHOUSES 1ST GEN DATA LAKE (HADOOP)
Semantic Layer
Connects data with
business meaning
Data On Demand
Gives the business
users access to data
Enterprise
Knowledge
Graph
ANZO SMART DATA LAKE®
- 25.
Product Name | Product ID | Opportunity Product ID
Product Name 1 | Product ID 1 | Product ID 1
Product Name 2 | Product ID 2 | Product ID 2
… | … | …
Product
Automated ETL Generation and Execution
Opportunity
Product ID
Account ID Geo
Product ID 1 Acc ID 1 Americas
… … …
Marketing Bookings
Product ID 1
Product ID
Product ID Account ID Geo
Product ID 1 Acc ID 1 Americas
… … …
Revenue
Revenue
Product ID 1
Product ID
Acc ID 1
Account ID
Product
Product ID 1
Product ID
Product ID 1
Product ID
Product Name 1
Product
Name
Marketing
Product ID 1
Product ID
Americas
Geo
Acc ID 1
Account ID
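The slide shows table rows on one side and graph nodes on the other. A minimal sketch of that row-to-graph step follows, assuming a plain (subject, predicate, object) triple representation. The function name `rows_to_triples` and the `Table/key` node naming are illustrative assumptions, not Anzo's actual ETL output; the sample values are the placeholders from the slide.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(table: str, key: str, rows: List[Dict[str, str]]) -> List[Triple]:
    """Emit one typed node per row and one edge per column value."""
    triples: List[Triple] = []
    for row in rows:
        node = f"{table}/{row[key]}"  # hypothetical node naming scheme
        triples.append((node, "type", table))
        for column, value in row.items():
            triples.append((node, column, value))
    return triples


# Placeholder rows copied from the slide's Product and Revenue tables.
product_rows = [
    {"Product ID": "Product ID 1", "Product Name": "Product Name 1"},
    {"Product ID": "Product ID 2", "Product Name": "Product Name 2"},
]
revenue_rows = [
    {"Product ID": "Product ID 1", "Account ID": "Acc ID 1", "Geo": "Americas"},
]

graph = rows_to_triples("Product", "Product ID", product_rows)
graph += rows_to_triples("Revenue", "Product ID", revenue_rows)
```

Each source table becomes its own set of nodes here; connecting nodes across tables is a separate step, handled on later slides by Data Layers.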
- 26. Graphmarts and Data Layers
Loads knowledge graph into
memory for layered prep and
analytics
Data Catalog
Manages and secures
the semantic layer
- 27. Anzo
Graphmart
Go Live
Base Layer
Graph Data Loaded
from Catalog
Data Prep Layer
Rule Layer
Access Control
Layer
Cleanse
Layer
Relationship Layer Rule Layer
Anzo Graphmarts and
Data Layers
Create new relationships
Transformation and conformance
Transform onto canonical models
Define granular access control
Deploy in the cloud on-demand
Data on demand services
- 28.
Product Name | Product ID | Opportunity Product ID
Product Name 1 | Product ID 1 | Oppty Product ID 1
Product Name 2 | Product ID 2 | Oppty Product ID 2
… | … | …
Product
Data Layers – Create Dynamic Relationships
Relationship Layer
Opportunity
Product ID
Account ID Geo
Oppty
Product ID 1
Acc ID 1 Americas
… … …
Marketing Bookings
Product ID 1
Product ID
Product
Product ID Account ID Geo
Oppty
Product ID 1
Acc ID 1 Americas
… … …
Revenue
Product
Revenue
Product ID 1
Product ID
Acc ID 1
Account ID
Product
Product ID 1
Product ID
Opportunity
Product ID 1
Opportunity
Product ID
Product Name 1
Product
Name
Marketing
Opportunity
Product ID 1
Opportunity
Product ID
Americas
Geo
Acc ID 1
Account ID
Product ID 1
Product ID 1
Acc ID 1
Acc ID 1
Opportunity
Product ID 1
Opportunity
Product ID 1
Product ID 1
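One way to picture a Relationship Layer is as a rule that adds edges between nodes sharing a key value, without touching the base data. The sketch below assumes that reading of the slide. `relationship_layer`, the `Type/n` node names and the `product` edge label are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def relationship_layer(base: List[Triple], target_type: str,
                       key_prop: str, edge: str) -> List[Triple]:
    """Link every non-target node to the target node that shares its key value."""
    prefix = target_type + "/"
    # Index target nodes (e.g. Product) by the value of the key property.
    index = {o: s for s, p, o in base if p == key_prop and s.startswith(prefix)}
    return [
        (s, edge, index[o])
        for s, p, o in base
        if p == key_prop and not s.startswith(prefix) and o in index
    ]


# Base layer: facts as loaded, using the slide's placeholder values.
base_layer = [
    ("Product/1", "Product ID", "Product ID 1"),
    ("Product/1", "Opportunity Product ID", "Oppty Product ID 1"),
    ("Revenue/1", "Product ID", "Product ID 1"),
    ("Marketing/1", "Opportunity Product ID", "Oppty Product ID 1"),
]

links = relationship_layer(base_layer, "Product", "Product ID", "product")
links += relationship_layer(base_layer, "Product", "Opportunity Product ID", "product")

# The layer is additive: the base triples are left unchanged.
layered_graph = base_layer + links
```

Because the new edges live in their own list, the layer can be dropped or replaced without reloading the base layer, which matches the layered design the slides describe.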
- 29.
6/30/2016
Holding
VersionDate
1006003
SecurityCode
1-3WGC-0
AccountCode
6/30/2016
6/30/2016
AccountCode SecurityCode VersionDate
1-3WGC-0 1006003 6/30/2016
1-3WGC-2 1013967 7/31/2017
… … …
Holdings
Data Layers – Create Dynamic Relationships
AccountCode VersionDate AccountName
1-3WGC-0 6/30/2016 BLDRS Asia 50 ADR Index Fund
1-3WGC-2 7/31/2017 BLDRS Emerging Markets 50 ADR Index Fund
… …
Account Reference
SecurityCode VersionDate Ticker
1006003 6/30/2016 RYAAY US
1013967 7/31/2017 AAAP US
… …
Security Reference
BLDRS Asia 50
ADR Index Fund
AccountName
1-3WGC-0
AccountCode
Account
6/30/2016
VersionDate
Security
RYAAY US
Ticker
1006003
SecurityCode
6/30/2016
VersionDate
1006003
1006003
6/30/2016
1-3WGC-0
1-3WGC-0
6/30/2016
Relationship Layer
- 30.
CSI_AZ_001
DEMOGRAPHICS
TRIALID
9
LOCATIONID
21
PATIENTID
PATIENTID TRIALID LOCATIONID
21 CSI_AZ_001 9
41 CSI_AZ_001 1
… … …
DEMOGRAPHICS
Data Layers for Data Prep Layer
Subjects
Data Prep Layer
PATIENTID PREFERRED TOXICITY
21 Abscess – Abdominal 4
41 HA 2
… … …
SIDEEFFECTRECORD
SUBJECTID STUDYID SITEID
2001 CSI_AZ_002 8
15024 CSI_AZ_006 9
… … …
Subjects
SUBJECTID PREFERREDTERM TOXICITYGRADE
2001 Abdominal abscess 4
15024 Periorbital oedema 5
… … …
AdverseEvent
SIDEEFFECTRECORD
AdverseEvent
21
PATIENTID
PREFERRED
Abscess -
Abdominal
4
TOXICITY
CSI_AZ_002
STUDYID
8
SITEID
2001
SUBJECTID
Abdominal
abscess
PREFERRED
TERM 4
TOXICITY
GRADE
2001
SUBJECTID
Abdominal
abscess
RAW
CONFORMED
TOXICITY
GRADE
PREFERRED
TERM
SUBJECTID
SITEID
STUDYID
SUBJECTID
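The RAW and CONFORMED columns on this slide suggest a conformance step that renames source fields onto a canonical model. A minimal sketch, assuming the column pairings read off the slide (PATIENTID to SUBJECTID, TRIALID to STUDYID, LOCATIONID to SITEID, PREFERRED to PREFERREDTERM, TOXICITY to TOXICITYGRADE). The term lookup table and the `conform` function are hypothetical, and identifier remapping between systems is not modeled.

```python
from typing import Dict

# Column pairings inferred from the slide's raw and conformed tables (an assumption).
RAW_TO_CANONICAL: Dict[str, str] = {
    "PATIENTID": "SUBJECTID",
    "TRIALID": "STUDYID",
    "LOCATIONID": "SITEID",
    "PREFERRED": "PREFERREDTERM",
    "TOXICITY": "TOXICITYGRADE",
}

# Hypothetical term dictionary; a real deployment would use a controlled vocabulary.
TERM_SYNONYMS: Dict[str, str] = {
    "Abscess - Abdominal": "Abdominal abscess",
}


def conform(row: Dict[str, str]) -> Dict[str, str]:
    """Rename raw columns to canonical names and normalize preferred terms."""
    conformed: Dict[str, str] = {}
    for column, value in row.items():
        name = RAW_TO_CANONICAL.get(column, column)  # unknown columns pass through
        if name == "PREFERREDTERM":
            value = TERM_SYNONYMS.get(value, value)
        conformed[name] = value
    return conformed


raw_side_effect = {"PATIENTID": "21", "PREFERRED": "Abscess - Abdominal", "TOXICITY": "4"}
conformed_event = conform(raw_side_effect)
```

Keeping the mapping as data rather than code means the same function can conform any source once its column pairings are captured.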
- 31. Anzo Graph Query Engine
In-memory MPP architecture
Managed through Graphmarts and Data Layers
Horizontal scale with on-demand cloud computing
Interactive data preparation and analytics
Scales to trillions of facts
- 32. Self-guided exploratory analytics
Secure and governed
Model-driven configuration, captured in Data Catalog
Data Layers and extracts
Discovery for unstructured and structured data
No query writing – automated query generation
Hi-Res Analytics
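"Automated query generation" can be pictured as building a SPARQL query from the class and properties a user picks in the interface. The sketch below is one possible shape for that, not the product's generator. The `build_select` function and the `http://example.org/` namespace are assumptions made for illustration.

```python
from typing import Dict


def build_select(class_iri: str, properties: Dict[str, str], limit: int = 100) -> str:
    """Build a SPARQL SELECT for one class and a set of optional properties."""
    variables = " ".join(f"?{name}" for name in properties)
    lines = [
        f"SELECT ?item {variables}",
        "WHERE {",
        f"  ?item a <{class_iri}> .",
    ]
    for name, iri in properties.items():
        # OPTIONAL keeps items that are missing a value for this property.
        lines.append(f"  OPTIONAL {{ ?item <{iri}> ?{name} . }}")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


EX = "http://example.org/"  # hypothetical namespace
query = build_select(
    EX + "Product",
    {"name": EX + "productName", "geo": EX + "geo"},
    limit=10,
)
print(query)
```

The user only chooses a class and fields; the query text is derived from the model, which is the sense in which no query writing is needed.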
- 34.
• Learn about navigating the business-friendly graph
model
• Learn how to clean and prepare data with basic
formulas
• Learn how to assemble data sets to answer questions
in analytic tools
Claim
Patient
Record
Drug
Note
Subscriber
A Citizen Data Scientist comes up to speed in hours in
collaborative workshops
Hi-Res Analytics for the Citizen Data Scientist
- 37.
Claim ID Process Date Subscriber ID
44223 10/3/2015 C12345
44224 10/7/2015 C23412
… … …
Claims
On July 3, 2016 Patient BA213 seemed
frustrated after experiencing headache and
nausea following 500mg dosage of sleep aid
therapeutic, Narcoleptol.
On Site Doctor Note
Building and Expanding the Enterprise Knowledge Graph
Patient ID Condition Drug Name
BA213 Sleep Apnea Narcoleptol
CS289 Type II Diabetes Insulin
… …
EHR
BA213
Patient ID
Drug
prescribed
Narcoleptol
brand
name
Sleep
Apnea
Condition
10/3/2015
Process
Date Subscriber
C12345
Subs. ID
Patient
Record
about
500mg
Dosage
Note
3/7/2016
when
Headache
and nausea
event
-.05
Sentiment
score
Claim
44223
Claim ID
about
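Once the doctor's note, the EHR row and the claim sit in one graph, a question can be answered by walking edges across what used to be separate sources. A minimal sketch, assuming triples with the edge labels visible on the slide (about, prescribed, brand name, event, sentiment score). The node identifiers and the `negative_events_by_drug` function are hypothetical.

```python
from typing import List, Tuple, Union

Value = Union[str, float]
Triple = Tuple[str, str, Value]

# Facts taken from the slide's note and EHR examples; node ids are made up.
knowledge_graph: List[Triple] = [
    ("Note/1", "about", "PatientRecord/BA213"),
    ("Note/1", "event", "Headache and nausea"),
    ("Note/1", "dosage", "500mg"),
    ("Note/1", "sentiment score", -0.05),
    ("PatientRecord/BA213", "condition", "Sleep Apnea"),
    ("PatientRecord/BA213", "prescribed", "Drug/1"),
    ("Drug/1", "brand name", "Narcoleptol"),
]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[Value]:
    """Return every object reachable from subject over predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def negative_events_by_drug(graph: List[Triple]) -> List[Tuple[Value, Value]]:
    """Pair each drug brand with events from notes that carry negative sentiment."""
    results = []
    for note, predicate, score in graph:
        if predicate != "sentiment score" or score >= 0:
            continue
        for patient in objects(graph, note, "about"):
            for drug in objects(graph, patient, "prescribed"):
                for brand in objects(graph, drug, "brand name"):
                    for event in objects(graph, note, "event"):
                        results.append((brand, event))
    return results


findings = negative_events_by_drug(knowledge_graph)
```

The walk goes from unstructured text (the note) to structured records (the prescription) in a single pass, which is the point of combining both in one graph.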
- 41.
Analytics Endpoint
Product ID Geo Account ID Opportunity
Prod ID
Product ID 1 Americas Acc ID 1 Opportunity
Prod ID
… …
My Product Set
Analytics Endpoints for On-Demand Access in any Tool
Bookings
Product ID 1
Product ID
Product
Product
Revenue
Product ID 1
Product ID
Acc ID 1
Account ID
Product
Product ID 1
Product ID
Opportunity
Product ID 1
Opportunity
Product ID
Product Name 1
Product
Name
Marketing
Opportunity
Product ID 1
Opportunity
Product ID
Americas
Geo
Acc ID 1
Account ID
- 42. Enterprise Knowledge Graph
Enabling on-demand
access to data
by those seeking
answers and insight
Scalability
Security
Governance
Lineage
Automated
Structured
Data Ingestion
Natural Language
Processing and
Text Analytics
Rich
models
Hi-Res Analytics
Anzo Smart Data Lake®
Data On Demand
- 43. Enterprise Knowledge Graph Data On Demand
Automated Ingestion of
Patient Data
Patient
Safety
Clinical
Trial Ops
R & D
Health
Economics
The Smart Data Lake for Digital Patient Health
Insight for Decision
Makers
Improving patient
outcomes, safety, and
comfort
Reducing the time to bring
medicines to patients
Lowering the cost of
healthcare
Insurance Claims
Clinical Trials
Rx Data
Health Records
Genetic Data
Wearables
+
- 44. Enterprise Knowledge Graph Data On Demand
Automated Ingestion of
Customer Data
Sales
Compliance
Marketing
Risk
Management
The Smart Data Lake for Customer 360
Insight for Decision
Makers
Connect with your
customers
Reduce risk
Increase revenue
Account Data
Trading Data
Marketing Data
Relationship
Data
- 45.
Node 1 Node 2
GQE Cluster
Node N
…
Node 1 Node 2
Hadoop/Spark/HDFS Cluster
Node M
…
…
Anzo Enterprise Server
Node 1 Node 2 Node P
…
ASDL Server
Anzo Ingest Servers
Node 1 Node 2 Node P
…
Client
Browser
Active Directory
Anzo on the Web App
ASDL Web App
HTTP/ODATA/SPARQL
Structured,
Graph Data
SPARQL
HTTP/GRPC
HTTP/HTTPS
HTTP/JMS Metadata
Synchronization
HTTP/JMS
Metadata
Synchronization
HDFS
Fuse
Apache Livy
Metadata
HTTP/HTTPS
Elasticsearch Cluster
Node 1 Node 2 Node N
…
DS1 DS2 DS3…
…
JDBC
…
Schema
Job Execution
HTTP/HTTPS
Unstructured
Data
Documents
Anzo Smart Data Lake Architecture
- 46. A Data Journey of Differentiating Capabilities
Unstructured Data
Notes, Docs,
Emails, Articles
Structured Data
Relational, CSV,
HDFS, External
Data Feeds
Access | Prepare | Catalog | Ingest
NLP, Text Analytics,
Sentiment Analysis
Hi-Res Analytics
Data Catalog
Graphmarts
Data Layers
Data Lake
[Metadata or Data]
Semantic
Layer
HTTP
ODATA
Services
Business
User
IT
User
- 47. Capability 1 - Ingestion and Cataloging
Unstructured Data
Notes, Docs,
Emails, Articles
Structured Data
Relational, CSV,
HDFS, External
Data Feeds
Access | Prepare | Catalog | Ingest
NLP, Text Analytics,
Sentiment Analysis
Hi-Res Analytics
Data Catalog
Graphmarts
Data Layers
Data Lake
[Metadata or Data]
Semantic
Layer
HTTP
ODATA
Services
Business
User
IT
User
- 48. Capability 2 – Unstructured Data Ingestion
Unstructured Data
Notes, Docs,
Emails, Articles
Structured Data
Relational, CSV,
HDFS, External
Data Feeds
Access | Prepare | Catalog | Ingest
NLP, Text Analytics,
Sentiment Analysis
Hi-Res Analytics
Data Catalog
Graphmarts
Data Layers
Data Lake
[Metadata or Data]
Semantic
Layer
HTTP
ODATA
Services
Business
User
NoSQL
IT
User
- 49. Capability 3 – Graphmarts and Data Layers
Unstructured Data
Notes, Docs,
Emails, Articles
Structured Data
Relational, CSV,
HDFS, External
Data Feeds
Access | Prepare | Catalog | Ingest
NLP, Text Analytics,
Sentiment Analysis
Hi-Res Analytics
Data Catalog
Graphmarts
Data Layers
Data Lake
[Metadata or Data]
Semantic
Layer
HTTP
ODATA
Services
Business
User
NoSQL
IT
User
Can Be Tabular
- 50. Virtual Hub and Spoke ETL
Structured Data
Relational, CSV,
HDFS, Data Feeds
External and Internal
Access | Prepare | Catalog | Ingest
Data Catalog and
Metadata Capture
On Demand Access
to Data
Big Data Stores
Mapping
Mapping
Semantic
Layer
- 55. Capability 4 – Hi-Res Analytics™
Unstructured Data
Notes, Docs,
Emails, Articles
Structured Data
Relational, CSV,
HDFS, External
Data Feeds
Access | Prepare | Catalog | Ingest
NLP, Text Analytics,
Sentiment Analysis
Hi-Res Analytics
Data Catalog
Graphmarts
Data Layers
Data Lake
[Metadata or Data]
Semantic
Layer
HTTP
ODATA
Services
Business
User
NoSQL
IT
User
Can Be Tabular
- 56.
Creates a high resolution digital
twin of diverse and complex data
sets using open W3C standards –
structured and unstructured
Enhance Digital Transformation
Makes it easy for aspiring
citizen data scientists to ask
questions or extract data using
sophisticated but intuitive
auto-generation of queries
Empower Citizen Data Scientists
Uses the language of the
business to let users create and
share insights quickly by working
the way they think.
Make Data Understandable
A future-proof layer for fueling
data into emerging technologies
including ML and text analytics.
Build a Bridge to the Future
Anzo Smart Data Lake – Strategic Benefits
- 57.
[Figure: drug development pipeline]
Stages: Pre-Discovery, Drug Discovery, Preclinical, Product Development, FDA Review, Scale-Up to Mfg., Post-Marketing Surveillance (indefinite)
Compounds: ~5,000 – 10,000, then 250, then 5, ending in one FDA-approved drug
Number of volunteers: Phase 1: 20–100; Phase 2: 100–500; Phase 3: 1,000–5,000
Durations shown: 3 – 6 years, 6 – 7 years, 0.5 – 2 years
Milestones: IND submitted, NDA/BLA submitted
The Information Fabric – A Semantic Layer for the Enterprise
R & D
Intelligence
(CI)
Product
Development
& Regulatory
PV & Safety
Case
Management
Source of Influence
Commercial
Analytics
Clinical Trial
Operations
Medical
Advisory Board
Analytics
Real World
Research
Clinical Data
Standards
Management
Voice of the
Customer
Analytics
Clinical Trial
Exploratory
Analytics
- 58. How long does it take to implement?
Does ASDL replace my data lake?
Where can I find out more?
• Get started building your Semantic Layer today
• Build on the data lake investments you have already made
• Stop by our booth - 441
Getting Started