In the U.S., pharmaceutical firms and medical device manufacturers must meet electronic record-keeping regulations set by the Food and Drug Administration (FDA). The regulation is Title 21 CFR Part 11, commonly known as Part 11. Part 11 requires regulated firms to implement controls for software and systems involved in processing many forms of data as part of business operations and product development. Enterprise data warehouses are used by the pharmaceutical and medical device industries for storing data covered by Part 11 (for example, Safety Data and Clinical Study project data). QuerySurge, the only test tool designed specifically for automating the testing of data warehouses and the ETL process, has been effective in testing data warehouses used by Part 11-governed companies. The purpose of QuerySurge is to ensure that your warehouse is not populated with bad data. In industry surveys, bad data has been found in every database and data warehouse studied and is estimated to cost firms on average $8.2 million annually, according to analyst firm Gartner. Most firms test far less than 10% of their data, leaving at risk the rest of the data they are using for critical audits and compliance reporting. QuerySurge can test up to 100% of your data and help assure your organization that this critical information is accurate. QuerySurge not only helps in eliminating bad data, but is also designed to support Part 11 compliance. Learn more at www.QuerySurge.com
This document discusses heterogeneous database systems. It defines a heterogeneous database system as an automated or semi-automated system that integrates disparate database management systems to present a unified query interface to users. It discusses issues in multi-database query processing such as query support, cost, translation and change adaptation. The architecture involves individual databases, wrapper methods, a mediator and query processing/optimization. Database integration involves schema integration through a bottom-up design approach and the conversion of local schemas to a global schema.
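As an illustration of the wrapper/mediator architecture described above, the minimal Python sketch below integrates two independent databases with different local schemas behind a single global query interface (all database, table, and column names are invented for illustration):

    import sqlite3

    # Two autonomous "local" databases with different schemas for the same concept.
    db_a = sqlite3.connect(":memory:")
    db_a.execute("CREATE TABLE emp (emp_id INTEGER, full_name TEXT, pay REAL)")
    db_a.execute("INSERT INTO emp VALUES (1, 'Ada', 90000)")

    db_b = sqlite3.connect(":memory:")
    db_b.execute("CREATE TABLE staff (id INTEGER, name TEXT, salary REAL)")
    db_b.execute("INSERT INTO staff VALUES (2, 'Bob', 80000)")

    # A wrapper translates the global schema (id, name, salary) into a local query.
    def wrapper_a():
        return db_a.execute("SELECT emp_id, full_name, pay FROM emp").fetchall()

    def wrapper_b():
        return db_b.execute("SELECT id, name, salary FROM staff").fetchall()

    # The mediator presents a single, unified query interface over both sources.
    def mediator_all_employees():
        return wrapper_a() + wrapper_b()

    print(mediator_all_employees())  # [(1, 'Ada', 90000.0), (2, 'Bob', 80000.0)]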
This document provides an overview of the process for gathering business requirements for a data management and warehousing project. It discusses why requirements are gathered, the types of requirements needed, how business processes create data in the form of dimensions and measures, and how the gathered requirements will be used to design reports to meet business needs. A straw-man proposal is presented as a starting point for further discussion.
This document proposes creating a data warehouse at Rivier College to address several challenges: data is locked in different systems requiring manual extraction; administrators struggle to pull consistent data for reporting from different sources; and data analysis is basic without standard processes. The goals of the data warehouse are to improve planning and decision making through timely delivery of standardized, repeatable reports from a centralized collection of integrated, nonvolatile data. It will evolve to incorporate more institutional data sources over time.
This document contains information about performance evaluation methods for a data engineer, including examples of performance review phrases. It discusses 12 common methods for evaluating a data engineer's performance, including management by objectives, the critical incident method, behaviorally anchored rating scales, behavioral observation scales, 360-degree appraisal, and checklist and weighted checklist methods. For each method, it provides details on how the method works and examples of positive and negative phrases that could be used in a performance review. The document is intended to provide useful resources for conducting a data engineer's performance appraisal.
Data pre-processing is a data mining technique that transforms raw data into an understandable format. Data cleansing (or data cleaning) is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database; it involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. This presentation covers data cleaning and pre-processing.
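A small, self-contained pandas sketch of the cleaning steps described above, removing duplicates, standardizing inconsistent coding, dropping impossible values, and imputing missing ones (the column names and rules are hypothetical, not taken from the presentation):

    import pandas as pd

    # Raw records with duplicates, a missing age, inconsistent coding, and an impossible value.
    raw = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 4],
        "age": [34, 34, None, 29, -5],
        "country": ["US", "US", "us", "DE", "FR"],
    })

    clean = raw.drop_duplicates().copy()                     # remove exact duplicate records
    clean["country"] = clean["country"].str.upper()          # standardize inconsistent coding
    clean = clean[clean["age"].isna() | clean["age"].between(0, 120)].copy()  # drop impossible ages
    clean["age"] = clean["age"].fillna(clean["age"].median())                 # impute missing ages

    print(clean)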
In this talk we’ll present how at GetYourGuide we’ve built from scratch a completely new ETL pipeline using Debezium, Kafka, Spark and Airflow, one that can automatically handle schema changes. Our starting point was an error-prone legacy system that ran daily and was vulnerable to breaking schema changes, which caused many sleepless on-call nights. Like most companies, we have traditional SQL databases that we need to connect to in order to extract relevant data. This is usually done through either full or partial copies of the data with tools such as Sqoop. Another approach that has become quite popular lately is to use Debezium as the Change Data Capture layer, which reads database binlogs and streams these changes directly to Kafka. Because having data once a day is no longer enough for our business, and because we wanted our pipelines to be resilient to upstream schema changes, we decided to rebuild our ETL using Debezium. We’ll walk the audience through the steps we followed to architect and develop this solution using Databricks to reduce operation time. By building this new pipeline we are now able to refresh our data lake multiple times a day, giving our users fresh data and protecting our nights of sleep.
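A hedged sketch of the kind of ingestion job described above: Spark Structured Streaming reading Debezium change events from Kafka and appending them to the lake. The broker address, topic name, field names, and output path are assumptions, and the example assumes Debezium’s JSON converter without the schema envelope:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("cdc-ingest").getOrCreate()

    # Schema for the relevant parts of a Debezium change event (fields trimmed for brevity).
    row_schema = StructType([
        StructField("id", LongType()),
        StructField("status", StringType()),
    ])
    envelope = StructType([
        StructField("op", StringType()),       # c=create, u=update, d=delete, r=snapshot read
        StructField("after", row_schema),      # row state after the change (null for deletes)
        StructField("ts_ms", LongType()),
    ])

    changes = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "kafka:9092")   # assumed broker address
             .option("subscribe", "mysql.shop.bookings")        # assumed Debezium topic name
             .load()
             .select(from_json(col("value").cast("string"), envelope).alias("e"))
             .select("e.op", "e.after.*", "e.ts_ms")
    )

    # Append the parsed change stream to the lake; checkpointing makes the job restartable.
    query = (
        changes.writeStream.format("delta")                     # or "parquet"
               .option("checkpointLocation", "/lake/_checkpoints/bookings")
               .start("/lake/raw/bookings")
    )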
This document discusses the need for observability in data pipelines. It notes that real data pipelines often fail or take a long time to rerun without providing any insight into what went wrong. This is because of frequent code, data, dependency, and infrastructure changes. The document recommends taking a production engineering approach to observability using metrics, logging, and alerting tools. It also suggests experiment management and encapsulating reporting in notebooks. Most importantly, it stresses measuring everything through metrics at all stages of data ingestion and processing to better understand where issues occur.
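A minimal sketch of the "measure everything" idea: every pipeline stage reports row counts and timing so it is immediately visible where records are lost or time is spent. The stage names are invented and the metrics dictionary stands in for whatever backend (StatsD, Prometheus, etc.) is actually used:

    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("pipeline")

    metrics = {}  # stand-in for a real metrics backend

    def record(name, value):
        metrics[name] = value
        log.info("metric %s=%s", name, value)

    def run_stage(name, fn, records):
        start = time.time()
        out = fn(records)
        record(f"{name}.rows_in", len(records))      # measure volume at every stage
        record(f"{name}.rows_out", len(out))         # row drops show up immediately
        record(f"{name}.seconds", round(time.time() - start, 3))
        return out

    raw = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]
    cleaned = run_stage("clean", lambda rs: [r for r in rs if r["amount"] is not None], raw)
    loaded = run_stage("load", lambda rs: rs, cleaned)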
This document provides an overview of data warehousing, OLAP, data mining, and big data. It discusses how data warehouses integrate data from different sources to create a consistent view for analysis. OLAP enables interactive analysis of aggregated data through multidimensional views and calculations. Data mining finds hidden patterns in large datasets through techniques like predictive modeling, segmentation, link analysis and deviation detection. The document provides examples of how these technologies are used in industries like retail, banking and insurance.
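As a small illustration of the multidimensional, aggregated views that OLAP provides (the data is invented), a pivot of revenue over region and product:

    import pandas as pd

    sales = pd.DataFrame({
        "region":  ["East", "East", "West", "West"],
        "product": ["Shoes", "Hats", "Shoes", "Hats"],
        "quarter": ["Q1", "Q1", "Q1", "Q2"],
        "revenue": [120, 80, 200, 50],
    })

    # A multidimensional (region x product) view of aggregated revenue,
    # the kind of cube slice an OLAP tool computes interactively.
    cube = sales.pivot_table(index="region", columns="product",
                             values="revenue", aggfunc="sum", margins=True)
    print(cube)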
This document summarizes an analysis of unstructured data and text analytics. It discusses how text analytics can extract meaning from unstructured sources like emails, surveys, and forums to enhance applications such as search, information extraction, and predictive analytics. Examples show how tools can extract entities, relationships, and sentiments to gain insights from sources in domains like healthcare, law enforcement, and customer experience.
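A brief sketch of the entity extraction step such tools perform, here using spaCy purely as a stand-in (the example sentence is invented, and the small English model is assumed to be installed):

    import spacy

    # Assumes the model is installed: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("Patient reported chest pain after starting Lipitor; "
              "follow-up at Boston General on March 3.")

    for ent in doc.ents:
        print(ent.text, ent.label_)   # e.g. organization, location, and date mentions pulled from free text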
Data lineage tracking is one of the significant problems that financial institutions face when using modern big data tools. This presentation describes Spline – a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans and visualizes it in a user-friendly manner.
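A hedged sketch of how a Spark session is typically wired to a lineage agent such as Spline, by registering a query-execution listener on the session; the package coordinates, listener class, producer URL, and version shown are assumptions that should be verified against the Spline documentation for your Spark version:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("lineage-demo")
        # Assumed Spline agent coordinates and settings -- verify against the Spline docs.
        .config("spark.jars.packages",
                "za.co.absa.spline.agent.spark:spark-3.3-spline-agent-bundle_2.12:2.0.0")
        .config("spark.sql.queryExecutionListeners",
                "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener")
        .config("spark.spline.lineageDispatcher.http.producer.url",
                "http://localhost:8080/producer")
        .getOrCreate()
    )

    # Any write action is then reported to the lineage server together with its execution plan.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    df.write.mode("overwrite").parquet("/tmp/lineage_demo_out")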
This document discusses Informatica's data integration solutions for SAP customers. It summarizes Informatica's strategic relationship with SAP since 1998, their current SAP certifications including for SAP HANA, and details of Informatica's connectivity, integration patterns, and information lifecycle management solutions that are certified to work with SAP applications and HANA. It also provides a benchmark showing high performance for loading and extracting data from SAP HANA using Informatica PowerExchange.
This document discusses enterprise data management. It defines enterprise data management as the practice of removing organizational data issues by ensuring that accurate, consistent, and transparent data can be created, integrated, disseminated, and managed across enterprise applications in a timely manner. It also discusses the need for a structured data delivery strategy from producers to consumers. The document then outlines some key enterprise data categories and provides a conceptual and logical view of an enterprise master data lineage architecture with data flowing between transactional systems, a data management layer, and analytics.
Garbage in, garbage out - we have all heard about the importance of data quality. Having high quality data is essential for all types of use cases, whether it is reporting, anomaly detection, or for avoiding bias in machine learning applications. But where does high quality data come from? How can one assess data quality, improve quality if necessary, and prevent bad quality from slipping in? Obtaining good data quality involves several engineering challenges. In this presentation, we will go through tools and strategies that help us measure, monitor, and improve data quality. We will enumerate factors that can cause data collection and data processing to cause data quality issues, and we will show how to use engineering to detect and mitigate data quality problems.
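An illustrative sketch of the kind of automated checks used to measure and monitor data quality, expressed as plain pandas assertions (the column names, reference values, and thresholds are invented):

    import pandas as pd

    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "amount":   [25.0, None, 30.0, -10.0],
        "country":  ["US", "DE", "DE", "XX"],
    })

    # Each check yields a metric that can be tracked over time and alerted on.
    checks = {
        "amount_completeness": orders["amount"].notna().mean(),            # share of non-null values
        "order_id_uniqueness": bool(orders["order_id"].is_unique),         # primary-key constraint
        "amount_non_negative": bool((orders["amount"].dropna() >= 0).all()),
        "country_in_reference": orders["country"].isin(["US", "DE", "FR"]).mean(),
    }

    failed = {k: v for k, v in checks.items()
              if v is False or (isinstance(v, float) and v < 1.0)}
    print(checks)
    print("failed checks:", failed)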
This document discusses key aspects of migrating a database from SQL Server to Oracle 11g. The major steps in a migration are analysis, migration, testing, and deployment. The migration process involves migrating the schema and objects, business logic, and client applications. Tools like Oracle Migration Workbench and Database Migration Verifier help automate the migration and validation of the migrated schema and data.
What are the evolving approaches to analytics? What is Azure Data Factory? What are the capabilities of Azure Data Factory?
Azure Data Factory can now use Mapping Data Flows to orchestrate ETL workloads. Mapping Data Flows allow users to visually design transformations on data from disparate sources and load the results into Azure SQL Data Warehouse for analytics. The key benefits of Mapping Data Flows are that they provide a visual interface for building expressions to cleanse and join data with auto-complete assistance and live previews of expression results.
This presentation gives a brief introduction to Business Intelligence (BI), its need and its applications.
The document discusses QuerySurge, an automated data testing solution that helps verify data quality and find errors. It notes that traditional data quality tools focus on profiling, cleansing and monitoring data, while QuerySurge also enables data testing through easy-to-use query wizards and comparison of source and target data without SQL coding. QuerySurge allows collaborative testing across teams and platforms, integrates with development tools, and can significantly reduce testing time and improve data quality.
Fast and easy. No programming needed. The latest QuerySurge release introduces the new Query Wizards. The Wizards allow both novice and experienced team members to validate their organization's data quickly with no SQL programming required. The Wizards provide an immediate ROI through their ease of use and ensure that minimal time and effort are required for developing tests and obtaining results. Even novice testers are productive as soon as they start using the Wizards! According to a recent survey of Data Architects and other data experts on LinkedIn, approximately 80% of columns in a data warehouse have no transformations, meaning the Wizards can test all of these columns quickly and easily. (The columns with transformations can be tested using the QuerySurge Design Library with custom SQL coding.) There are three types of automated data comparisons:
- Column-Level Comparison
- Table-Level Comparison
- Row Count Comparison
There are also automated features for filtering (‘Where’ clause) and sorting (‘Order By’ clause). The Wizards provide both novices and non-technical team members with a fast & easy way to be productive immediately and speed up testing for team members skilled in SQL. Trial our software either as a download or in the cloud at www.QuerySurge.com. The trial comes with a built-in tutorial and sample data.
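For readers unfamiliar with these comparison types, the sketch below shows conceptually what a row count comparison and a column-level source-to-target comparison check; it is an illustration only, not QuerySurge's implementation, and the tables are invented:

    import sqlite3

    # Stand-ins for a source system and a target warehouse.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    source.executemany("INSERT INTO customers VALUES (?, ?)",
                       [(1, "a@x.com"), (2, "b@x.com"), (3, "c@x.com")])

    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE dim_customer (id INTEGER, email TEXT)")
    target.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                       [(1, "a@x.com"), (2, "WRONG"), (3, "c@x.com")])

    # Row count comparison.
    src_count = source.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    tgt_count = target.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
    print("row counts match:", src_count == tgt_count)

    # Column-level comparison: sorted source rows vs sorted target rows.
    src_rows = source.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
    tgt_rows = target.execute("SELECT id, email FROM dim_customer ORDER BY id").fetchall()
    mismatches = [(s, t) for s, t in zip(src_rows, tgt_rows) if s != t]
    print("mismatching rows:", mismatches)   # [((2, 'b@x.com'), (2, 'WRONG'))]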
You've made the move to MongoDB for its flexible schema and querying capabilities in order to enhance agility and reduce costs for your business. Shouldn't your data quality process be just as organized and efficient? Using QuerySurge for testing your MongoDB data as part of your quality effort will increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your Big Data store. QuerySurge will help you keep your team organized and on track too! To learn more about QuerySurge, visit www.QuerySurge.com
This document discusses automating enterprise application and data warehouse testing using QuerySurge. It begins with an introduction to QuerySurge and its modules for automating data interface testing. These modules allow testing across different data sources with no coding required. The document then covers data maturity models and how QuerySurge can help improve testing processes. It demonstrates how QuerySurge can automate testing to gain full coverage while decreasing testing time. In conclusion, it discusses how QuerySurge provides value through increased testing efficiency and data quality.
Are you using HPE ALM or Quality Center (QC) for your requirements gathering and test management? RTTS, an alliance partner of HPE and a member of HPE’s Big Data community, can show you how to use ALM/QC and RTTS’ QuerySurge to effectively manage your data validation & testing of Vertica (or any data warehouse). In this webinar video you will see:
- a custom view of ALM to store source-to-target mappings
- data validation tests in QuerySurge
- the execution of QuerySurge tests from ALM
- the results of data validation tests stored in ALM
- custom ALM reports that show data validation coverage of Vertica
- how we improve your data quality while reducing your costs & risks
Presented by:
Bill Hayduk, Founder & CEO of RTTS, the developers of QuerySurge
Chris Thompson, Senior Domain Expert, Big Data Testing
To learn more about QuerySurge, visit www.QuerySurge.com
We explore how extract, transform and load (ETL) testing with SQL scripting is crucial to data validation and show how to test data on a large scale in a streamlined manner with an Informatica ETL testing tool.
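As an illustration of the SQL-scripting style of ETL test referred to above, the snippet below uses an EXCEPT (Oracle MINUS-style) query to find source rows that are missing or altered in the target; the staging and fact tables are invented and the example is not tied to any particular tool:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE stg_orders (order_id INTEGER, amount REAL);
        CREATE TABLE fact_orders (order_id INTEGER, amount REAL);
        INSERT INTO stg_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
        INSERT INTO fact_orders VALUES (1, 10.0), (2, 99.0);  -- one altered, one missing
    """)

    # Rows in the source that are missing or different in the target.
    missing_in_target = db.execute("""
        SELECT order_id, amount FROM stg_orders
        EXCEPT
        SELECT order_id, amount FROM fact_orders
    """).fetchall()

    print("source rows not reconciled in target:", missing_in_target)  # e.g. [(2, 20.0), (3, 30.0)]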
This document discusses challenges and opportunities in automating testing for data warehouses and BI systems. It notes that while BI projects have adopted agile methodologies, testing has not. Large and diverse data volumes create a nearly infinite space of test cases, making exhaustive manual testing impractical. It proposes a testing lifecycle and V-model for BI systems. Automating complex functional tests, SQL validation, reconciliation, and test data generation can help address these challenges by shortening regression cycles and enabling continuous testing. Various automation tools are discussed, including how they can validate ETL processes and reporting integrity. Automation can help complete testing and ensure data quality, compliance, and performance.
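As one concrete example of the test data generation mentioned above (the schema and constraints are invented), a small generator that produces repeatable, constraint-respecting rows for regression runs:

    import csv
    import random
    import string

    random.seed(42)  # deterministic output so regression runs are repeatable

    def random_customer(customer_id):
        name = "".join(random.choices(string.ascii_uppercase, k=6))
        return {
            "customer_id": customer_id,
            "name": name,
            "country": random.choice(["US", "DE", "FR"]),
            "lifetime_value": round(random.uniform(0, 5000), 2),  # stays within a valid range
        }

    rows = [random_customer(i) for i in range(1, 1001)]

    with open("test_customers.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)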