Presenters
Bill Hayduk, Founder / President
Jeff Bocarsly, Ph.D., Senior Architect
QuerySurge™, built by RTTS
The average organization loses $14.2 million annually through poor Data Quality. - Gartner
46% of companies cite Data Quality as a barrier to adopting Business Intelligence products. - InformationWeek
The per-patient data cost of Phase 3 clinical studies of new pharmaceuticals exceeds $26,000. - Journal of Clinical Research Best Practices
(1) Data Integrity (2) Compliance
(1) Data Integrity
High risk of defects that are not readily visible:
• Missing data
• Truncation of data
• Data type mismatch
• Null translation errors
• Incorrect type translation
• Misplaced data
• Extra records
• Transformation logic errors/holes
• Simple/small errors
• Sequence generator errors
• Undocumented requirements
• Not enough records
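Several of these defect types can be caught mechanically by comparing source rows with target rows on a shared key. The sketch below is a minimal Python illustration, not QuerySurge's implementation: it assumes both sides fit in memory as lists of dictionaries, and the function name `reconcile` and the bucket names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def reconcile(source: List[Row], target: List[Row], key: str) -> Dict[str, list]:
    """Compare source and target rows on `key` and bucket each difference by defect type."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    report: Dict[str, list] = {
        "missing": sorted(src.keys() - tgt.keys()),  # never loaded (missing data / not enough records)
        "extra": sorted(tgt.keys() - src.keys()),    # target rows with no source row (extra records)
        "null_translation": [],  # NULL on exactly one side
        "type_mismatch": [],     # different Python types, e.g. 500 vs "500"
        "truncated": [],         # target string is a strict prefix of the source string
        "mismatched": [],        # any other value difference
    }
    for k in sorted(src.keys() & tgt.keys()):
        for col, s_val in src[k].items():
            t_val = tgt[k].get(col)  # a column absent from the target reads as None
            if s_val == t_val:
                continue
            if s_val is None or t_val is None:
                report["null_translation"].append((k, col))
            elif type(s_val) is not type(t_val):
                report["type_mismatch"].append((k, col))
            elif isinstance(s_val, str) and s_val.startswith(t_val):
                report["truncated"].append((k, col))
            else:
                report["mismatched"].append((k, col))
    return report
```

A production comparison would more likely stream sorted result sets than hold both sides in memory; the bucketing logic is the point of the sketch.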
(2) Compliance
Need to comply with 21 CFR Part 11 mandates:
• Historical test information
• Test version history
• Test execution data: who, what & when
• Test cycle information
• Visibility of assets
• Archived test results
(1) Data Integrity (2) Compliance
• Periodic data reporting to the FDA
• Periodic data reporting to international bodies
• Announced FDA audits
• Unannounced FDA audits
Consequences: severe financial and business impact
Need a testing solution that can…
• automate the manual testing of data
• compare millions of rows of data quickly
• flag mismatches and inconsistencies in data sets
• provide flexibility in scheduling test runs
• generate informative reports that can easily be shared with the team
• validate up to 100% of all data, mitigating the risk
Need a testing solution that can…
• track test history
• provide reporting on test version history
• record every test execution with the tester's name and date
• deliver auditable reports of test cycles
• store all test outcomes and test data
• offer a read-only user for reviewing test assets
• support archiving of results
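The audit requirements above amount to an append-only record of who ran which test version, when, and with what outcome. The following is a minimal Python sketch assuming an in-memory store; `ExecutionRecord` and `AuditLog` are hypothetical names, and a real compliance system would also need durable, access-controlled storage, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ExecutionRecord:
    """One immutable audit entry: who ran which test version, when, and the outcome."""
    test_id: str
    test_version: int
    owner: str
    executed_at: datetime
    passed: bool


class AuditLog:
    """Append-only history: entries can be added and read, but not edited or deleted."""

    def __init__(self) -> None:
        self._records: List[ExecutionRecord] = []

    def record(self, test_id: str, test_version: int, owner: str, passed: bool,
               executed_at: Optional[datetime] = None) -> ExecutionRecord:
        entry = ExecutionRecord(test_id, test_version, owner,
                                executed_at or datetime.now(timezone.utc), passed)
        self._records.append(entry)
        return entry

    def history(self, test_id: str) -> Tuple[ExecutionRecord, ...]:
        """Read-only view (a tuple of frozen records) of every execution of one test."""
        return tuple(r for r in self._records if r.test_id == test_id)

    def version_history(self, test_id: str) -> List[int]:
        """Distinct test versions that have been executed, in ascending order."""
        return sorted({r.test_version for r in self._records if r.test_id == test_id})
```

Freezing the dataclass and returning tuples is what gives the "read-only reviewer" property here: callers can inspect history but cannot mutate it in place.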
QuerySurge™ is the smart testing solution that automates the data validation & testing process.
Use Cases

Typical data issue areas: the ETL steps that carry data from sources such as the Mainframe into Business Intelligence & Analytics.
C-level executives are using BI & Analytics to make critical business decisions on the assumption that the underlying data is fine. We know it is not.
QuerySurge™
• Web-based…
• Supported OS...
• Connects through… …to any JDBC-compliant data source
• QuerySurge Controller
• QuerySurge Server: DB Server (MySQL) and App Server (Tomcat)
• QuerySurge Agents (ships with 10 Agents)
• Installs in the cloud, on a VM, or on a bare-metal server
• Market leader and visionary in automated data testing
• Launched in 2012; has 150+ customers in 30 countries
• Partner ecosystem includes 50+ world-renowned technology companies, global system integrators, & regional consulting firms
• Briefs leading analyst firms and appears in their reports as the recommended solution for data testing
• Named one of the Top 50 companies driving Big Data innovation by Database Trends and Applications
• In-depth online Knowledge Base, formal classroom training, & free Customer Training Portal
• Sets the gold standard for PoC and post-sale support
QuerySurge supports the following data stores…
• Amazon Redshift, Elastic MapReduce, DynamoDB
• Apache Hadoop/Hive, Spark
• Cassandra
• Cloudera
• Couchbase
• Exasol
• Flat Files (delimited, fixed-width)
• Google BigQuery
• Hortonworks
• IBM (Cognos, Db2, Netezza, Informix, BigInsights, Cloudant, MDM)
• JSON files
• Mainframe
• MapR
• Micro Focus Vertica
• Microsoft (SQL Server DWH, HDInsight, PDW, SSAS, Excel, Access, SharePoint)
• MicroStrategy
• MongoDB
• Oracle (Oracle DB, MySQL, Exadata, NoSQL, Hadoop)
• Pivotal Greenplum
• PostgreSQL
• Salesforce
• SAP (Business Objects, HANA, IQ, ASE, Altiscale Data Cloud)
• Snowflake
• Tableau
• Teradata, Aster
• Workday
• XML
…and any other data store
QuerySurge connects to any two points at one time, queries each side with SQL (or HQL for Hadoop), compares every data set, and reports pass/fail results through Data Intelligence Reports, the Data Analytics Dashboard, and automated emails.
Source and target data can be any of:
• Data Stores: databases, data warehouses, data marts
• Big Data stores: Hadoop, NoSQL
• Flat Files: fixed width, delimited, Excel, JSON
• XML
• Web Services
• Business Intelligence Reports
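The source-to-target comparison in this diagram can be sketched in a few lines. This is a hypothetical Python illustration that uses two in-memory SQLite databases as stand-ins for a source and a target; `run_query_pair` and the table names are invented for the example. Comparing the result sets as multisets (via `collections.Counter`) means duplicate rows are counted, so an extra copy of a row is reported rather than hidden.

```python
import sqlite3
from collections import Counter
from typing import Dict, List


def run_query_pair(src_conn: sqlite3.Connection, tgt_conn: sqlite3.Connection,
                   src_sql: str, tgt_sql: str) -> Dict[str, object]:
    """Run a source query and a target query, then compare the result sets as multisets."""
    src_rows = Counter(src_conn.execute(src_sql).fetchall())
    tgt_rows = Counter(tgt_conn.execute(tgt_sql).fetchall())
    # Counter subtraction keeps only positive counts, so duplicate rows are accounted for.
    # Sorting by repr gives a stable order even when rows contain None.
    only_in_source: List[tuple] = sorted((src_rows - tgt_rows).elements(), key=repr)
    only_in_target: List[tuple] = sorted((tgt_rows - src_rows).elements(), key=repr)
    return {
        "passed": not only_in_source and not only_in_target,
        "only_in_source": only_in_source,
        "only_in_target": only_in_target,
    }


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.executescript("""
        CREATE TABLE patients (id INTEGER, site TEXT, dose INTEGER);
        INSERT INTO patients VALUES (1, 'nyc', 500), (2, 'bos', 200), (3, 'chi', 250);
    """)
    tgt = sqlite3.connect(":memory:")
    tgt.executescript("""
        CREATE TABLE dw_patients (patient_id INTEGER, site_code TEXT, dose_mg INTEGER);
        INSERT INTO dw_patients VALUES (1, 'NYC', 500), (2, 'BOS', 20), (3, 'CHI', 250);
    """)
    # The source query applies the expected transformation (upper-casing site codes),
    # so only genuine load errors remain as differences.
    print(run_query_pair(
        src, tgt,
        "SELECT id, UPPER(site), dose FROM patients",
        "SELECT patient_id, site_code, dose_mg FROM dw_patients",
    ))
```

In this demo, the row for patient 2 fails: the source yields (2, 'BOS', 200) while the target holds (2, 'BOS', 20).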
ETL Developer: codes data movement based on Mapping Requirements.
Data Tester: tests data movement based on Mapping Requirements.
Testing Points #1 to #3: the ETL steps between Source Data, the Big Data lake, the Data Warehouse, and the Data Mart.
Testing Point #4: the BI User extracts data for reports from BI & Analytics, and the Tester tests the BI Reports.
Automate the entire testing cycle
• Automate the launch, execution, comparison, & emailed results
Smart Query Wizards: no coding needed
• Query Wizards create tests visually, without writing SQL
Test across different platforms
• Data Warehouse, Hadoop, NoSQL, DB, mainframe, flat files, XML, JSON, BI Reports
Data Analytics & Intelligence
• Data Analytics Dashboard, Data Intelligence Reports, emailed results, back-end data access
Create Custom Tests
• Modularize functions with snippets, set thresholds, stage data, check data types
DevOps & Continuous Delivery
• API integration with build, configuration, ETL & QA management solutions
QuerySurge modules: Query Wizards, Design Library, Scheduler, Run-Time Dashboard, Data Intelligence Reports, Data Analytics Dashboard, QuerySurge for DevOps
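"Set thresholds" is not spelled out in the deck. One plausible reading, sketched below under that assumption, is a numeric tolerance per value plus a maximum share of failing values before a test is marked failed. `threshold_check`, `abs_tol`, and `max_failure_rate` are hypothetical names; `math.isclose` is the standard-library tolerance check.

```python
import math
from typing import Sequence


def threshold_check(source: Sequence[float], target: Sequence[float],
                    abs_tol: float = 0.0, max_failure_rate: float = 0.0) -> bool:
    """Pass if at most `max_failure_rate` of paired values differ by more than `abs_tol`."""
    if len(source) != len(target):
        return False  # a row-count difference fails outright
    if not source:
        return True  # nothing to compare
    failures = sum(
        1 for s, t in zip(source, target)
        # rel_tol=0.0 makes the tolerance purely absolute (e.g. rounding in currency columns)
        if not math.isclose(s, t, rel_tol=0.0, abs_tol=abs_tol)
    )
    return failures / len(source) <= max_failure_rate
```

With both parameters left at their defaults the check reduces to exact equality, so a threshold only loosens a test when it is set explicitly.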

Fast and easy. No programming needed.
• Performs 80% of all data tests with no SQL coding
• Opens up testing to novices & non-technical team members
• Speeds up testing for skilled coders
• Provides a huge return on investment
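A wizard can avoid hand-written SQL for untransformed columns because the comparison queries are fully determined by the tables and columns the user selects. The following minimal Python sketch illustrates that idea; the function names and the ANSI double-quote identifier style are assumptions, not the SQL that QuerySurge actually generates.

```python
from typing import Dict, Tuple


def quote_ident(name: str) -> str:
    """ANSI-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_level_pair(src_table: str, tgt_table: str,
                      column_map: Dict[str, str], order_by: str) -> Tuple[str, str]:
    """Build a source/target query pair from a source-column -> target-column mapping."""
    src_cols = ", ".join(quote_ident(c) for c in column_map)
    tgt_cols = ", ".join(quote_ident(c) for c in column_map.values())
    src_sql = f"SELECT {src_cols} FROM {quote_ident(src_table)} ORDER BY {quote_ident(order_by)}"
    # Sort the target on the column mapped from the source sort key so rows line up.
    tgt_sql = (f"SELECT {tgt_cols} FROM {quote_ident(tgt_table)} "
               f"ORDER BY {quote_ident(column_map[order_by])}")
    return src_sql, tgt_sql


def row_count_pair(src_table: str, tgt_table: str) -> Tuple[str, str]:
    """Build the simplest possible pair: compare row counts on both sides."""
    return (f"SELECT COUNT(*) FROM {quote_ident(src_table)}",
            f"SELECT COUNT(*) FROM {quote_ident(tgt_table)}")
```

Columns that do involve transformations cannot be generated this way, which is why they still need custom SQL.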
Design Library
• Create custom Query Pairs (source & target SQL for tests that involve transformations)
Scheduling
• Build groups of Query Pairs
• Schedule Test Runs to:
  • run immediately
  • run at a set date/time
  • have an event kick it off
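The three scheduling triggers can be modeled as a check that runs on each tick of a scheduler loop. The following is a hypothetical Python sketch: `ScheduledRun`, `due_runs`, and the event name `etl_load_done` are illustrative names only, and a run with neither a time nor an event is treated as "run immediately".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class ScheduledRun:
    suite: str                      # a named group of Query Pairs
    at: Optional[datetime] = None   # run at a set date/time
    on_event: Optional[str] = None  # run when a named event fires
    done: bool = False


def due_runs(runs: List[ScheduledRun], now: datetime, fired: Set[str]) -> List[str]:
    """Return the suites that should start now, marking each as done so it runs once."""
    due = []
    for run in runs:
        if run.done:
            continue
        immediate = run.at is None and run.on_event is None
        timed = run.at is not None and run.at <= now
        evented = run.on_event is not None and run.on_event in fired
        if immediate or timed or evented:
            run.done = True
            due.append(run.suite)
    return due
```

An event trigger suits the case where an ETL job signals completion, so tests start as soon as fresh data lands instead of at a guessed time.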
Data Intelligence Reports
• Examine and automatically email test results
Run Dashboard
• View real-time execution
• Analyze real-time results
Screenshot: results of a "Large Suite" test run on Jan 5, 2019, with Row Failure Drill-Down
• View data reliability & pass rate
• Add, move, filter, and zoom in on any data widget & its underlying data
• Verify build success or failure
QuerySurge DevOps API
• Execution: run test scenario, kill test scenario
• Retrieve: test suite results, individual test results, source and target data, failed record data, test suite execution status
• Create / Modify / Delete: QueryPairs, datastore connections, test suites, staging tables, query snippets, staging queries
With the new, expanded QuerySurge DevOps API, customers can perform design and analysis operations outside QuerySurge, so QuerySurge can be adopted and integrated into any data-focused DevOps process.

Front Line Support:
• Technical resources available for POCs (7:30am to 9:00pm New York time)
• Web conferencing sessions
• QuerySurge Customer Portal (free)
• QuerySurge Partner Portal (free)
Additional Support:
• Ticket support
• Knowledge Base
• Videos / Slide decks
(1) a Trial in the Cloud of QuerySurge, including a self-learning tutorial that works with sample data, for 3 days; or
(2) a Downloaded Trial of QuerySurge, including a self-learning tutorial with sample data or your own data, for 15 days; or
(3) a Proof of Concept of QuerySurge, including a kickoff & setup meeting and weekly meetings with our team of experts, for 30 days.
For more information, go to http://www.querysurge.com/compare-trial-options
Fortune 500 firm:
Clinical Trial Data
Challenge
How can a Data Warehouse team assure data integrity over multiple builds when the per-patient data cost of Phase 3 clinical studies exceeds $26,000 and the volume of live case data exceeds 1 TB?
Strategy
Implement QuerySurge™ to dramatically increase
coverage of data that is verified for each build.
Implementation
• 1,000 SQL queries written to compare case data from
the source systems to the DWH after ETL.
• QuerySurge™ automated the scheduling, test runs, comparisons, and reporting for each build.
Metrics
• 500 mappings
• 2.5 million data items
• 1.25 billion verifications
• Complete run finished in 7 days
• 45% of data was covered
• 14 builds were deployed
• 115 defects were discovered and remediated
Benefits
• 10-fold increase in the speed of testing
• Huge increase in data coverage (from less than 0.1% to 45%)
• Production defects discovered that were missed in previous cycles
• Huge savings on clean records (115 defects × $26,000/record)
• Huge time savings (3.6 years × 10 people)
• Avoidance of lawsuits and FDA fines
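The reported figures can be cross-checked with simple arithmetic. The first line assumes the 1.25 billion verifications arise as 500 mappings × 2.5 million data items; the deck does not state this, but the numbers are consistent with it.

```latex
\begin{aligned}
\text{verifications} &= 500 \text{ mappings} \times 2.5\times10^{6} \text{ data items} = 1.25\times10^{9} \\
\text{clean-record savings} &= 115 \text{ defects} \times \$26{,}000 = \$2{,}990{,}000 \\
\text{time savings} &= 3.6 \text{ years} \times 10 \text{ people} = 36 \text{ person-years}
\end{aligned}
```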
For more on QuerySurge in the pharmaceutical industry, go to
www.querysurge.com/solutions/pharmaceutical-industry

 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300
 
Presentation on Business Intelligence (BI)
Presentation on Business Intelligence (BI)Presentation on Business Intelligence (BI)
Presentation on Business Intelligence (BI)
 

Similar to Data Warehouse Testing in the Pharmaceutical Industry

QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solution
RTTS
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
RTTS
 
Big Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityBig Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data Quality
RTTS
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
RTTS
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
RTTS
 
Deliver Trusted Data by Leveraging ETL Testing
Deliver Trusted Data by Leveraging ETL TestingDeliver Trusted Data by Leveraging ETL Testing
Deliver Trusted Data by Leveraging ETL Testing
Cognizant
 
Test Automation for Data Warehouses
Test Automation for Data Warehouses Test Automation for Data Warehouses
Test Automation for Data Warehouses
Patrick Van Renterghem
 
Improve the Health of Your Data
Improve the Health of Your DataImprove the Health of Your Data
Improve the Health of Your Data
RTTS
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
RTTS
 
QuerySurge AI webinar
QuerySurge AI webinarQuerySurge AI webinar
QuerySurge AI webinar
RTTS
 
Etl testing strategies
Etl testing strategiesEtl testing strategies
Etl testing strategies
sivam_1
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
RTTS
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
Kellyn Pot'Vin-Gorman
 
Automated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS
 
593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward
Vinny (Gurvinder) Ahuja
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
RTTS
 
Resume sailaja
Resume sailajaResume sailaja
Resume sailaja
SailajaPrasadMohanty
 
DataOps , cbuswaw April '23
DataOps , cbuswaw April '23DataOps , cbuswaw April '23
DataOps , cbuswaw April '23
Jason Packer
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark Streaming
Databricks
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing Assignment
RTTS
 

Similar to Data Warehouse Testing in the Pharmaceutical Industry (20)

QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solution
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
 
Big Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityBig Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data Quality
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
 
Deliver Trusted Data by Leveraging ETL Testing
Deliver Trusted Data by Leveraging ETL TestingDeliver Trusted Data by Leveraging ETL Testing
Deliver Trusted Data by Leveraging ETL Testing
 
Test Automation for Data Warehouses
Test Automation for Data Warehouses Test Automation for Data Warehouses
Test Automation for Data Warehouses
 
Improve the Health of Your Data
Improve the Health of Your DataImprove the Health of Your Data
Improve the Health of Your Data
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
QuerySurge AI webinar
QuerySurge AI webinarQuerySurge AI webinar
QuerySurge AI webinar
 
Etl testing strategies
Etl testing strategiesEtl testing strategies
Etl testing strategies
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 
Automated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
 
593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward593 Managing Enterprise Data Quality Using SAP Information Steward
593 Managing Enterprise Data Quality Using SAP Information Steward
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Resume sailaja
Resume sailajaResume sailaja
Resume sailaja
 
DataOps , cbuswaw April '23
DataOps , cbuswaw April '23DataOps , cbuswaw April '23
DataOps , cbuswaw April '23
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark Streaming
 
Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing Assignment
 

More from RTTS

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
RTTS
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data Testing
RTTS
 
RTTS Postman and API Testing Webinar Slides.pdf
RTTS Postman and API Testing Webinar  Slides.pdfRTTS Postman and API Testing Webinar  Slides.pdf
RTTS Postman and API Testing Webinar Slides.pdf
RTTS
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
RTTS
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing Project
RTTS
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
RTTS
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
RTTS
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and Databases
RTTS
 
Case study: Open Source Automation Framework using Selenium WebDriver
Case study: Open Source Automation Framework using Selenium WebDriverCase study: Open Source Automation Framework using Selenium WebDriver
Case study: Open Source Automation Framework using Selenium WebDriver
RTTS
 
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality ConundrumEnterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
RTTS
 
RTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS - the Software Quality Experts
RTTS - the Software Quality Experts
RTTS
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
RTTS
 

More from RTTS (13)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data Testing
 
RTTS Postman and API Testing Webinar Slides.pdf
RTTS Postman and API Testing Webinar  Slides.pdfRTTS Postman and API Testing Webinar  Slides.pdf
RTTS Postman and API Testing Webinar Slides.pdf
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing Project
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and Databases
 
Case study: Open Source Automation Framework using Selenium WebDriver
Case study: Open Source Automation Framework using Selenium WebDriverCase study: Open Source Automation Framework using Selenium WebDriver
Case study: Open Source Automation Framework using Selenium WebDriver
 
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality ConundrumEnterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
 
RTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS - the Software Quality Experts
RTTS - the Software Quality Experts
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
 

Recently uploaded

Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
ScyllaDB
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
Awais Yaseen
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
Bert Blevins
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
Larry Smarr
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
rajancomputerfbd
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
BookNet Canada
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
Stephanie Beckett
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 

Recently uploaded (20)

Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Choose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
 

Data Warehouse Testing in the Pharmaceutical Industry

  • 1. Presenters: Bill Hayduk (Founder / President) and Jeff Bocarsly, Ph.D. (Senior Architect). QuerySurge™
  • 2. The average organization loses $14.2 million annually through poor Data Quality. - Gartner • 46% of companies cite Data Quality as a barrier for adopting Business Intelligence products. - InformationWeek • The cost per patient data of Phase 3 clinical studies of new pharmaceuticals exceeds $26,000. - Journal of Clinical Research Best Practices
  • 4. (1) Data Integrity (2) Compliance
  • 5. (1) Data Integrity: high risk of defects that are not readily visible: missing data • truncation of data • data type mismatch • null translation errors • incorrect type translation • misplaced data • extra records • transformation logic errors/holes • simple/small errors • sequence generator errors • undocumented requirements • not enough records
  • 6. (2) Compliance: need to comply with FDA 21 CFR Part 11 mandates: historical test information • test version history • test execution data (who, what, and when) • test cycle information • visibility of assets • archived test results
  • 7. (1) Data Integrity (2) Compliance: periodic data reporting to the FDA • periodic data reporting to international bodies • announced FDA audits • unannounced FDA audits. Consequences: severe, both financial and business.
  • 9. Need a testing solution that can: automate the manual testing of data • compare millions of rows of data quickly • flag mismatches and inconsistencies in data sets • provide flexibility in scheduling test runs • generate informative reports that can easily be shared with the team • validate up to 100% of all data, mitigating the risk
  • 10. Need a testing solution that can: track test history • provide reporting on test version history • record every test execution by the test owner's name and date • deliver auditable reports of test cycles • store all test outcomes and test data • offer a read-only user for reviewing test assets • support archiving of results
  • 12. QuerySurge™ is the smart testing solution that automates the data validation and testing process.
  • 13. Typical data issue areas: ETL, mainframe, and Business Intelligence & Analytics. C-level executives are using BI & Analytics to make critical business decisions on the assumption that the underlying data is fine. We know it is not.
  • 14. QuerySurge™ architecture: web-based • Supported OS… • QuerySurge Controller and QuerySurge Server, with an App Server (Tomcat) and a DB Server (MySQL) • QuerySurge Agents (ships with 10 Agents) • connects to any JDBC-compliant data source • installs in the cloud, on a VM, or on a bare-metal server
  • 15. QuerySurge™: • Market leader and visionary in automated data testing • Launched in 2012; 150+ customers in 30 countries • Partner ecosystem of 50+ world-renowned technology companies, global system integrators, and regional consulting firms • Briefs leading analyst firms and appears in their reports as the recommended solution for data testing • Named one of the Top 50 companies driving Big Data innovation by Database Trends and Applications • In-depth online Knowledge Base, formal classroom training, and a free Customer Training Portal • Sets the gold standard for PoC and post-sale support
  • 16. QuerySurge supports the following data stores: • Amazon (Redshift, Elastic MapReduce, DynamoDB) • Apache Hadoop/Hive, Spark • Cassandra • Cloudera • Couchbase • Exasol • Flat files (delimited, fixed-width) • Google BigQuery • Hortonworks • IBM (Cognos, Db2, Netezza, Informix, BigInsights, Cloudant, MDM) • JSON files • Mainframe • MapR • Micro Focus Vertica • Microsoft (SQL Server DWH, HDInsight, PDW, SSAS, Excel, Access, SharePoint) • MicroStrategy • MongoDB • Oracle (Oracle DB, MySQL, Exadata, NoSQL, Hadoop) • Pivotal Greenplum • PostgreSQL • Salesforce • SAP (BusinessObjects, HANA, IQ, ASE, Altiscale Data Cloud) • Snowflake • Tableau • Teradata, Aster • Workday • XML • …and any other data store
  • 17. QuerySurge connects to any two points at one time, queries each with SQL (or HQL), and compares every data set. Pass/fail results are delivered through Data Intelligence Reports, the Data Analytics Dashboard, and automated emails. Source data: data stores (databases, data warehouses, data marts) • flat files (fixed-width, delimited, Excel, JSON) • Business Intelligence reports. Target data: big data stores (Hadoop, NoSQL) • data warehouses • XML • web services • Business Intelligence reports.
  • 18. ETL Developer: codes data movement based on mapping requirements. ETL Data Tester: tests data movement based on mapping requirements. Data moves via ETL between source data, a big data lake, the data warehouse, and data marts, with Testing Points #1 to #3 along those ETL legs. A BI user extracts data for reports from BI & Analytics, and the tester tests the BI reports (Testing Point #4).
  • 19. Automate the entire testing cycle: automate the launch, execution, comparison, and emailed results • Smart Query Wizards, no coding needed: Query Wizards create tests visually, without writing SQL • Test across different platforms: data warehouse, Hadoop, NoSQL, databases, mainframe, flat files, XML, JSON, BI reports • Data Analytics & Intelligence: Data Analytics Dashboard, Data Intelligence Reports, emailed results, back-end data access • Create custom tests: modularize functions with snippets, set thresholds, stage data, check data types • DevOps & Continuous Delivery: API integration with build, configuration, ETL, and QA management solutions
  • 20. QuerySurge™ features: Design Library • Scheduler • Run-Time Dashboard • Query Wizards • Data Intelligence Reports • Data Analytics Dashboard • QuerySurge for DevOps
  • 21. Fast and easy. No programming needed. • Perform 80% of all data tests with no SQL coding • Open up testing to novices and non-technical team members • Speed up testing for skilled coders • Provide a huge return on investment
  • 23. Design Library • Create custom Query Pairs (source and target SQL for tests that involve transformations) Scheduling • Build groups of Query Pairs • Schedule test runs: run immediately, run at a set date/time, or have an event kick off the run
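When a mapping includes a transformation, the source side of a Query Pair can apply the mapping rule itself, so that its output lines up with the target column for column. The sketch below assumes a hypothetical mapping rule (`full_name = UPPER(last) || ', ' || UPPER(first)`), hypothetical table names, and a hypothetical `QueryPair` structure; it is not QuerySurge's internal model. It also groups pairs into a suite, mirroring "build groups of Query Pairs".

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPair:
    # Hypothetical structure for illustration only.
    name: str
    source_sql: str
    target_sql: str


def run_suite(src_conn, tgt_conn, pairs):
    """Run every Query Pair in a group; return {pair name: passed}."""
    outcome = {}
    for pair in pairs:
        src_rows = Counter(src_conn.execute(pair.source_sql).fetchall())
        tgt_rows = Counter(tgt_conn.execute(pair.target_sql).fetchall())
        outcome[pair.name] = src_rows == tgt_rows  # multiset equality
    return outcome


src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE subject (id INTEGER, first TEXT, last TEXT)")
src.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
tgt.execute("CREATE TABLE dim_subject (subject_id INTEGER, full_name TEXT)")
tgt.executemany("INSERT INTO dim_subject VALUES (?, ?)",
                [(1, "LOVELACE, ADA"), (2, "TURING, ALAN")])

suite = [
    # The source SQL applies the (assumed) mapping rule so both sides are comparable.
    QueryPair("subject_full_name",
              "SELECT id, UPPER(last) || ', ' || UPPER(first) FROM subject",
              "SELECT subject_id, full_name FROM dim_subject"),
    QueryPair("subject_count",
              "SELECT COUNT(*) FROM subject",
              "SELECT COUNT(*) FROM dim_subject"),
]
result = run_suite(src, tgt, suite)
print(result)
```

Putting the transformation on the source side keeps the target query a plain read of the warehouse, so a failure points at the ETL code rather than at the test.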
  • 24. Data Intelligence Reports  Examine and automatically email test results Run Dashboard  View real-time execution  Analyze real-time results QuerySurge™ a software division of
  • 25. [Screenshot: test run of "Large Suite", start time Jan 5, 2019 16:20:44, duration 6 minutes]
  • 27. • View data reliability and pass rate • Add, move, filter, and zoom in on any data widget and its underlying data • Verify build success or failure
  • 28. QuerySurge DevOps API operations on the QuerySurge Server: • Run Test Scenario • Kill Test Scenario Execution • Test Suite Results • Individual Test Results • Source and Target Data • Failed Record Data • Test Suite Execution Status • Retrieve QueryPairs • Create / Modify / Delete: Datastore Connections, Test Suites, Staging Tables, Query Snippets, Staging Queries. With the new expanded QuerySurge DevOps API, customers can now perform design and analysis operations outside of QuerySurge, which allows QuerySurge to be adopted and integrated into any data-focused DevOps process.
  • 29. Front Line Support: • Technical Resources available for POCs (7:30am – 9:00pm New York time) • Web conferencing sessions • QuerySurge Customer Portal (free) • QuerySurge Partner Portal (free) Additional Support: • Ticket support • Knowledge Base • Videos / Slide decks
  • 30. (1) A Trial in the Cloud of QuerySurge, including a self-learning tutorial that works with sample data, for 3 days; or (2) a Downloaded Trial of QuerySurge, including a self-learning tutorial, with sample data or your own data, for 15 days; or (3) a Proof of Concept of QuerySurge, including a kickoff and setup meeting and weekly meetings with our team of experts, for 30 days. For more information, go to http://www.querysurge.com/compare-trial-options
  • 31. Fortune 500 firm: Clinical Trial Data
  • 32. Challenge: How can a Data Warehouse team assure data integrity over multiple builds when the cost of Phase 3 clinical study data exceeds $26,000 per patient and the volume of live case data exceeds 1 TB? Strategy: Implement QuerySurge™ to dramatically increase the coverage of data verified for each build. Implementation: • 1,000 SQL queries were written to compare case data from the source systems to the DWH after ETL. • QuerySurge™ automated the scheduling, test runs, comparisons, and reporting for each build.
  • 33. Metrics  500 mappings  2.5 million data items  1.25 billion verifications  Complete run finished in 7 days  45% of data was covered.  14 builds were deployed  115 defects were discovered and remediated Benefits • 10-fold increase in the speed of testing. • Huge increase in coverage of data (from less than 1/10 % to 45%) • Production defects discovered that were missed in previous cycles • Huge savings on clean records (115 defects x $26,000/record) • A huge time savings (3.6 years x 10 people) • Avoidance of lawsuits and FDA fines built by QuerySurge™
  • 34. For more on QuerySurge in the pharmaceutical industry, go to www.querysurge.com/solutions/pharmaceutical-industry

Editor's Notes

  1. Other Pharmaceutical Industry Complexities: • Industry consolidation is causing massive integration of data • FDA 21 CFR Part 11 compliance • A broad variety of data types and sources may be fed into a data warehouse: general and pharma-specific information exchange formats (e.g., HL7 feeds, CDISC feeds, other XML grammars), and multiple proprietary and internal data formats, which may have been acquired in the process of industry consolidation.
  2. QuerySurge can automate the comparison of all data from source files and databases, through the different legs of the ETL process, to the target data warehouse. QuerySurge can be scheduled to run immediately, at a set time such as next Monday at 11:00 pm, or when an event occurs, such as the end of the current ETL process. QuerySurge executes tests that compare target data to source data very quickly, comparing millions of rows of data in minutes. On completion of the run, QuerySurge produces informative summary and detailed reports that can be viewed immediately or shared with the team via the automated email scheduler. QuerySurge can validate 100% of your data, providing full coverage and mitigating risk, while its reports highlight every data difference, down to the individual character.
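"Down to the individual character" can be illustrated with a small value-level diff that marks each differing position. This is a generic sketch, not QuerySurge's reporting code; the helper names and the sample values are hypothetical. Positions past the end of the shorter value count as differences, so truncation during ETL is caught as well.

```python
def char_diff(source: str, target: str) -> list:
    """Return the 0-based positions at which two values differ.

    Positions past the end of the shorter string count as differences.
    """
    length = max(len(source), len(target))
    return [i for i in range(length)
            if i >= len(source) or i >= len(target) or source[i] != target[i]]


def caret_line(source: str, target: str) -> str:
    # Build a marker line with '^' under every differing character.
    diffs = set(char_diff(source, target))
    return "".join("^" if i in diffs else " " for i in range(max(len(source), len(target))))


src_val, tgt_val = "Phase 3 - Site 014", "Phase 3 - Site 041"
print(src_val)
print(tgt_val)
print(caret_line(src_val, tgt_val))
```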
  3. - tracks test history (user, date, each test version) - provides reporting on test version history for convenient auditing - supports tracking of deviations from approved tests - records all test execution owners by name and date - delivers auditable results reporting of test cycles - stores all test outcomes and test data for post-facto review or audit - offers a read-only user type for reviewing test assets - supports off-database archiving of results (for future restore) for effective long-term results data management
  4. QuerySurge provides insight into the health of your data throughout your organization through BI dashboards and reporting at your fingertips. It is a collaborative tool that enables distributed use across your organization and provides a shareable, holistic view of your data’s health and of the maturity of your organization’s data management.
  5. Your distributed team from around the world can use any of these web browsers: Internet Explorer, Chrome, Firefox, and Safari. QuerySurge installs on Windows and Linux. QuerySurge connects to any JDBC-compliant data source, even if it is not listed here.
  6. QuerySurge finds bad data by connecting natively to any data source (any type of database, flat file, or XML) and to any data target (a database, file, XML, data warehouse, or Hadoop implementation). QuerySurge pulls data from the source and the target, compares them very quickly (typically in a few minutes), and then produces reports that show every data difference, even if there are millions of rows and hundreds of columns in the test. These reports can be emailed to your team automatically. You can pick from a multitude of reports or export the results to build your own reports.
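A report that shows "every data difference" across many rows and columns is, at its core, a keyed comparison: match rows on a key, then compare each column. The sketch below is a generic illustration under stated assumptions (rows already fetched as dictionaries, a unique `id` key, hypothetical column names); it is not QuerySurge's report format.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Finding = Tuple[str, Any, Optional[str], Any, Any]


def diff_report(source: List[Row], target: List[Row], key: str) -> List[Finding]:
    """Compare two row sets matched on a unique `key`, column by column.

    Returns (kind, key_value, column, source_value, target_value) tuples, where kind
    is 'missing_in_target', 'extra_in_target', or 'value_mismatch'.
    """
    src_by_key = {row[key]: row for row in source}
    tgt_by_key = {row[key]: row for row in target}
    report: List[Finding] = []
    for k in sorted(src_by_key.keys() - tgt_by_key.keys()):
        report.append(("missing_in_target", k, None, None, None))
    for k in sorted(tgt_by_key.keys() - src_by_key.keys()):
        report.append(("extra_in_target", k, None, None, None))
    for k in sorted(src_by_key.keys() & tgt_by_key.keys()):
        s, t = src_by_key[k], tgt_by_key[k]
        # Check every column present on either side, so dropped columns also show up.
        for col in sorted(s.keys() | t.keys()):
            if s.get(col) != t.get(col):
                report.append(("value_mismatch", k, col, s.get(col), t.get(col)))
    return report


source_rows = [{"id": 1, "dose_mg": 50, "arm": "A"},
               {"id": 2, "dose_mg": 75, "arm": "B"},
               {"id": 3, "dose_mg": 50, "arm": "A"}]
target_rows = [{"id": 1, "dose_mg": 50, "arm": "A"},
               {"id": 2, "dose_mg": 57, "arm": "B"},
               {"id": 4, "dose_mg": 25, "arm": "C"}]

report = diff_report(source_rows, target_rows, key="id")
for finding in report:
    print(finding)
```

Reporting missing, extra, and mismatched rows separately tells the tester whether the ETL lost data, invented data, or transformed it incorrectly.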