Are you jumping on the microservices bandwagon? When and when not to adopt micro services architecture? If you must, what are the considerations? This slidedeck will help answer a few of those questions...
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
An introduction to IBM Data Lake by Mandy Chessell CBE FREng CEng FBCS, Distinguished Engineer & Master Inventor. Learn more about IBM Data Lake: https://ibm.biz/Bdswi9
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
DataOps is a methodology and culture shift that brings the successful combination of development and operations (DevOps) to data processing environments. It breaks down silos between developers, data scientists, and operators, resulting in lean data feature development processes with quick feedback. In this presentation, we will explain the methodology, and focus on practical aspects of DataOps.
This document discusses data mesh, a distributed data management approach for microservices. It outlines the challenges of implementing microservice architecture including data decoupling, sharing data across domains, and data consistency. It then introduces data mesh as a solution, describing how to build the necessary infrastructure using technologies like Kubernetes and YAML to quickly deploy data pipelines and provision data across services and applications in a distributed manner. The document provides examples of how data mesh can be used to improve legacy system integration, batch processing efficiency, multi-source data aggregation, and cross-cloud/environment integration.
Data lineage tracking is one of the significant problems that financial institutions face when using modern big data tools. This presentation describes Spline – a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans and visualizes it in a user-friendly manner.
This document discusses using Dataiku as an end-to-end enterprise AI platform to drive data science at scale using PostgreSQL, Greenplum, and Dataiku. It highlights how Dataiku allows business analysts, data engineers, and data scientists to collaborate on a single platform for data preparation, machine learning modeling, and model deployment. It also provides examples of how customers like a major software company have leveraged Dataiku to automate the deployment of over 12,000 predictive models.
This document provides an introduction and overview of Azure Data Lake. It describes Azure Data Lake as a single store of all data ranging from raw to processed that can be used for reporting, analytics and machine learning. It discusses key Azure Data Lake components like Data Lake Store, Data Lake Analytics, HDInsight and the U-SQL language. It compares Data Lakes to data warehouses and explains how Azure Data Lake Store, Analytics and U-SQL process and transform data at scale.
This document provides an overview of a speaker and their upcoming presentation on Microsoft's data platform. The speaker is a 30-year IT veteran who has worked in various roles including BI architect, developer, and consultant. Their presentation will cover collecting and managing data, transforming and analyzing data, and visualizing and making decisions from data. It will also discuss Microsoft's various product offerings for data warehousing and big data solutions.
This document provides an introduction to microservices, including: - Microservices are small, independently deployable services that work together and are modeled around business domains. - They allow for independent scaling, technology diversity, and enable resiliency through failure design. - Implementing microservices requires automation, high cohesion, loose coupling, and stable APIs. Identifying service boundaries and designing for orchestration and data management are also important aspects of microservices design. - Microservices are not an end goal but a means to solve problems of scale; they must be adopted judiciously based on an organization's needs.
Organizations are faced with an increasingly complex data landscape, finding themselves unable to cope with exponentially increasing data volumes, compounded by additional regulatory requirements with increased fines for non-compliance. Enterprise architecture and data governance are often discussed at length, but often with different stakeholder audiences. This can result in complementary and sometimes conflicting initiatives rather than a focused, integrated approach. Data governance requires a solid data architecture foundation in order to support the pillars of enterprise architecture. In this session, IDERA’s Ron Huizenga will discuss a practical, integrated approach to effectively understand, define and implement an cohesive enterprise architecture and data governance discipline with integrated modeling and metadata management.
The document discusses the role of data engineers and data pipelines. It begins with an introduction to big data and why data volumes are increasing. It then covers what data engineers do, including building data architectures, working with cloud infrastructure, and programming for data ingestion, transformation, and loading. The document also explains data pipelines, describing extract, transform, load (ETL) processes and batch versus streaming data. It provides an example of Credit OK's data pipeline architecture on Google Cloud Platform that extracts raw data from various sources, cleanses and loads it into BigQuery, then distributes processed data to various applications. It emphasizes the importance of data engineers in processing and managing large, complex data sets.
Presentation by Databricks CEO Ion Stoica at Spark Summit 2014, giving an update on Spark adoption and announcing the Databricks Cloud service.
This document provides an overview of Azure SQL Data Warehouse. It discusses what Azure SQL Data Warehouse is, how it is provisioned and scaled, best practices for designing tables in Azure SQL DW including distribution keys and data types, and methods for loading and querying data including PolyBase and labeling queries for monitoring. The presentation also covers tuning aspects like statistics, indexing, and resource classes.
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta and share our experiences from different angles like architecture, application logic and user experience. We will look how security, cluster configuration, resource consumption and workflow changed by using Databricks clusters as well as how using Delta tables simplified our application logic and data operations.
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
This document provides an overview of microservice architecture, including its key characteristics, benefits, problems, and solutions. Microservices are small, independent services that are organized around business capabilities. They communicate through APIs and can be developed and deployed independently. Benefits include scalability, flexibility, and ease of development and testing. Challenges include configuring and monitoring distributed services. Common solutions involve service discovery, load balancing, centralized logging/monitoring, and externalizing configuration. The document also discusses architectural patterns, anti-patterns, and references further resources on microservices.
Azure Service Fabric is now Generally Available! In this meetup we will start from the beginning and define what is microservice. Next we will have a deep dive in Azure Service Fabric. Azure Service Fabric is one of the most interesting Azure service. Used internally in Microsoft for 5 years and backing up one of the most demanding Azure services today such as Azure SQL, Document DB, Cortana and Skype for Business. We will be talking about the two models that are supported by Azure Service Fabric: - Reliable Services (We will explore the reasons for having both stateful and stateless offerings in this model) - Reliable Actors Then we will talk how you can create Azure Service Fabric cluster on premise or in another cloud. We will demo deployments in Azure for the various models. Azure Service Fabric is the most advanced and complete offering for developing and hosting microservices in Azure. It builds on years experience Microsoft acquired running one of the most demanding services such as Azure SQL. Moreover, Azure Service Fabric solves very difficult distributed computing problems such as data synchronization, zero downtime deployment, update and rollback operations at large scale. Join us to learn more about Azure Service Fabric and start using it immediately after the meetup!
For enterprises trying to stay ahead of the game, having a robust and fast application development program can make or break their market presence. The challenge for developers, however, is to build responsive, devise-agnostic applications in days, not months.
The introduction covers the following 1. What are Microservices and why should be use this paradigm? 2. 12 factor apps and how Microservices make it easier to create them 3. Characteristics of Microservices Note: Please download the slides to view animations.