Microservices architecture is a very powerful way to build scalable systems optimized for speed of change. To do this, we need to build independent, autonomous services that, by definition, tend to minimize dependencies on other systems. One of the tenets of microservices, and a way to minimize dependencies, is “a service should own its own database”. Unfortunately, this is a lot easier said than done. Why? Because of your data.
We’ve been dealing with data in information systems for five decades, so isn’t this a solved problem? Yes and no. Many of the lessons learned are still very relevant. Traditionally, we application developers have accepted relational databases and relied on all of their safety guarantees without question. But as we build service architectures that span more than one database (by design, as with microservices), things get harder. If data about a customer changes in one database, how do we reconcile that change with other databases, especially when the data stores may be heterogeneous?
For developers focused on the traditional enterprise, not only do we have to build fast-changing systems that are surrounded by legacy systems, but the domains (finance, insurance, retail, etc.) are also incredibly complicated. Simply copying what Netflix does for microservices may or may not be useful. So how do we develop and reason about the boundaries in our system to reduce complexity in the domain?
In this talk, we’ll explore these problems and see how Domain-Driven Design helps us grapple with domain complexity. We’ll see how DDD concepts like Entities and Aggregates help us reason about boundaries based on use cases, and how those boundaries affect transactions. Once we can identify our transactional boundaries, we can more carefully decide which CAP theorem trade-offs to make in order to scale out and achieve truly autonomous systems with strictly ordered eventual consistency. We’ll see how technologies like Apache Kafka, Apache Camel and Debezium.io can help build the backbone for these types of systems. We’ll even explore the details of a working example that brings all of this together.
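Here is a minimal Python sketch of an aggregate acting as a transactional boundary, built on stated assumptions. It is not the talk's working example. A hypothetical `Account` aggregate enforces its own invariant (no overdraft) without calling any other service, and it records numbered events. A hypothetical `ReadModel` in another service applies those events strictly in per-aggregate order. It ignores redeliveries and rejects gaps, so it converges to the same state eventually. In a real system the events might flow through a Kafka topic keyed by aggregate ID, because Kafka only guarantees ordering within a partition. They might also be captured from the database with Debezium. Here a plain list stands in for the topic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    aggregate_id: str
    sequence: int  # per-aggregate version number, used to enforce ordering downstream
    kind: str      # "deposited" or "withdrawn"
    amount: int


@dataclass
class Account:
    """Hypothetical aggregate root: its invariant is checked entirely inside this boundary."""
    account_id: str
    balance: int = 0
    version: int = 0
    pending_events: List[Event] = field(default_factory=list)

    def _record(self, kind: str, amount: int) -> None:
        self.version += 1
        self.pending_events.append(Event(self.account_id, self.version, kind, amount))

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        self._record("deposited", amount)

    def withdraw(self, amount: int) -> None:
        # No call to another service is needed to enforce "no overdraft".
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid withdrawal")
        self.balance -= amount
        self._record("withdrawn", amount)


class ReadModel:
    """Hypothetical downstream view: eventually consistent, applied in strict per-aggregate order."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.last_seen: Dict[str, int] = {}

    def apply(self, event: Event) -> None:
        last = self.last_seen.get(event.aggregate_id, 0)
        if event.sequence <= last:
            return  # duplicate delivery: already applied, so ignore it (idempotent)
        if event.sequence != last + 1:
            raise RuntimeError(f"gap: expected {last + 1}, got {event.sequence}")
        delta = event.amount if event.kind == "deposited" else -event.amount
        self.balances[event.aggregate_id] = self.balances.get(event.aggregate_id, 0) + delta
        self.last_seen[event.aggregate_id] = event.sequence
```

Per-aggregate sequence numbers give ordering only where it matters, inside one aggregate. Different aggregates can then be processed in parallel.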
This document provides an overview and summary of the author's background and expertise. It states that the author has over 30 years of experience in IT working on many BI and data warehouse projects. It also lists that the author has experience as a developer, DBA, architect, and consultant. It provides certifications held and publications authored as well as noting previous recognition as an SQL Server MVP.
In this presentation, we are going to discuss how Elasticsearch handles operations such as insert, update and delete. We will also cover what an inverted index is and how segment merging works.
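Elasticsearch builds on Lucene, which writes immutable segments. Deletes and updates only mark old documents as deleted, and that space is reclaimed when segments are merged. The toy below is a minimal Python sketch of that idea, not Elasticsearch's code. `ToyIndex` and its method names are illustrative. `refresh` loosely echoes the Elasticsearch refresh that makes recently indexed documents searchable.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Segment:
    """An immutable mini inverted index (term -> sorted doc ids); only its deletion marks change."""

    def __init__(self, docs: Dict[int, str]) -> None:
        self.docs = dict(docs)
        self.deleted: Set[int] = set()
        postings: Dict[str, Set[int]] = defaultdict(set)
        for doc_id, text in self.docs.items():
            for term in text.lower().split():  # naive whitespace "analyzer"
                postings[term].add(doc_id)
        self.postings = {term: sorted(ids) for term, ids in postings.items()}


class ToyIndex:
    def __init__(self) -> None:
        self.segments: List[Segment] = []
        self.buffer: Dict[int, str] = {}  # indexed but not yet searchable

    def index(self, doc_id: int, text: str) -> None:
        # An update is modeled as: mark the old version deleted, then add the new version.
        self.delete(doc_id)
        self.buffer[doc_id] = text

    def delete(self, doc_id: int) -> None:
        self.buffer.pop(doc_id, None)
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)  # a tombstone; the data is only dropped on merge

    def refresh(self) -> None:
        # Flush buffered documents into a new segment, making them searchable.
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def search(self, term: str) -> List[int]:
        hits: Set[int] = set()
        for seg in self.segments:
            hits.update(d for d in seg.postings.get(term.lower(), []) if d not in seg.deleted)
        return sorted(hits)

    def merge(self) -> None:
        # Rewrite all live documents into one segment, physically discarding deleted ones.
        live = {d: t for seg in self.segments for d, t in seg.docs.items() if d not in seg.deleted}
        self.segments = [Segment(live)] if live else []
```

Merging trades a burst of rewrite work for fewer segments to search and for reclaiming the space held by deleted documents.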
In this session, we introduce AWS Glue, provide an overview of its components, and share how you can use AWS Glue to automate discovering your data, cataloging it, and preparing it for analysis.
In this technical overview of Azure Cosmos DB, you will learn how easy it is to get started building planet-scale applications with Azure Cosmos DB. We’ll then take a closer look at important design aspects around global distribution, consistency, and server-side partitioning, and at how to model your data to fit your app’s needs using tools and APIs you love.
Cloud storage is one of the primary services offered by almost all the leading cloud service providers. This presentation looks into the cloud storage options in Azure, AWS and Google Cloud Platform.
Colombo Cloud User Meetup
A brief overview of caching mechanisms in a web application, looking at the different layers of caching and how to use them in a PHP code base. We also compare Redis and Memcached, discussing their advantages and disadvantages.
Fuzzy Matching on Apache Spark with Jennifer Shin (Databricks)
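The application-level layer usually follows the cache-aside pattern. Check the cache first, and on a miss load from the source of truth and store the result with an expiry. The sketch below is Python rather than PHP, and a plain dict stands in for a Redis or Memcached client. Both of those systems support per-key expiry, for example Redis `SET ... EX` and the Memcached expiration time. The class name and injectable clock are assumptions made for testability.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside with per-entry expiry; a stand-in for a Redis or Memcached client."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: skip the expensive loader
        value = loader(key)  # miss or expired: go to the source of truth
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call on writes so readers do not see stale data until the TTL runs out.
        self._store.pop(key, None)
```

The TTL bounds how stale a value can get. Explicit invalidation on writes narrows that window further.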
This document provides an overview of fuzzy matching techniques for surveys. It begins with an introduction to fuzzy matching and edit distances. A use case of applying fuzzy matching to label thousands of survey questions is described. Different approaches for fuzzy matching labels are explored, including a word-based comparison model and cell-based comparison model using Levenshtein distance. Implementation considerations for fuzzy matching like data suitability, validation methodology, and computing resources are also discussed. Code in Python for calculating Levenshtein distance is provided.
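The presentation's own Python code is not reproduced here. What follows is a standard two-row dynamic-programming Levenshtein distance. It is paired with a hypothetical normalized `similarity` and `best_match` helper that show how the distance could drive label matching. The 0.8 threshold is an arbitrary assumption. On Spark, note that Spark SQL also ships a built-in `levenshtein` function.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insertion
                previous[j] + 1,               # deletion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(label: str, candidates: Iterable[str], threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return the most similar candidate, or None if nothing clears the (hypothetical) threshold."""
    scored = [(c, similarity(label.lower(), c.lower())) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```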
This document discusses the evolution of data warehousing and the modern data platform. It outlines some common problems with traditional data warehousing approaches like long setup times, poor performance and scalability issues. The modern data platform combines cloud-based data warehousing, data modeling principles, and data warehouse automation tools to provide highly scalable and agile solutions. Key components demonstrated are the Snowflake data platform for scalable data storage and processing, Fivetran for automated data integration, and capabilities like cloning data for testing and time travel to access historical data.
Structured, Unstructured and Streaming Big Data on the AWS (Amazon Web Services)
This document summarizes a presentation on structured, unstructured, and streaming big data on the Amazon Web Services platform. The agenda includes an introduction, overview of structured/unstructured/streaming data on AWS, and building an Amazon Redshift data warehouse. The presentation discusses ingesting, storing, processing, and analyzing various types of data on AWS services like Amazon Kinesis, S3, Redshift, EMR, and Machine Learning. It provides comparisons of databases and use cases like real-time analytics. A demo of real-time Twitter analytics using Kinesis, Lambda, and open source software is also noted.
The document discusses various disaster recovery scenarios for a BI solution involving Azure Synapse, Data Lake, and Data Share. Scenario 2 involves provisioning these services in a paired secondary region, then synchronizing the Data Lake, restoring the SQL Pool, and activating Synapse pipelines and Data Share triggers to enable a standby environment. A step-by-step guide is provided for implementing scenario 2, with phases for provisioning, synchronization, restore, activation of pipelines and triggers, and notification of consumers. References are also included.
Microsoft Data Platform - What's included (James Serra)
This document provides an overview of a speaker and their upcoming presentation on Microsoft's data platform. The speaker is a 30-year IT veteran who has worked in various roles including BI architect, developer, and consultant. Their presentation will cover collecting and managing data, transforming and analyzing data, and visualizing and making decisions from data. It will also discuss Microsoft's various product offerings for data warehousing and big data solutions.
Relational databases vs Non-relational databases (James Serra)
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
What's New in Amazon RDS for Open-Source & Commercial Databases (Amazon Web Services)
This document provides an overview of Amazon Relational Database Service (RDS). RDS offers fully managed relational databases in the cloud to reduce effort, risk, and cost compared to hosting databases on-premises or on unmanaged instances. It supports multiple database engines: the open-source MySQL, PostgreSQL, and MariaDB, the commercial Oracle and SQL Server, and the proprietary Amazon Aurora. Key features of RDS include high availability, automatic backups, easy scaling, encryption, integration with other AWS services, and compliance with industry standards. Recent updates include enhanced OS monitoring, a Performance Insights preview, and new database engines and features.
This document provides information about Amazon QuickSight, a fully managed cloud business intelligence system. It discusses how QuickSight allows users to connect to data sources, create interactive dashboards, and publish them for sharing. QuickSight is serverless, scalable from 10 to 10,000 users, and uses a pay-per-session pricing model where users only pay when accessing dashboards.
This document provides an introduction and overview of Azure Data Lake. It describes Azure Data Lake as a single store of all data ranging from raw to processed that can be used for reporting, analytics and machine learning. It discusses key Azure Data Lake components like Data Lake Store, Data Lake Analytics, HDInsight and the U-SQL language. It compares Data Lakes to data warehouses and explains how Azure Data Lake Store, Analytics and U-SQL process and transform data at scale.
Apache Cassandra is a free, distributed, open source, and highly scalable NoSQL database that is designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability, and tunable consistency. Cassandra's architecture allows it to spread data across a cluster of servers and replicate across multiple data centers for fault tolerance. It is used by many large companies for applications that require high performance, scalability, and availability.
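Tunable consistency is usually explained with the overlap rule: a read is guaranteed to reach at least one replica holding the latest write when R + W > RF. Here R and W are the replicas contacted for reads and writes, and RF is the replication factor. The minimal sketch below applies that rule to a few single-data-center consistency levels. It deliberately ignores LOCAL_QUORUM/EACH_QUORUM, node failures, and timestamp-based conflict resolution. The function names are hypothetical.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must acknowledge at a given (single-data-center) consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1  # a strict majority of replicas
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported level: {level}")
    if needed > replication_factor:
        raise ValueError(f"{level} needs {needed} replicas but RF is {replication_factor}")
    return needed


def read_sees_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor
```

With RF = 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) overlap. ONE/ONE (1 + 1) trades that guarantee for lower latency and higher availability.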
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Best Practices for Running Microsoft SQL Server on AWS (Gianluca Hotz)
The document discusses best practices for running Microsoft SQL Server on AWS. It provides an overview of options for deploying SQL Server on AWS, including using Amazon RDS or Amazon EC2. When using RDS, AWS manages the SQL Server instance and provides features like automated backups and read replicas. When using EC2, the user has more control but must manage SQL Server, backups, and high availability. The document discusses considerations and techniques for optimizing SQL Server performance on EC2, including storage options and configuration.
This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types such as strings, hashes, lists, sets, sorted sets, and streams. Example use cases for Redis are also provided, such as leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns such as embedding data, embracing duplication, and relationships. Finally, it provides a high-level overview of DynamoDB concepts such as tables, items, attributes, and primary keys.
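The leaderboard use case maps naturally onto a Redis sorted set. Each score update is a `ZINCRBY`, and the top N is read with `ZREVRANGE ... WITHSCORES`. To stay runnable without a Redis server, the sketch below is an in-memory Python stand-in that mimics those semantics. Higher scores come first, and ties are listed in descending member order, as `ZREVRANGE` does. It re-sorts on every read, unlike Redis's logarithmic-time structure, and the class is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Leaderboard:
    """In-memory stand-in for a Redis sorted set used as a leaderboard."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def incr(self, member: str, amount: float) -> float:
        """Like ZINCRBY: add to a member's score, creating it at 0 if absent."""
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def _ranked(self) -> List[Tuple[str, float]]:
        # Highest score first; ties in descending member order, as ZREVRANGE does.
        return sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)

    def top(self, n: int) -> List[Tuple[str, float]]:
        """Like ZREVRANGE key 0 n-1 WITHSCORES."""
        return self._ranked()[:n]

    def rank(self, member: str) -> Optional[int]:
        """0-based position from the top, like ZREVRANK; None if the member is absent."""
        for position, (name, _) in enumerate(self._ranked()):
            if name == member:
                return position
        return None
```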
Building Cloud-Native App Series - Part 3 of 11
Microservices Architecture Series
AWS Kinesis Data Streams
AWS Kinesis Firehose
AWS Kinesis Data Analytics
Apache Flink - Analytics
Amazon SimpleDB is a hosted database service that allows developers to store and query structured data via web services requests without needing to worry about data modeling, index maintenance, or performance tuning. It is a flexible, scalable, and inexpensive key-value store that automatically indexes data and enables real-time lookups and simple queries without complexity. Netflix migrated parts of its database from Oracle to Amazon SimpleDB to take advantage of its high availability, flexibility, and cost effectiveness compared to running its own data centers.
Independent of the source of data, integrating event streams into an Enterprise Architecture is becoming increasingly important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to react fast, with minimal latency, you cannot afford to store the data first and do the analysis/analytics later. You have to be able to apply part of your analytics as soon as you consume the data streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of Stream Processing, discuss the core properties a Stream Processing platform should provide, and highlight the differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
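The contrast between "store first, analyse later" and analysing as events are consumed can be illustrated with a tumbling-window count. Each window's result is emitted as soon as the window closes, instead of after a batch job over stored data. This is a minimal Python generator sketch that assumes integer timestamps arriving in order. Engines such as Flink or Kafka Streams also handle out-of-order and late events (Flink with watermarks), which this sketch does not. The final flush only happens because the test input ends, whereas a real stream is unbounded.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping windows, emitting each window as it closes."""
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        start = timestamp - timestamp % window_seconds  # align to the window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The first event of a new window closes the previous one: emit immediately.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush on end of input (a bounded test stream)
```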
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ... (Amazon Web Services)
Building powerful web applications in the AWS Cloud : A Love Story, Design patterns in web-based cloud architecture, Jinesh Varia gave this talk at Cloud Connect and several other places
http://aws.typepad.com/aws/2011/03/building-powerful-web-applications-in-the-aws-cloud-a-love-story.html
A presentation on why (or why not) to adopt microservices, why a platform is important, how to break down a monolith, and some of the challenges you'll face (data, transactions, boundaries, etc.). The last section introduces Istio and the service mesh. Follow on Twitter @christianposta for updates and more details.
Microservices with Apache Camel, Docker and Fabric8 v2 (Christian Posta)
My talk from Red Hat Summit 2015 about the pros/cons of microservices, how integration is a strong requirement for doing distributed systems designs, and how open source projects like Apache Camel, Docker, Kubernetes, OpenShift and Fabric8 can help simplify and manage microservice environments
Exploring Twitter's Finagle technology stack for microservices (💡 Tomasz Kogut)
This document summarizes a presentation about Finagle, Twitter's microservices technology stack. It discusses how Finagle addresses challenges with microservices like service discovery, load balancing, and request tracing across services. It presents Finagle's core abstractions like Futures, Services, and Filters. Services represent both clients and servers, and Filters can add functionality like retries and timeouts. The document also mentions Twitter Server, a framework for building Finagle-based servers that handles flags, logging, metrics and admin interfaces. Finally, it briefly introduces Finatra, which builds on Finagle and Twitter Server and adds features like dependency injection and routing.
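Finagle itself is Scala. There, a Service is a function from a request to a `Future` of a response, and a Filter wraps a Service, composed with `andThen`. The Python `asyncio` sketch below imitates that shape so the composition idea is visible. It is not Finagle's API, and all names are hypothetical. The retry filter is simplified: it retries on every exception, whereas Finagle classifies which failures are safe to retry.

```python
import asyncio
from typing import Any, Awaitable, Callable

# A Service turns a request into an eventual response (Finagle: Req => Future[Rep]).
Service = Callable[[Any], Awaitable[Any]]
# A Filter sees the request and the next service, and may add behavior around the call.
Filter = Callable[[Any, Service], Awaitable[Any]]


def and_then(filter_: Filter, service: Service) -> Service:
    """Wrap a service in a filter, yielding a new service (mirrors the idea of Filter.andThen)."""
    async def composed(request: Any) -> Any:
        return await filter_(request, service)
    return composed


def timeout_filter(seconds: float) -> Filter:
    async def apply(request: Any, service: Service) -> Any:
        return await asyncio.wait_for(service(request), timeout=seconds)
    return apply


def retry_filter(attempts: int) -> Filter:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    async def apply(request: Any, service: Service) -> Any:
        last_error: Exception = RuntimeError("unreachable")
        for _ in range(attempts):
            try:
                return await service(request)
            except Exception as exc:  # simplification: retry every failure
                last_error = exc
        raise last_error
    return apply
```

Because filters and services compose into services, the same client can be layered as retry around timeout around the transport. Each cross-cutting concern stays in one small piece.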
Advanced web application architecture Way2Web (Matthias Noback)
How to:
- Design a clean domain model
- Model your application's use cases as application services
- Connect those well-designed layers to the world outside
Protecting your high quality domain model can be accomplished by applying a so-called ports & adapters or hexagonal architecture.
Some of the keywords for this talk: aggregate design, domain events, application services, commands, queries and events, layered architecture, ports & adapters, hexagonal architecture.
We consider a microservices architecture to achieve an end goal, not because it's "the cool thing to do". Every organization looking to adopt this architecture must recognize (and adhere to) a set of foundational principles. Guided by those principles, we can choose the right technology to support a microservices architecture and meet our end goals. This talk explains those core principles and gives you the tools you need for your microservices journey.
This document discusses web application architecture and frameworks. It argues that frameworks should not dictate project structure, and that the code should separate domain logic from infrastructure logic. This allows focusing on the core problem domain without concerning itself with technical details like databases or web requests. It also advocates splitting code into ports that define intentions like persistence, and adapters that provide framework-specific implementations, allowing for independence of the domain logic from any particular framework or technology. This architecture, known as hexagonal or ports and adapters, facilitates testing, replacement of parts, and future-proofing of the application.
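The port/adapter split described above can be sketched in a few lines of Python. Everything here is hypothetical: `Order`, `OrderRepository`, `InMemoryOrderRepository`, and `PayOrder` are illustrative names, not the talk's code. The domain model and the application service depend only on the abstract port, so the in-memory adapter could be replaced by a database-backed one without touching them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Order:
    """Domain model: knows its own rules, nothing about storage or HTTP."""
    order_id: str
    total_cents: int
    paid: bool = False

    def mark_paid(self) -> None:
        if self.paid:
            raise ValueError("order already paid")
        self.paid = True


class OrderRepository(ABC):
    """Port: states the intention (persist orders) without naming a technology."""

    @abstractmethod
    def get(self, order_id: str) -> Optional[Order]: ...

    @abstractmethod
    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository(OrderRepository):
    """Adapter: one concrete implementation of the port, handy for tests."""

    def __init__(self) -> None:
        self._rows: Dict[str, Order] = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order


class PayOrder:
    """Application service: one use case, depending only on the port."""

    def __init__(self, orders: OrderRepository) -> None:
        self.orders = orders

    def __call__(self, order_id: str) -> None:
        order = self.orders.get(order_id)
        if order is None:
            raise KeyError(order_id)
        order.mark_paid()
        self.orders.save(order)
```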
The Hardest Part of Microservices: Calling Your Services (Christian Posta)
When building microservices, you must solve for a number of critical functions, but the process can be incredibly complex and expensive to maintain. Christian Posta offers an overview of Envoy Proxy and Istio.io Service Mesh, explaining how they solve application networking problems more elegantly by pushing these concerns down to the infrastructure layer and demonstrating how it all works.
Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe (Flip Kromer)
This talk centers on two things: a set of patterns for the architecture of high-scale data systems; and a framework for understanding the tradeoffs we make in designing them.
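As a rough illustration of the core Lambda Architecture pattern, here is a minimal Python sketch under simplifying assumptions, not material from the talk. A batch layer periodically recomputes a view from the immutable master dataset. A speed layer incrementally covers only the events that arrived since the last batch run. The serving layer answers queries by merging the two. Simple per-key counts stand in for real views, and all names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def compute_batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the whole view from the immutable master dataset (slow but simple)."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count only the events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.recent[key] += 1

    def reset(self) -> None:
        # Once a batch run has absorbed these events, the real-time view is discarded.
        self.recent.clear()


def query(key: str, batch_view: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch view with the speed layer's recent increments."""
    return batch_view[key] + speed.recent[key]
```

One trade-off this makes visible is that the same counting logic lives in two places, the batch path and the incremental path, and both must agree.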
Service-mesh technology promises to deliver a lot of value to a cloud-native application, but it doesn't come without some hype. In this talk, we'll look at what is a "service mesh", how it compares to similar technology (Netflix OSS, API Management, ESBs, etc) and what options for service mesh exist today.
DDD, CQRS and testing with ASP.Net MVCAndy Butland
This document provides an overview of a presentation on Domain Driven Design (DDD), Command Query Responsibility Segregation (CQRS), and testing with ASP.Net MVC. It introduces the presenter and gives an outline of the topics to be covered, including implementing DDD with ASP.Net MVC and Entity Framework, using a mediator pattern for CQRS, and unit testing models, queries, and commands. References are given to other authors and resources that influenced the approaches and implementations discussed in the presentation.
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...Rustem Feyzkhanov
Cloud native orchestrators like AWS Step Functions and Amazon SageMaker Pipelines can be used to construct scalable end-to-end deep learning pipelines in the cloud. These orchestrators provide centralized monitoring, logging, and scaling capabilities. AWS Step Functions is useful for integrating pipelines with production infrastructure, while SageMaker Pipelines is good for research workflows that require validation. Serverless architectures using services like AWS Lambda, Batch, and Fargate can build scalable and flexible pipelines at a low cost.
Microservices for java architects it-symposium-2015-09-15Derek Ashmore
This document provides an overview of microservices for Java architects by Derek Ashmore. It begins by introducing Ashmore and his background. The document then discusses what microservices are, how they differ from traditional monolithic architectures, and considerations for designing microservices like service boundaries, handling failures, ensuring data integrity and performance. It also covers packaging and deployment options for microservices like Spring Boot and Docker. Finally, it addresses some common misconceptions about microservices and provides additional resources for further reading.
DISQUS is a comment system that handles high volumes of traffic, with up to 17,000 requests per second and 250 million monthly visitors. They face challenges in unpredictable spikes in traffic and ensuring high availability. Their architecture includes over 100 servers split between web servers, databases, caching, and load balancing. They employ techniques like vertical and horizontal data partitioning, atomic updates, delayed signals, consistent caching, and feature flags to scale their large Django application.
Service-mesh technology promises to deliver a lot of value to a cloud-native application, but it doesn't come without some hype. In this talk, we'll look at what is a "service mesh", how it compares to similar technology (Netflix OSS, API Management, ESBs, etc) and what options for service mesh exist today.
One of the goals of Grails 3 is to reach out of the servlet container. Grails 3 has a concept of application profiles for choosing a certain set of core plugins to use. In this talk Lari will present how Ratpack fits in Grails 3. He will also talk about how Grails 3 supports micro service architectures.
One of the goals of Grails 3 is to reach out of the servlet container. Grails 3 has a concept of application profiles for choosing a certain set of core plugins to use. In this talk Lari will present how Ratpack fits in Grails 3. He will also talk about how Grails 3 supports micro service architectures.
Apache Beam is a unified programming model for batch and streaming data processing. It defines concepts for describing what computations to perform (the transformations), where the data is located in time (windowing), when to emit results (triggering), and how to accumulate results over time (accumulation mode). Beam aims to provide portable pipelines across multiple execution engines, including Apache Flink, Apache Spark, and Google Cloud Dataflow. The talk will cover the key concepts of the Beam model and how it provides unified, efficient, and portable data processing pipelines.
Move Auth, Policy, and Resilience to the PlatformChristian Posta
Developer's time is the most crucial resource in an enterprise IT organization. Too much time is spent on undifferentiated heavy lifting and in the world of APIs and microservices much of that is spent on non-functional, cross-cutting networking requirements like security, observability, and resilience.
As organizations reconcile their DevOps practices into Platform Engineering, tools like Istio help alleviate developer pain. In this talk we dig into what that pain looks like, how much it costs, and how Istio has solved these concerns by examining three real-life use cases. As this space continues to emerge, and innovation has not slowed, we will also discuss the recently announced Istio sidecar-less mode which significantly reduces the hurdles to adopt Istio within Kubernetes or outside Kubernetes.
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
Service mesh is a powerful pattern for implementing strong zero-trust networking practices, introducing better network observability, and allowing for more fine-grained traffic control. Up until now, the sidecar pattern was used to implement service-mesh capability but as the technology matures, a new pattern has emerged: sidecarless service mesh. Two prominent open-source networking projects, Cilium and Istio, have implemented a sidecar-free approach to service mesh but they both make interesting design decisions and tradeoffs. In this talk we review the architecture of both, focusing on the pros and cons of implementations such as mutual authentication, ingress, and observability.
Understanding Wireguard, TLS and Workload IdentityChristian Posta
Zero Trust Networking has become a standard marketing buzzword but the underlying principles are critical for modern microservice-style architectures. Authentication, authorizations, policy, etc. can be difficult to implement between services and do so in a maintainable way. Google invented their own transparent encryption and authorization protocol called "ALTS" back in 2007 to serve the application layer of Google's Borg workload scheduler, but we don't see others using it outside Google.
In this webinar we look at existing technology like TLS and newcomer Wireguard and see how these technologies come together to provide a secure foundation for workload identity and modern service-to-service networking.
Istio ambient mesh uses a sidecar-less data plane that focuses on ease of operations, incremental adoption, and separation of security boundaries for applications and mesh infrastructure.
In this webinar, we'll explore:
- The forces of modernization and compliance pressures,
- How Zero Trust Architecture (ZTA) can help, and
- How Istio ambient mesh lowers the barrier for establishing the properties necessary to achieve Zero Trust and compliance
The document discusses Cilium and Istio with Gloo Mesh. It provides an overview of Gloo Mesh, an enterprise service mesh for multi-cluster, cross-cluster and hybrid environments based on upstream Istio. Gloo Mesh focuses on ease of use, powerful best practices built in, security, and extensibility. It allows for consistent API for multi-cluster north-south and east-west policy, team tenancy with service mesh as a service, and driving everything through GitOps.
This document discusses service mesh patterns for connecting microservices across multiple clusters. It describes using Envoy proxy to provide service discovery, load balancing, security and resiliency. Patterns are presented for connecting services across clusters with flat, controlled or separate networks. Managing connectivity across clusters can increase operator burden. Gloo Mesh is presented as a way to simplify management across multiple clusters with a centralized control plane.
Multicluster Kubernetes and Service Mesh PatternsChristian Posta
Building applications for cloud-native infrastructure that are resilient, scalable, secure, and meet compliance and IT objectives gets complicated. Another wrinkle for the organizations with which we work is the fact they need to run across a hybrid deployment footprint, not just Kubernetes. At Solo.io, we build application networking technology on Envoy Proxy that helps solve difficult multi-deployment, multi-cluster, and even multi-mesh problems.
In this webinar, we’re going to explore different options and patterns for building secure, scalable, resilient applications using technology like Kubernetes and Service Mesh without leaving behind existing IT investments. We’ll see why and when to use multi-cluster topologies, how to build for high availability and team autonomy, and solve for things like service discovery, identity federation, traffic routing, and access control.
Cloud-Native Application Debugging with Envoy and Service MeshChristian Posta
Microservices have been great for accelerating the software innovation and delivery, but they also present new challenges, especially as abstractions and automated orchestration at every layer make pinpointing the issue seem like walking around a maze with a blindfold. Existing tools weren’t designed for distributed environments, and the new tools need to consider how to leverage these abstraction layers to better observe, test, and troubleshoot issues.
Christian Posta walks you through Envoy Proxy and service mesh architecture for L7 data plane, the key features in Envoy that can help in debugging and troubleshooting, chaos engineering as a testing methodology for microservices, how to approach a testing and debugging framework for microservices, and new open source tools that address these areas. You’ll explore a workflow to discover and resolve microservices issues, including injecting experiments for stress testing the applications, gathering requests in flight, recording and replaying them, and debugging them step by step without affecting production traffic.
Kubernetes Ingress to Service Mesh (and beyond!)Christian Posta
Kubernetes users need to allow traffic to flow into and within the cluster. Treating the application traffic separately from the business logic allows presents new possibilities in how service to service traffic is served, controlled and observed — and provides a transition to intra cluster networking like Service Mesh. With microservices, there is a concept of both North / South traffic (incoming requests from end users to the cluster) and East / West (intra cluster) communication between the services. In this talk we will explain how Envoy Proxy works in Kubernetes as a proxy for both of these traffic directions and how it can be leveraged to do things like traffic shaping, security, and integrate the north/south to east/west behavior.
Christian Posta (@christianposta) is Global Field CTO at Solo.io, former Chief Architect at Red Hat, and well known in the community for being an author (Istio in Action, Manning, Istio Service Mesh, O'Reilly 2018, Microservices for Java Developers, O’Reilly 2016), frequent blogger, speaker, open-source enthusiast and committer on various open-source projects including Istio, Kubernetes, and many others. Christian has spent time at both enterprises as well as web-scale companies and now helps companies create and deploy large-scale, cloud-native resilient, distributed architectures. He enjoys mentoring, training and leading teams to be successful with distributed systems concepts, microservices, devops, and cloud-native application design.
The exploration of service mesh for any organization comes with some serious questions. What data plane should I use? How does this tie in with my existing API infrastructure? What kind of overhead do sidecar proxies demand? As I've seen in my work with various organizations over the years "if you have a successful microservices deployment, then you have a service mesh whether it’s explicitly optimized as one or not."
In this talk, we seek to understand the role of the data plane and how to pick the right component for the problem context. We start off by establishing the spectrum of data-plane components from shared gateways to in-code libraries with service proxies being along that spectrum. We clearly identify which scenarios would benefit from which part of the data-plane spectrum and show how modern service meshes including Istio, Linkerd, and Consul enable these optimizations.
Deep Dive: Building external auth plugins for Gloo EnterpriseChristian Posta
Using the plugin framework for Ext. Auth Service in Gloo Enterprise, we can build any custom AuthN/AuthZ plugins to handle security requirements not provided out of the box.
Role of edge gateways in relation to service mesh adoptionChristian Posta
API Gateways provide functionality like rate limiting, authentication, request routing, reporting, and more. If you’ve been following the rise in service-mesh technologies, you’ll notice there is a lot of overlap with API Gateways when solving some of the challenges of microservices. If service mesh can solve these same problems, you may wonder whether you really need a dedicated API Gateway solution?
The reality is there is some nuance in the problems solved at the edge (API Gateway) compared to service-to-service communication (service mesh) within a cluster. But with the evolution of cluster-deployment patterns, these nuances are becoming less important. What’s more important is that the API Gateway is evolving to live at a layer above service mesh and not directly overlapping with it. In other words, API Gateways are evolving to solve application-level concerns like aggregation, transformation, and deeper context and content-based routing as well as fitting into a more self-service, GitOps style workflow.
In this talk we put aside the “API Gateway” infrastructure as we know it today and go back to first principles with the “API Gateway pattern” and revisit the real problems we’re trying to solve. Then we’ll discuss pros and cons of alternative ways to implement the API Gateway pattern and finally look at open source projects like Envoy, Kubernetes, and GraphQL to see how the “API Gateway pattern” actually becomes the API for our applications while coexisting nicely with a service mesh (if you adopt a service mesh).
Navigating the service mesh landscape with Istio, Consul Connect, and LinkerdChristian Posta
The document discusses various service mesh options including Linkerd, Consul Connect, Istio, and AWS App Mesh. It provides an overview of each solution, describing their key features and strengths/opportunities. It emphasizes that the service mesh approach is useful for managing inter-service communication and that implementations are still evolving. It recommends starting simply and iteratively adopting capabilities to match needs.
Distributed microservices introduce new challenges: failure modes are harder to anticipate and resolve. In this session, we present a “Chaos Debugging” framework enabled by three open source projects: Gloo Shot, Squash, and Loop to help you increase your microservices’ “immunity” to issues.
Gloo Shot integrates with any service mesh to implement advanced, realistic chaos experiments. Squash connects powerful and mature debuggers (gdb, dlv, java debugging) to your microservices while they run in Kubernetes. Loop extends the capability of your service mesh to observe your application and record full transactions for sandboxed replay and debugging.
Come to this demo-heavy talk to see how together, Squash, Gloo Shot, and Loop allow you to trigger, replay, and investigate failure modes of your microservices in a language agnostic and efficient manner without requiring any changes to your code.
Leveraging Envoy Proxy and GraphQL to Lower the Risk of Monolith to Microserv...Christian Posta
If you have an existing Java monolith, you know you must take care making changes to it or altering it in any negative way. Often times these monoliths are very valuable to the business and generate a lot of revenue. At the same time, since it’s difficult to make changes to the monolith it’s desirable to move to a microservices architecture. Unfortunately you cannot just do a big-bang migration to a greenfield architecture and will have to incrementally adopt microservices. In this talk, we’ll look at using Gloo proxy which is based on Envoy Proxy and GraphQL to do surgical, function-level traffic control and API aggregation to safely migrate your monolith to microservices and serverless functions.
Service-mesh options with Linkerd, Consul, Istio and AWS AppMeshChristian Posta
Service mesh abstracts the network from developers to solve three main pain points:
How do services communicate securely with one another
How can services implement network resilience
When things go wrong, can we identify what and why
Service mesh implementations usually follow a similar architecture: traffic flows through control points between services (usually service proxies deployed as sidecar processes) while an out-of-band set of nodes is responsible for defining the behavior and management of the control points. This loosely breaks out into an architecture of a "data plane" through which requests flow and a "control plane" for managing a service mesh.
Different service mesh implementations use different data planes depending on their use cases and familiarity with particular technology. The control plane implementations vary between service-mesh implementations as well. In this talk, we'll take a look at three different control plane implementations with Istio, Linkerd and Consul, their strengths, and their specific tradeoffs to see how they chose to solve each of the three pain points from above. We can use this information to make choices about a service mesh or to inform our journey if we choose to build a control plane ourselves.
The document summarizes the new features of Istio 1.1, an open-source service mesh. Some key highlights include improved performance and scalability, namespace isolation, multi-cluster capabilities, easier installation with Helm, and locality-aware load balancing. A new Sidecar resource was introduced to improve performance by configuring resources for individual proxies. The presentation demonstrates performance improvements with the Sidecar resource and highlights additional functionality in Istio like traffic control and metrics collection.
API Gateways are going through an identity crisisChristian Posta
API Gateways provide functionality like rate limiting, authentication, request routing, reporting, and more. If you've been following the rise in service-mesh technologies, you'll notice there is a lot of overlap with API Gateways when solving some of the challenges of microservices. If service mesh can solve these same problems, you may wonder whether you really need a dedicated API Gateway solution?
The reality is there is some nuance in the problems solved at the edge (API Gateway) compared to service-to-service communication (service mesh) within a cluster. But with the evolution of cluster-deployment patterns, these nuances are becoming less important. What's more important is that the API Gateway is evolving to live at a layer above service mesh and not directly overlapping with it. In other words, API Gateways are evolving to solve application-level concerns like aggregation, transformation, and deeper context and content-based routing as well as fitting into a more self-service, GitOps style workflow.
In this talk we put aside the "API Gateway" infrastructure as we know it today and go back to first principles with the "API Gateway pattern" and revisit the real problems we're trying to solve. Then we'll discuss pros and cons of alternative ways to implement the API Gateway pattern and finally look at open source projects like Envoy, Kubernetes, and GraphQL to see how the "API Gateway pattern" actually becomes the API for our applications while coexisting nicely with a service mesh (if you adopt a service mesh).
KubeCon NA 2018: Evolution of Integration and Microservices with Service Mesh...Christian Posta
Cloud-native describes a way of building applications on a cloud platform to iteratively discover and deliver business value. We now have access to a lot of similar technology that the large internet companies pioneered and used to their advantage to dominate their respective markets. What challenges arise when we start building applications to take advantage of this new technology?
In this talk we'll explore the role of service meshes when building distributed systems, why they make sense, and where they don't make sense. We will look at a class of problem that crops up that service mesh cannot solve, but that frameworks and even new programming languages like Ballerina are aiming to solve
Knative builds on Kubernetes and Istio to provide "PaaS-like abstractions" that raise the level of abstraction for specifying, running, and modifying applications. Knative includes building blocks like Knative Serving for autoscaling container workloads to zero, Knative Eventing for composing event-driven services, Knative Build for building containers from source, and Knative Pipelines for abstracting CI/CD pipelines. While Knative can run any type of container, its building blocks help enable serverless-style functions by allowing compute resources to scale to zero and be driven by event loads.
3. Christian Posta
Principal Architect – Red Hat
Twitter: @christianposta
Blog: http://blog.christianposta.com
Email: christian@redhat.com
• Author “Microservices for Java Developers”
• Committer/contributor: Apache Camel, Apache ActiveMQ, Fabric8.io, Apache Kafka, Debezium.io, et al.
• Has worked with large microservices, web-scale, and unicorn companies
7. “People try to copy Netflix, but they can only copy what they see. They copy the results, not the process.”
Adrian Cockcroft, former Chief Cloud Architect, Netflix
9. Are you doing microservices?
• Maybe it doesn’t matter so much… What we really care about is speed, reduced time to value, and business outcomes.
• Maybe a data-driven approach is a better way to answer this question...
10. Are you doing microservices?
• Number of features accepted
• % of features completed
• User satisfaction
• Feature cycle time
• Defects discovered after deployment
• Customer lifetime value (future profit as a result of the relationship with the customer) https://en.wikipedia.org/wiki/Customer_lifetime_value
• Revenue per feature
• Mean time to recovery
• % improvement in SLA
• Number of changes
• Number of user complaints, recommendations, suggestions
• % favorable rating in surveys
• % of users using each feature
• % reduction in error rates
• Average number of transactions per user
• MANY MORE!
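Several of these metrics are simple arithmetic over timestamps that a team probably already records. The sketch below computes mean time to recovery and feature cycle time. It is a minimal example that assumes incidents are logged as (detected, recovered) pairs and features as (work started, deployed) pairs. The function names and the sample data are hypothetical and for illustration only.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average time from detection to recovery over (detected, recovered) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((recovered - detected for detected, recovered in incidents), timedelta())
    return total / len(incidents)


def feature_cycle_times(features):
    """Map each feature name to the time between starting work and deploying it."""
    return {name: deployed - started for name, (started, deployed) in features.items()}


# Hypothetical sample data, for illustration only.
incidents = [
    (datetime(2017, 5, 1, 10, 0), datetime(2017, 5, 1, 10, 45)),   # 45 minutes
    (datetime(2017, 5, 9, 22, 15), datetime(2017, 5, 9, 23, 45)),  # 90 minutes
    (datetime(2017, 5, 20, 3, 30), datetime(2017, 5, 20, 4, 0)),   # 30 minutes
]
features = {
    "title-search": (datetime(2017, 4, 3), datetime(2017, 4, 12)),
    "recommendations": (datetime(2017, 4, 3), datetime(2017, 4, 28)),
}

print(mean_time_to_recovery(incidents))               # 0:55:00
print(feature_cycle_times(features)["title-search"])  # 9 days, 0:00:00
```

Tracked over time, a falling recovery time or cycle time is the kind of evidence the slide asks for, whatever the architecture happens to be called.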
16. Book checkout / purchase, title search, recommendations, weekly reporting
18. Focus on domain models, not data models
• Break things into smaller, understandable models
• Surround a model and its “context” with a boundary
• Implement the model in code or get a new model
• Explicitly map between different contexts
• Model transactional boundaries as aggregates
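The sketch below makes "explicitly map between different contexts" concrete using the bookstore use cases. It assumes two hypothetical contexts. A Catalog context publishes books as plain messages, and a Checkout context keeps its own line-item model. The class names, field names and the price-in-cents convention are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from decimal import Decimal


# --- Catalog context (hypothetical) -------------------------------------
@dataclass(frozen=True)
class CatalogBook:
    isbn: str
    title: str
    authors: tuple
    list_price: Decimal

    def to_published(self):
        """The message shape Catalog promises to other contexts."""
        return {"isbn": self.isbn, "title": self.title, "listPrice": str(self.list_price)}


# --- Checkout context (hypothetical) ------------------------------------
@dataclass(frozen=True)
class LineItem:
    sku: str
    description: str
    unit_price_cents: int
    quantity: int


def to_line_item(published_book, quantity):
    """Explicit mapping from Catalog's published message into Checkout's model.

    Only the published dict is used here, never CatalogBook itself, so this
    function is the one place that knows both shapes."""
    cents = int(Decimal(published_book["listPrice"]) * 100)
    return LineItem(
        sku="ISBN-" + published_book["isbn"],
        description=published_book["title"],
        unit_price_cents=cents,
        quantity=quantity,
    )
```

Checkout depends only on the published message shape. Catalog can therefore change its internal model freely, as long as the published format, or this single mapping function, is kept in step.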
Aggregates
• Use the domain to lead you to invariant rules across your domain model
• Model the invariants and their associated entities/value objects as “aggregates”
• Aggregates focus on transactional boundaries (i.e., transactional in the sense of the “A”, atomicity, in ACID)
• Individual aggregates are transactionally consistent
• Aggregates use relaxed consistency models between aggregates (e.g., something like the Actor model?)
• Bounded Contexts use relaxed consistency models between boundaries
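To make the aggregate idea concrete, here is a minimal sketch using a hypothetical `LibraryCopy` aggregate from the book-checkout context. The class, its fields, and the "one borrower at a time" invariant are illustrative assumptions, not taken from the talk. Every state change goes through the aggregate root, which checks the invariant before mutating anything, so one aggregate can be saved in one local transaction; other aggregates learn about the change only through the recorded events.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class InvariantViolation(Exception):
    """Raised when a command would break an aggregate invariant."""


@dataclass
class LibraryCopy:
    """Hypothetical aggregate root: one physical copy in the checkout context."""
    copy_id: str
    borrower: Optional[str] = None
    # Domain events recorded by this aggregate; other aggregates or contexts
    # would consume them asynchronously (relaxed consistency between aggregates).
    pending_events: List[dict] = field(default_factory=list)

    def check_out(self, member_id: str) -> None:
        # Invariant: a copy can be held by at most one member at a time.
        if self.borrower is not None:
            raise InvariantViolation(f"{self.copy_id} already checked out")
        self.borrower = member_id
        self.pending_events.append(
            {"type": "CopyCheckedOut", "copy_id": self.copy_id, "member_id": member_id}
        )

    def return_copy(self) -> None:
        if self.borrower is None:
            raise InvariantViolation(f"{self.copy_id} is not checked out")
        member_id, self.borrower = self.borrower, None
        self.pending_events.append(
            {"type": "CopyReturned", "copy_id": self.copy_id, "member_id": member_id}
        )
```

A second `check_out` on the same copy is rejected inside the aggregate itself, without consulting any other aggregate or service.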
But ...
• Load/size is too great to fit on one box
• Modules/use cases have different read/write characteristics
• Queries/joins are getting too complex
• Security issues
• Lots of conflicting changes to the model/schema
• Need denormalized, optimized indexing engines
• We can live with eventual consistency (whatever that really means)
From here on out, what we’re saying is “thank you old reliable, awesome database… we’ve got it from here”…
We need to understand something about the data inside our services and the data outside our services.
https://msdn.microsoft.com/en-us/library/ms954587.aspx
Replicated Data Consistency Explained through Baseball (Doug Terry)
https://www.microsoft.com/en-us/research/publication/replicated-data-consistency-explained-through-baseball/
• What consistency model do you need, depending on what role you’re playing?
• What consistency model are you willing to pay for?
• Official score keeper? (Linearizability or read-my-writes)
• Umpire? (Linearizability)
• Sports writer? (Bounded staleness, Eventual consistency)
• Radio updates? (Monotonic read, Bounded staleness)
• Statistician? (Bounded staleness)
• Friends in the pub? (Eventual consistency)
Maybe we can use a relaxed consistency model for some of those previously mentioned use cases…
Internet companies created their own tools for helping with this (some open source!!):
• Yelp – MySQL Streamer https://github.com/Yelp/mysql_streamer
• LinkedIn – Databus https://github.com/linkedin/databus
• Zendesk – Maxwell https://github.com/zendesk/maxwell
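These tools share the change data capture (CDC) idea: read the database's own change log and publish each row change as an event. As a rough sketch, assuming a simplified envelope modeled loosely on Debezium's (an `op` of `c`/`r`/`u`/`d` plus `before`/`after` row images; real events also carry source metadata and schemas), a downstream service could maintain its own read model like this. The `apply_change` and `replay` helpers and the `book_id` key are hypothetical.

```python
from typing import Dict, Iterable


def apply_change(read_model: Dict[int, dict], event: dict) -> None:
    """Apply one simplified CDC change event to an in-memory read model.

    Assumed envelope (loosely modeled on Debezium's):
      op: "c" (create), "r" (snapshot read), "u" (update), "d" (delete)
      before / after: row images, or None
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        # Upsert: applying the same event twice in a row has no extra effect.
        read_model[row["book_id"]] = row
    elif op == "d":
        read_model.pop(event["before"]["book_id"], None)
    else:
        raise ValueError(f"unknown op {op!r}")


def replay(events: Iterable[dict]) -> Dict[int, dict]:
    # Events must be applied in log order (e.g. per-key ordering within a
    # Kafka partition); swapping an update and a delete would corrupt the view.
    model: Dict[int, dict] = {}
    for e in events:
        apply_change(model, e)
    return model
```

The ordering comment is the important design point: the read model is only as correct as the ordering guarantee of the log it consumes.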
Speed! As in performance? Or scale? What is this speed thing all about?
This is a very different way of thinking about IT.
Typically IT is optimized for Cost. Many parts of the business are.
We’re not product companies anymore….
IT was traditionally used to transform paper-based or manual processes, and to support internal functions like CRM, accounting, and procurement.
But now companies are using IT to deliver value through services. In fact, startups are finding ways to deliver value through digital channels and are quickly disrupting old-guard enterprise corporations.
We are service companies.
Services require bi-directional, even omni-directional, interaction and communication with our customers. Value is created together with customers.
The faster you can get things to market the faster you can see what works and what doesn’t. We don’t know what will work up front. We don’t know what will deliver business value up front. We need to discover it.
What we want is to build an organization that’s able to experiment, fail fast, and iterate on what does work. We basically want IT to drive outcomes that deliver business value.
And we want to go fast.
Discovering what’s important, and experimenting, is how we find business value. We want to quickly find out the things that don’t work and minimize the cost of running these experiments. This transformation is a process, not something that happens overnight, and not something you can copy. Each organization is different in how it can go about this process; each needs to balance speed, safety, and business value for itself.
Get back to first principles.
Focus on principles, patterns, methodologies.
Tools will help, but you cannot start with tools.
Autonomy….
What is it? Who defines it? Who owns that definition? Who owns the instances of it? How do I get it? How do I not miss something? And if I do solve these questions, what does the architecture look like? A bunch of point-to-point connections? Lots of big up-front design? Lots of contracts and governance? These things tend to break autonomy. Let’s explore this a bit and see what problems we run into with data in a microservices world.
Now, understanding the domain, understanding the data model, and understanding where the boundaries lie is complex stuff. It cannot be solved with technology alone.
Let me give you a simple example….
This seems like a simple, even absurd question. It’s really not. This one simple question can illustrate how ambiguous and contradictory our language is with respect to understanding “real life”.
We cannot understand how to store a representation of a perceived “real life” unless we can describe it in plain language without ambiguity.
What is a book and how do we represent it? We need to first understand what “one book is.”
If an author has written two books we may expect to see two “book” entries represented in some kind of editor database or bibliographic database as “two records”. If they’ve written two editions of the one book, does it appear multiple times? Or do we model that as a revision record? Or maybe each edition gets its own record?
If a library or bookstore has 5 copies each of two books, do they record that as books? Is a book really just a copy of a book? Or do we call it a copy? But maybe a library inventory system may just refer to copies as books as well for the purposes of counting the total number of physical books. So we could come up with “copies” and “books” but they can be used interchangeably depending on who’s asking.
Or maybe for some systems a Book is something with a hard cover because they want to exclude periodicals, magazines, ebooks? So a “manual” may be classified as a book in some contexts, but not others.
Or maybe a book is just a bound physical unit? But some novels are so long they are actually broken down into two physical units, maybe labeled Volume I and Volume II. So then are those separate books or one book? Or the opposite: maybe multiple novels are bound together into a single physical unit, but really they are individual works.
So we could have a system where the author has written one book, it’s broken into two physical volumes, also known as books, and each volume has 5 copies, for a total of ten books. So what is one book?
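The “one book” puzzle can be made concrete with a tiny model. This sketch (all class names, field names, and context labels are hypothetical) encodes the scenario above: one work, split into two physical volumes, with five copies of each volume. Asking “how many books?” gives a different answer depending on which concept each context calls a book.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    label: str   # e.g. "Volume I"
    copies: int  # physical copies held by the library


@dataclass
class Work:
    title: str
    volumes: List[Volume]


def count_books(works: List[Work], context: str) -> int:
    """Return how many 'books' there are, as each hypothetical context defines it."""
    if context == "search":         # a book is the intellectual work
        return len(works)
    if context == "bibliographic":  # a book is a bound physical volume
        return sum(len(w.volumes) for w in works)
    if context == "inventory":      # a book is a physical copy on the shelf
        return sum(v.copies for w in works for v in w.volumes)
    raise ValueError(f"unknown context: {context}")


novel = Work("A Very Long Novel", [Volume("Volume I", 5), Volume("Volume II", 5)])
for ctx in ("search", "bibliographic", "inventory"):
    print(ctx, count_books([novel], ctx))
```

The same data yields 1, 2, and 10 “books”; none of the answers is wrong, each is only correct within its own context.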
It gets incredibly confusing.
So now just try to wrap your head around a Customer, or Patient, or Account, etc. The same polysemes exist there, only far more convoluted and ambiguous. And now when we talk about microservices we talk about the big ball of mud and how we cannot change part of it without re-deploying others. That is the easy part. Reconciling all of the different implicit usages of domain models across multiple contexts slammed into a single application is the hard part.
A is the book checkout system -- a book is a physical copy (second editions, volumes I and II, etc. are all individual books)
B is the book search system – book can be individual works where a composition may be multiple books and volumes I and II are all the same book
C is the checkout reporting engine – a book is what A thinks is a book
D is a recommendation engine – a book isn’t even a book, it’s an “interest” which has a mapping between books
Book recommendations (D)
D also wants to consume messages from A.
But the things we need to do in our service are sufficiently different that we want to change the language.
A and B have a translation and are coordinating.
D is not coordinating and will build an ACL (anti-corruption layer) that does the translation. And it’s nobody else’s business how this works.
In this case, maybe we have a book recommendation engine that also reads what gets checked out and by whom. Maybe D has some more complicated models it uses for describing recommendations. It wants to use A’s data, but it doesn’t want to conform to A’s domain model. It builds an anti-corruption layer to keep its model pure, which does the translation between its own model and A’s.
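An anti-corruption layer is, at its core, a translation owned by the downstream context. A minimal sketch, assuming A publishes a checkout event with hypothetical fields `copy_id`, `work_id`, and `member_id`, and that D keeps its own `Interest` model; the work-to-subjects table is a hypothetical stand-in for whatever mapping D maintains privately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interest:
    """D's own concept: a member's interest in a subject, not a physical book."""
    member_id: str
    subject: str


class CheckoutAntiCorruptionLayer:
    """Translates context A's checkout events into context D's Interest model.

    A's notion of a 'book' (a physical copy) never leaks past this class.
    """

    def __init__(self, subjects_by_work: Dict[str, List[str]]) -> None:
        # Hypothetical mapping owned by D: work id -> subjects it covers.
        self._subjects_by_work = subjects_by_work

    def translate(self, checkout_event: dict) -> List[Interest]:
        work_id = checkout_event["work_id"]
        member = checkout_event["member_id"]
        # Copies, editions, and volumes all collapse into the work's subjects;
        # unknown works simply produce no interests.
        return [Interest(member, s) for s in self._subjects_by_work.get(work_id, [])]
```

If A renames or restructures its event, only this class has to change; the rest of D keeps speaking in terms of interests.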
An order.
A customer.
An account.
A return.
A claim.
A discount?
This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc., all happens within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries.
This model needs to be “useful”, i.e., it should be implementable. Try to establish a model that’s both useful for discussion with the domain experts and implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose.
Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics.
Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous.
The central thing about a model is the language you create to express the problem and solution very crisply. We need clear language and we need boundaries.
Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries.
Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
We store our data inside this thing…
Do we really, as developers, understand how to properly use this thing?
Put it all into one big database…
No, seriously. Just do this for your applications. You’ll save yourself a lot of trouble.
Focus on
Traditional databases have tremendous flexibility, safety, etc.
Business Agility!!!
Journey …
Understand them
Test them
Change them
Different pace: rate of change is key!!!
Agile business…
Before going too far, we should have a definition of what microservices are. When we talk about microservices, we talk about breaking up complicated, potentially really large systems, whatever they may be, into smaller components. We break them into smaller components so we can understand them individually, test them individually, scale them, and ultimately change them at a different pace than the rest of the system. You can imagine that having to please every master in a monolithic environment can bog things down and inhibit change, which, as we discussed at the beginning, is the key here. We need to be able to work on systems that can change with the rest of the business as it gets even more competitive and disruptive.
One of the keys to this flexibility and ability to change is autonomy. Systems should be designed to be more autonomous so that changes don’t affect other downstream systems, faults don’t ripple across into cascading failures, etc. The more dependencies we have (on other systems, protocols, shared libraries, databases, etc.), the harder it can be to make changes. So we talk about services having and owning their own data, choosing the right technology for their function, and consciously enforcing modularity through APIs and contracts.
Autonomy is key here
But autonomy of systems includes autonomy of teams as well. Microservices can be a means to an end for a company serious about investing in the digital experience it provides to customers. It’s not in and of itself the end goal. It’s part of a digital transformation that encompasses all parts of the organization.
One large database!
We should focus on how we design our data models so that they can be sharded and distributed… Focus on transactional boundaries, etc., not 2PC.
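One way to read “design data models so they can be sharded”: use the aggregate identifier as the shard key, so every transaction stays inside one aggregate and therefore inside one shard, with no cross-shard 2PC. A minimal sketch, assuming a fixed shard count and a stable hash (`zlib.crc32` here; Python's built-in `hash()` is salted per process for strings, so it would route inconsistently across processes). `ShardedStore` and `NUM_SHARDS` are hypothetical names.

```python
import zlib
from typing import Dict, List

NUM_SHARDS = 4  # assumption: fixed shard count; real systems need a resharding plan


def shard_for(aggregate_id: str) -> int:
    """Route by aggregate id using a stable hash, so routing is deterministic."""
    return zlib.crc32(aggregate_id.encode("utf-8")) % NUM_SHARDS


class ShardedStore:
    def __init__(self) -> None:
        self.shards: List[Dict[str, dict]] = [dict() for _ in range(NUM_SHARDS)]

    def save_aggregate(self, aggregate_id: str, state: dict) -> int:
        # The whole aggregate lives on one shard, so this write can be a
        # single local transaction with no coordination across shards.
        shard = shard_for(aggregate_id)
        self.shards[shard][aggregate_id] = state
        return shard

    def load_aggregate(self, aggregate_id: str) -> dict:
        return self.shards[shard_for(aggregate_id)][aggregate_id]
```

The trade-off is that any operation spanning two aggregates now spans (potentially) two shards, which is exactly where the relaxed consistency between aggregates comes in.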
In this scenario we may have established our boundaries… our customer profile service has taken an update to customer preferences. A customer and its profile/preferences may be modeled in other services like our recommendation engine, our master customer system of record (SOR), our social alerting engine, etc. And we need to update some important information... So the systems that are interested in this data must first be defined and implemented in code ahead of time. Adding a new system requires changes. Additionally, these downstream systems are not transactional... So if there are errors somewhere, then it’s up to the application to try and decide what action to take... And while deciding that action, the application could fail... And no state is stored about where it left off... And now we’re in an inconsistent state.
You could try adding compensation logic and stateful tracking of this locally. And it’s also great practice to implement idempotent consumers. The problem with this is that there could be “read uncommitted” issues, like dirty reads or dirty writes, that happen downstream because of this, and a compensation now gets much more complicated.
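The idempotent-consumer practice can be sketched as follows, assuming every event carries a unique id (the `event_id`, `customer_id`, and `changes` field names are hypothetical). The consumer records processed ids, so a redelivered message, which is normal under at-least-once delivery, has no further effect. In a real service the processed-id record and the state change would have to be committed in the same local transaction; here both simply live in memory.

```python
from typing import Dict, Set


class PreferencesConsumer:
    """Hypothetical downstream consumer of customer-preference updates."""

    def __init__(self) -> None:
        self.preferences: Dict[str, dict] = {}
        self._processed_ids: Set[str] = set()

    def handle(self, event: dict) -> bool:
        """Apply the event once; return False if it was a duplicate."""
        if event["event_id"] in self._processed_ids:
            return False  # redelivery: already applied, skip
        prefs = self.preferences.setdefault(event["customer_id"], {})
        prefs.update(event["changes"])
        # In production, persist this id atomically with the state change above.
        self._processed_ids.add(event["event_id"])
        return True
```

Note that idempotency only removes duplicate effects; it does not address the dirty-read problem described above.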
We could just try emitting events and say “whenever this happens over here, we just update a message queue”… now we have to try to get consensus between the two systems. This can be expensive, as consensus tends to be, and you also suffer from availability issues. 2PC is an anti-availability protocol.
2PC is perfectly fine when consensus is required, though you have to consider the drawbacks. 2PC requires operational overhead to manage the transaction log of the transaction manager. Also, you can run into deadlocks when locks are held too long. You can also end up in heuristic situations where one side unilaterally rolls back. Now you need human intervention and reconciliation logic. People pooh-pooh 2PC, but it may be appropriate in some situations.
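For reference, two-phase commit has roughly this shape. This is a deliberately simplified, in-memory sketch with hypothetical class names and no transaction log, timeouts, or recovery: the coordinator asks every participant to prepare and commits only if all vote yes. The availability cost is visible in the structure: a participant that has voted yes holds its locks until the decision arrives, and if the coordinator crashes in between, that participant is stuck.

```python
from typing import List


class Participant:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"  # idle -> prepared -> committed / aborted

    def prepare(self) -> bool:
        # Phase 1: validate and durably stage the change, holding locks.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"  # Phase 2: make the staged change visible, release locks

    def abort(self) -> None:
        self.state = "aborted"  # Phase 2: discard the staged change, release locks


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (voting). A real coordinator would durably log its decision
    # before phase 2 so it can finish the protocol after a crash.
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single "no" vote aborts everyone, which is the consensus the notes refer to; the missing pieces (the log, timeouts, heuristic outcomes) are exactly the operational overhead listed above.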
Another situation that will tend to come up is identifying boundaries around IO and read/write patterns. How do we get the writes over to the read database? Do we do 2PC from the application? Do we use a message queue?
What about the so-called N+1 problem? This is where we interact with downstream services, or maybe we take in events and need to enrich them with additional metadata. For example, we may want to group and sort a set of customers that fall into certain criteria for specific recommendations, and we need to enrich each customer with additional preferences. So we query for the customer list and then loop through and enrich each customer. Can downstream systems sustain this kind of rapid invocation? If they can, are you exposed at all to underlying storage inconsistencies and concurrency issues? Do they just try to create bulk APIs? And those APIs are inconsistent across providers (pagination, missed processing of singular elements, etc.).
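The N+1 pattern is easy to see if we count downstream calls. In this sketch, `PreferenceService` is a hypothetical stand-in for a remote service that counts how often it is invoked: enriching N customers one at a time costs N calls on top of the query that produced the list, while a bulk API needs a single call regardless of N (at the price of the pagination and partial-failure questions raised above).

```python
from typing import Dict, List


class PreferenceService:
    """Hypothetical downstream service; counts calls to simulate network cost."""

    def __init__(self, data: Dict[str, dict]) -> None:
        self._data = data
        self.calls = 0

    def get(self, customer_id: str) -> dict:
        self.calls += 1
        return self._data.get(customer_id, {})

    def get_many(self, customer_ids: List[str]) -> Dict[str, dict]:
        self.calls += 1  # one round trip for the whole batch
        return {cid: self._data.get(cid, {}) for cid in customer_ids}


def enrich_n_plus_one(ids: List[str], svc: PreferenceService) -> List[dict]:
    # One downstream call per customer: N extra round trips.
    return [{"id": cid, "prefs": svc.get(cid)} for cid in ids]


def enrich_bulk(ids: List[str], svc: PreferenceService) -> List[dict]:
    # One downstream call for all customers.
    prefs = svc.get_many(ids)
    return [{"id": cid, "prefs": prefs[cid]} for cid in ids]
```

Both functions return the same result; only the number of round trips, and therefore the load on the downstream system, differs.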
So maybe we set up a cache in front of the service to alleviate the penalties of calling downstream services rapidly… and now what sort of stale data can you deal with? Bounded staleness? How do you handle cache eviction?
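A time-to-live (TTL) cache is one simple way to put an explicit bound on staleness: a cached value is served for at most `ttl_seconds` after it was loaded, so reads are stale by at most that bound (ignoring the loader's own latency). This is a minimal sketch; the `loader` callback is a hypothetical stand-in for the downstream call, and the injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Read-through cache whose entries expire after ttl_seconds."""

    def __init__(self, loader: Callable[[Hashable], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # hypothetical downstream lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: within the staleness bound
        value = self._loader(key)  # expired or missing: go downstream
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Explicit eviction, e.g. when a change event for this key arrives.
        self._entries.pop(key, None)
```

TTL expiry gives bounded staleness by construction; explicit `invalidate` calls driven by change events can tighten that bound further.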
We expect some level of consistency, and we need to be able to withstand faults because we know faults occur… But Brewer’s CAP theorem says we can only pick 2 out of consistency, availability, and partition tolerance...
Consistency models… the set of allowable histories of operations
We say that we read what we wrote
Now, a process is allowed to read the most recently written value from any process, not just itself.
The register becomes a place of coordination between two processes; they share state.
We relax our model and say when we read, we read the value at the time of the read and take into account other processes’ writes…
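The difference between reading what we wrote and reading a shared, replicated register can be sketched with versioned replicas. In this hypothetical setup (all names are illustrative), a write lands on one replica and reaches the other later. A session remembers the version of its own last write and, to get read-your-writes, refuses to read from a replica that has not caught up to that version; another process without that session state may still see the older value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    value: Optional[str] = None
    version: int = 0

    def apply(self, value: str, version: int) -> None:
        if version > self.version:  # ignore older or duplicate updates
            self.value, self.version = value, version


class Session:
    """Client session that enforces read-your-writes across replicas."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.last_written = 0

    def write(self, value: str) -> int:
        # The write lands on replica 0 only; propagation to others happens later.
        version = self.replicas[0].version + 1
        self.replicas[0].apply(value, version)
        self.last_written = version
        return version

    def read(self, replica_index: int) -> Optional[str]:
        r = self.replicas[replica_index]
        if r.version < self.last_written:
            raise RuntimeError("replica is behind this session's writes; try another")
        return r.value
```

This is the guarantee a single writer (like the official score keeper) needs: it never sees a state older than its own last write, even though other readers might.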
Somewhere (or some node… a database, a service, a set of databases in a cluster) where there is an appearance of an order that is immediately visible to everyone viewing it.
Moreover, linearizability’s time bounds guarantee that those changes will be visible to other participants after the operation completes. Hence, linearizability prohibits stale reads.
Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
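This guarantee (each reader sees one author's posts in order, and a post, once seen, does not disappear) corresponds to monotonic reads over a consistent prefix. A minimal sketch, modeling each replica as a list holding some prefix of the author's post log; `MonotonicReader` is a hypothetical name. The reader tracks how much of the log it has already observed and skips any replica that would take it backwards.

```python
from typing import List


class MonotonicReader:
    """Reader that never observes an older state than one it already saw."""

    def __init__(self) -> None:
        self.high_water = 0  # number of posts already observed

    def read(self, replicas: List[List[str]]) -> List[str]:
        # Each replica holds a prefix of the author's post log (consistent prefix).
        for log in replicas:
            if len(log) >= self.high_water:
                self.high_water = len(log)
                return list(log)
        raise RuntimeError("no replica is caught up to what this reader has seen")
```

Two readers can still see different prefixes at the same moment; the guarantee is only that each reader's own view never moves backwards.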
Score keeper: needs to read the most up-to-date version of the score, so cannot do an eventually consistent read (bounded staleness, consistent prefix, monotonic read). BUT since the score keeper is the only one writing the score, a read-my-writes read is enough.
The umpire needs a strictly consistent read to determine, in the 9th inning or any inning afterward, whether he can end the game.
Show a diagram of consistency related to performance..