AUTONOMOUS
MICROSERVICES
Building bullet-proof systems
Matthew Groves | Developer Advocate | @mgroves
(With lots of help from Denis Rosa and Craig Kovar!)
Autonomous Microservices - Manning - July 2020
NO ONE WANTS
MICROSERVICES
IT’S A NECESSARY EVIL
Jonas Bonér
Who am I?
• Matthew D. Groves
• Developer Advocate for Couchbase
• Twitter: @mgroves
• Live Coding: https://twitch.tv/matthewdgroves
• "I am not an expert, but I am an enthusiast." – Alan Stevens
by @natelovett
MICROLITHS
MICROSERVICES MAKE UP
SYSTEMS
BUT THEY NEED TO BE
AUTONOMOUS
Like Human Beings
SCALES EASILY (SLICING THE MONOLITH)
ISOLATED FAILURE
RESILIENT
ELASTIC
1. ISOLATED FAILURE
Dependencies
Service 1
Service 2
Order
Service
User
Service
Service 4
Delivery
Service
CACHE THE DATA YOU NEED
INSTEAD OF ASKING FOR IT
Local Cache
User Service
UserUpdatedEvent
Delivery Service
Order Service
Payment Service
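A minimal sketch of the local-cache idea, not the implementation from the talk. It assumes an in-memory `EventBus` standing in for a broker such as RabbitMQ or Kafka; the class names and the fields on `UserUpdatedEvent` are hypothetical, chosen only for illustration. The Delivery Service keeps just the one user field it needs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UserUpdatedEvent:
    # Hypothetical payload; a real event would carry whatever the User Service publishes.
    user_id: str
    name: str
    address: str
    email: str


class EventBus:
    """In-memory stand-in for a message broker (assumption for this sketch)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[UserUpdatedEvent], None]] = []

    def subscribe(self, handler: Callable[[UserUpdatedEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: UserUpdatedEvent) -> None:
        for handler in self._subscribers:
            handler(event)


class DeliveryService:
    """Caches only the user data it needs: the shipping address."""

    def __init__(self, bus: EventBus) -> None:
        self._addresses: Dict[str, str] = {}
        bus.subscribe(self._on_user_updated)

    def _on_user_updated(self, event: UserUpdatedEvent) -> None:
        # Keep the local copy up to date; ignore fields this service does not use.
        self._addresses[event.user_id] = event.address

    def shipping_address(self, user_id: str) -> str:
        # Answered from the local cache, so it works even if the User Service is offline.
        return self._addresses[user_id]


bus = EventBus()
delivery = DeliveryService(bus)
bus.publish(UserUpdatedEvent("u1", "Ada", "1 Main St", "ada@example.com"))
```

The cache is only as fresh as the last event received, which is the eventual-consistency trade-off described in the notes.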
Microservices Communication
ASYNC COMMUNICATION
BETWEEN SERVICES
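One way to picture async communication is an order queue that keeps accepting orders while the payment side is down. This is a rough sketch under simplifying assumptions: the queue is an in-process `deque` rather than a durable broker, and `OrderService`, `PaymentService`, and `drain` are hypothetical names.

```python
from collections import deque
from typing import Deque, List


class PaymentService:
    """Toy payment service that can be switched offline."""

    def __init__(self) -> None:
        self.online = True
        self.charged: List[str] = []

    def charge(self, order_id: str) -> None:
        if not self.online:
            raise ConnectionError("payment service is offline")
        self.charged.append(order_id)


class OrderService:
    """Accepts orders immediately and charges them later."""

    def __init__(self, payments: PaymentService) -> None:
        self._payments = payments
        self._pending: Deque[str] = deque()

    def place_order(self, order_id: str) -> str:
        # The customer gets a confirmation right away; payment is deferred.
        self._pending.append(order_id)
        return "confirmed"

    def drain(self) -> int:
        """Charge queued orders in order; stop at the first failure and keep the rest queued."""
        processed = 0
        while self._pending:
            try:
                self._payments.charge(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()
            processed += 1
        return processed


payments = PaymentService()
orders = OrderService(payments)
payments.online = False
orders.place_order("o1")
orders.place_order("o2")
orders.drain()           # payment is offline, so both orders stay queued
payments.online = True
orders.drain()           # payment is back, so the backlog is processed
```

In a real system the queue would live in a broker so that pending orders survive a restart of the order service.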
SCALES EASILY
ISOLATED FAILURE (DATA CACHING AND ASYNC)
RESILIENT
ELASTIC
2. RESILIENT
DON’T LET ANYONE BREAK
YOUR INTERNAL STATE
Event Sourcing/Logging
Disk is cheap. Add versioning to your state and store all received change requests. This is good for:
• Debugging
• Fixing inconsistencies
• Auditing
• Querying the state of an entity within a period
https://blog.couchbase.com/event-sourcing-event-logging-an-essential-microservice-pattern/
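As a small illustration of versioned state, here is a ledger-style sketch. It is an assumption-laden toy, not the approach from the linked blog post: events live in a Python list, and `Account`, `record`, and `balance` are hypothetical names. Current state is rebuilt by replaying the history, a mistake is fixed with an offsetting record, and past state can be queried by version.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    version: int   # position in the history, starting at 1
    kind: str      # "deposit" or "withdraw"
    amount: int


class Account:
    """State is never overwritten; it is derived from the stored events."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, kind: str, amount: int) -> Event:
        if kind not in ("deposit", "withdraw"):
            raise ValueError(f"unknown event kind: {kind}")
        event = Event(len(self._events) + 1, kind, amount)
        self._events.append(event)
        return event

    def balance(self, up_to_version: Optional[int] = None) -> int:
        """Replay events, optionally stopping at a given version."""
        total = 0
        for event in self._events:
            if up_to_version is not None and event.version > up_to_version:
                break
            total += event.amount if event.kind == "deposit" else -event.amount
        return total


account = Account()
account.record("deposit", 100)
account.record("withdraw", 30)
account.record("deposit", 500)    # a bad message from another service
account.record("withdraw", 500)   # offsetting record that corrects it
```

Because the bad event is kept rather than deleted, you can still see exactly when the inconsistency entered the system.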
RESPECT OTHER SERVICES
Timeouts
Web App
Service 2
Service 3
Service 4
Thread Pool
Req
Req
Req
Req
…
Timeouts
Web App
Service 2
Service 3
Service 4
Thread Pool
Req
…
Req
Req
Req
Timeouts
Timeouts
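The Java snippets shown on these slides are not part of this transcript. As a stand-in, here is a minimal Python sketch of the same idea using only the standard library. `fetch_or_fallback` is a hypothetical helper, and the URL in the comment is made up. In `urllib`, the single `timeout` argument bounds blocking socket operations, which covers both the connection attempt and waiting on the response.

```python
import urllib.request


def fetch_or_fallback(url: str, timeout_seconds: float, fallback: bytes) -> bytes:
    """Call another service, but never wait longer than timeout_seconds per blocking step."""
    try:
        # Without the timeout argument, a hung service could hold this thread indefinitely.
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return fallback


# Example call (hypothetical URL, not executed here):
# fetch_or_fallback("http://user-service.internal/users/42", 2.0, b"{}")
```

Returning a fallback is one design choice among several; raising a specific error and letting the caller decide works just as well.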
Other things to consider
• Auto Retries (after a 502 for instance)
• Circuit Breakers
• Authentication
• Observed Latency
• Bulkheads
• Consistent Metrics
• Logging
• … and more
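To make one of these concrete, here is a rough sketch of a circuit breaker: after a set number of consecutive failures, further calls fail immediately until a cool-down has passed. This is a simplified illustration, not how Hystrix or any service mesh implements it; `CircuitBreaker`, `CircuitOpenError`, and the injectable `clock` parameter are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures: int, reset_after_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._reset_after = reset_after_seconds
        self._clock = clock            # injectable so the cool-down can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit is open; failing fast")
            # Cool-down elapsed: allow one trial call; a single failure reopens the circuit.
            self._opened_at = None
            self._failures = self._max_failures - 1
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0             # any success closes the circuit fully
        return result
```

Failing fast protects the caller's thread pool: a request that would have waited on a dead service returns an error right away instead.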
The common solution
The Service Mesh Pattern
Microservice
Pod
Service
Mesh
Service Mesh Providers
Linkerd Control Panel
SCALES EASILY (SLICING THE MONOLITH)
ISOLATED FAILURE (DATA CACHING AND ASYNC)
ELASTIC
RESILIENT (STATE VERSIONING, TIMEOUTS, CIRCUIT-BREAKERS AND BULKHEADS)
3. ELASTIC
Service Discovery and LB
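A toy sketch of the two patterns together: instances register themselves when they come online, and callers ask the registry which instance to use. It assumes an in-memory registry and simple round-robin balancing; `ServiceRegistry` and the addresses are hypothetical, and real registries add health checks and expiry.

```python
from collections import defaultdict
from typing import Dict, List


class ServiceRegistry:
    """In-memory service registry with round-robin load balancing."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)

    def register(self, service: str, address: str) -> None:
        # Called by a new instance to announce that it is online.
        if address not in self._instances[service]:
            self._instances[service].append(address)

    def deregister(self, service: str, address: str) -> None:
        if address in self._instances[service]:
            self._instances[service].remove(address)

    def resolve(self, service: str) -> str:
        # Answers "which instance can I call?" by rotating through the known ones.
        instances = self._instances[service]
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        index = self._next[service] % len(instances)
        self._next[service] = index + 1
        return instances[index]


registry = ServiceRegistry()
registry.register("user-service", "10.0.0.1:8080")
registry.register("user-service", "10.0.0.2:8080")
```

Scaling out then means starting another instance and letting it register; callers pick it up on the next `resolve`.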
K8s Services
A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them
K8s Replication
You can specify how many replicas you need and autoscale your services with the Pod Autoscaler
SCALES EASILY (SLICING THE MONOLITH)
ISOLATED FAILURE (DATA CACHING AND ASYNC)
ELASTIC
RESILIENT (STATE VERSIONING, TIMEOUTS, CIRCUIT-BREAKERS AND BULKHEADS)
MY MICROSERVICE IS HIGHLY
SCALABLE NOW, RIGHT?
WHAT ABOUT YOUR DATABASE?
DATABASES ARE
THE BOTTLENECK
OF MOST APPLICATIONS
Instance
Instance
Instance
Instance
Instance
Instance
Instance
Service
Relational Databases
• Performs badly for reads & writes
• Scaling is expensive
• Won’t scale beyond some point
Polyglot Persistence
Service 1 | Service 2 | Service 3 | Service 4
Polyglot Persistence
Java | .NET | Node | Python
Polyglot Persistence
• Relational: Financial records
• Documents: User profiles
• Graph: Fraud detection
• Search: Search engine
http://bit.ly/centralparkincident
Active Users Growth
NOT EVERYBODY IS A [Pokémon Go]
Couchbase
• Common “replacement” for RDBMS
• Highly Scalable
• Memory First
• Fast for reads and writes
Couchbase
K8s Cluster
Couchbase provides an official Kubernetes operator
Database
K8s Operators
Couchbase provides an official Kubernetes operator
Couchbase K8s Operator
Automatically manage your cluster:
• Logs
• Auto Recovery
• Elastic Scalability
• Automated Cluster Provisioning
• Backups
• Etc
DEMO
Resources: Other Operators
https://operatorhub.io/
Resources: Next Steps
• 💻 Install Couchbase: https://couchbase.com/downloads
• 👩🏽🏫 Free training: https://learn.couchbase.com
• Operator: https://www.couchbase.com/products/cloud/kubernetes
• 📝 Blogs: https://blog.couchbase.com/category/kubernetes/
• ❔ Forums: https://forums.couchbase.com/
• ⌨️ ASP.NET Core Getting Started: http://bit.ly/aspNetCoreMicroservices
Resources: Me!
@mgroves
twitch.tv/matthewdgroves
matthew.groves@couchbase.com
THANK YOU!


Editor's Notes

  1. Microservices allow you to independently deploy and scale parts of your system. If you determine that just the User Profile part of your system is being used a lot, with a monolith you have to scale the whole thing. With a microservice, you can scale just the user part. And not only scale, but develop and deploy independently. BUT microservices are a distributed system, which brings a lot of challenges with it. Monoliths are much simpler to develop: a single application containing all the features. So if I'm in the payment part of the system and I need to access something about a user, it's just a method call. Easy. But in a microservice system, you're making an HTTP call to the user system, and there's a chance that the user system is down. Also, it's not easy to refactor. If I add a new field to the user service, I can't just go and update all the other services. Some other team might be responsible for the payment service. Transactions are a problem too. How can I guarantee that an update applies atomically to two different services?
  2. Microservices are more expensive to develop (time and money). If you're a small team and don't need to scale a lot, stick to monoliths. If scale is a problem, then microservices might help you. And that's not just about scaling servers: it's scaling your team, scaling your deployments, scaling your company.
  3. Migrating from monolith to microservices: what you might see is that you start to develop a microservice LIKE a monolith. It's the worst of both worlds. You have all the problems that you had before, but now they are distributed.
  4. Microservices make up a system. It may be a dozen services, but they all act within the same system. Unlike a monolith, they need to be autonomous. If you are relying on synchronous calls, you will have trouble. Because the network is a whole new problem. If the user service is offline, you can't get user data. Think about microservices as if they were human beings. We rely on each other to commute to work, for instance. We rely on a bus driver to get us someplace, but most of the day, I don't need the bus driver to accomplish my job. I'm autonomous, but I rely on the bus driver at some level.
  5. So what does it mean to be an autonomous microservice? 4 characteristics. By being a microservice, that makes them easier to scale. That's the whole point. You slice up a monolith and can scale parts of it independently. But just slicing up a monolith doesn't necessarily give you these other things.
  6. So imagine a microservices architecture. All these services depend on other services. But say the User Service goes down, then the whole system goes down. So now it's like a single point of failure. So how do we isolate this failure? If the user service is offline, the other services need to keep going.
  7. One way to improve isolation is caching
  8. So let's say that order, delivery, and payment all depend on the user service, on the user data. If the user service goes offline, I'd like to keep going. So set up some kind of async communication. Whenever a user is updated in the user service, it will trigger an event that the other services subscribe to. These services will store the data that they NEED locally: not necessarily the entire user data, just the parts they need. This usually happens with something like RabbitMQ or Kafka, that sort of thing. The order service could still go to the user service and fall back to the cache. But if you are caching data, that gives you a window of time to get the user service fixed, for instance. There could be some eventual consistency problems here. The cache might be out of date for a short period of time (it's not synchronous). But thinking back to the metaphor of humans: real life is not consistent. If I change my address, I fill out a card, but it might not get entered into the post office's system for hours or days. In the meantime, stuff will still get sent to my old address.
  9. Other types of communication. We as devs generally tend to think about synchronous, although that's changing. But most problems can be reframed as async. Streams are a whole different approach; I generally don't see that much, but it's an option.
  10. Async should be the norm between services. Think about placing an order on Amazon. You place an order and you get a confirmation email right away, so it *seems* synchronous. But generally an inventory service is invoked, then a delivery service, then a payment service, and eventually you get another email that your order has been shipped and you've been charged. This means that even if the payment service is offline, you can always place an order. Orders get queued up, and once the payment service is back online, it goes to work.
  11. So if most communication between services is async and you have some caching, then you can tolerate some failure, maybe for minutes, hours, or a day. But our microservices are not yet resilient.
  12. Checkbook analogy: "writing a check", "balancing a checkbook".
  13. Event sourcing is a way of building the history of the state of some entity, so you can then build the current state by using the history. Kinda like an accounting ledger. So let's say that some other team pushes an update, which causes inconsistencies in the service or data, and that system sends a wrong message to my service. So now my service stores the inconsistency, and now I'll send inconsistent messages. And now we have a distributed bug. What's the source? How do we debug it? Event sourcing allows you to track the history and changes in your service. You can see when the inconsistency was generated. So you can add an offsetting record to fix it, you can reset the state and reprocess from some snapshot, etc. Who has heard of event sourcing before? Who has used it? If you haven't, you should definitely start researching, and check out this blog post about event sourcing. Very important in a microservice system.
  14. In this case, service 2 is down or having problems. So maybe one thread in the web app is locked, because it is waiting on service 2. The other threads aren't using service 2, so they are working fine.
  15. But what will happen over time (a short period or a long one, depending on how often service 2 is being used) is that the entire thread pool in the web app will get consumed. And now the web app is on fire: cascading failures. So any request that DOESN'T need service 2, but only needs services 3 and 4, won't be able to go through. Everything is being blocked by service 2.
  16. The solution is to put in some timeouts. This is a snippet of Java code that will trigger this kind of problem: there is no timeout specified, and there is no DEFAULT READ timeout.
  17. So define a timeout for how long to wait, for both connect and read.
  18. There are some other things to consider. Basically, you don't want your service to blow up because some other service has blown up. Just a couple of examples. Circuit breaker: after a certain number of failures, any more attempts to reach that service will fail immediately until some timeout period has passed. Bulkheads: suppose we have a vital service that needs the user service, and a less vital service, like analytics, that also needs the user service. A bulkhead allows you to prioritize and assign a limited number of threads to the analytics service. So even if analytics wants to get a million records, it will only be allowed, say, 2 threads at a time to do that.
  19. This is a framework created by Netflix to help with many of these things. Which is fine, but it pushes a lot of responsibility onto the developer to handle these failure scenarios. It's also Java only, and it's currently in maintenance mode.
  20. There's another pattern to deal with this, called the Service Mesh pattern. Instead of the microservice itself handling network issues and so on, I deploy an application which takes care of it for me and acts as a proxy to the other services. This is a way to implement cross-cutting concerns across all of your microservices. So it has a circuit breaker built in, handles timeouts, and so on.
  21. Here are some of the more well-known service mesh providers. Cross-cutting concerns they provide: externalized configuration (credentials and network locations of external services such as databases and message brokers); logging (configuring a logging framework such as log4j or logback); health checks (a URL that a monitoring service can “ping” to determine the health of the application); metrics (measurements that provide insight into what the application is doing and how it is performing); distributed tracing (instrumenting services with code that assigns each external request a unique identifier that is passed between services).
  22. These tools also come with monitoring. So, Hystrix is a Java framework. But let's say you are using .NET or Node or whatever: you'd have to use a different tool. Instead of Hystrix, you can run traffic through these service mesh providers no matter what language you use: it's language agnostic. And you can get analytics on them.
  23. So we've made our microservices resilient using event sourcing, timeouts, circuit breakers, etc
  24. Elastic, meaning we scale out a service (by adding more servers, more nodes) to accommodate more load. The patterns to deal with this are Service Discovery and Load Balancing. When a new instance of a service is deployed, it has to tell some service registry that it's online. And when a service is needed, the service registry will be asked, "hey, which instance can I call?" But this may not be as necessary anymore, because now we have Kubernetes.
  25. With Kubernetes you get the same thing using Services: a logical set of Pods and a policy. So you define a Service in Kubernetes, and it will direct you to one of those instances.
  26. Autoscaling is also available in Kubernetes.
  27. So we can specify a number of replicas. If I want to scale up, I just change the number. Or I can use the Pod Autoscaler to detect some situation and have Kubernetes launch another instance. This is something that Azure / AWS can do, but with Kubernetes it is cloud agnostic: it works anywhere you can run Kubernetes, including Azure, AWS, your own data center, or wherever.
  28. So at this point we've got an autonomous microservice. We've checked all these boxes.
  29. So now my microservice is highly scalable, right!? (pause) What about the database? I'm forgetting a big part of my system
  30. If we could just store everything in memory, no matter the language, everything would be blazing fast. But we have to store data, persist data, so it's important to take databases into consideration for the performance of the entire system.
  31. So we have all these autonomous services, but maybe they're all talking to the same database. So, naturally, we could just scale up the database, right?
  32. Scaling a relational database is not that easy. Vertical scaling is "easy" but can be very expensive and will eventually hit a ceiling. Scaling horizontally with relational is very challenging.
  33. Microservices open up polyglot opportunities. We can use the right tool for the right job.
  34. Service 1 can be in Python, etc. Maybe Python is good for certain cases that involve natural language processing. A Java microservice leverages the business logic that we've built over the years. Some .NET services were brought over in an acquisition. And there's that one team that just loves JavaScript no matter what you tell them.
  35. So we can do the same thing for persistence. Maybe we store some financial data in relational; all my Java code is fine with Oracle. We have another part of the system that stores user profiles in documents to better engage users and increase flexibility, so I'll use Couchbase. Yet another part uses a graph database to detect outliers and fraud, so I'll use Neo4j here. And yet another part uses full text search to help users navigate the site; I can use Couchbase or Solr or whatever to accomplish that, etc.
  36. Anyone seen this? This was an incident that occurred at Central Park. Pokémon Go players were trying to capture a Vaporeon.
  37. This is Pokémon Go, starting from 0 users and going to almost 300,000,000 users in a few weeks. Imagine the costs of scaling during this short period of time.
  38. Not everyone is a Pokémon Go, scaling to 200 million users. But it's not just about that. I'm sure Oracle could deliver this kind of scaling and performance. But how much is it going to cost? NoSQL databases will cost much less, not just because of licensing, but also because of the technology of HOW they scale. So with polyglot databases, sometimes it's about using the right database. You could use a relational database to work with graphs, but chances are a graph database will do it better. But sometimes it's about other tradeoffs, like cost.
  39. Of course I have to mention Couchbase. It's a "replacement" not in the sense that I would say always replace your RDBMS with Couchbase. But if you haven't used NoSQL before, a document database (like Couchbase) would be a good place to start. And as you start to move towards microservices and slice up your monolith, it may make sense to use Couchbase with some of those microservices. I find Couchbase in particular to be easy to scale, but how do we make it "elastic" like our microservices?
  40. Anyone know what a Kubernetes Operator is? An Operator is an application-specific controller that extends the Kubernetes API to manage instances of complex stateful applications. So you could have a Couchbase operator, a MySQL operator, etc. You don't NEED an operator, but if an operator is available, you should almost definitely use it.
  41. So here is a Couchbase cluster defined in Kubernetes that uses the operator. Notice that there is a "size" of "2". This defines how many nodes of Couchbase I want. To add a 3rd node, I change this number to a 3. To do this WITHOUT an operator, you would need to do a lot more work: scripting, manual steps, etc.
  42. Here's the YAML file for the Couchbase operator itself. The operator is on version 1.2.
  43. This is what the operator gives you. Again, you don't strictly *need* an operator. We have at least one customer I know of who has been using Kubernetes and Couchbase since before we offered an operator. You could use Couchbase's APIs to do this stuff yourself. But unless you have a really good reason to, I'd stick to the operator.
  44. [time permitting] I'm going to show you a quick demo of Kubernetes and the Couchbase Kubernetes operator. I'm running on Azure with AKS, but everything I'm showing you can be done on any Kubernetes cluster, whether it's Amazon or Google or whatever.
  45. There are some operators for other databases out there. Many of these aren't "official" yet; they are community driven. But in the long run, I think Kubernetes operators will be the way to go with databases.
  46. If you want to check out more about Kubernetes and/or Couchbase, here are some free resources for you
  47. If anything looks interesting to you, or you have questions or feedback, come talk to me afterwards. I want to hear from you! My boss says I have to listen to you, it's my job. So now's your chance :)