Everybody loves microservices, but we all know how difficult it is to get them right. Distributed systems are much more complex to develop and maintain, and over time we even miss the simplicity of the old monoliths. In this talk, I propose a combination of infrastructure, architecture, and design principles to make your microservices bulletproof and easy to maintain, with a combination of high scalability, elasticity, fault tolerance, and resilience. This session will also include a discussion of some microservices blueprints: asynchronous communication, how to avoid cascading failures in synchronous calls, and why you should use different data stores according to the use case (document databases to speed up performance, RDBMSs for transactions, graphs for recommendations, etc.).
Who am I?
• Matthew D. Groves
• Developer Advocate for Couchbase
• Twitter: @mgroves
• Live Coding: https://twitch.tv/matthewdgroves
• "I am not an expert, but I am an enthusiast." – Alan Stevens
Disk is cheap: add versioning to your state and store
all received change requests. Good for:
• Debugging
• Fixing inconsistencies
• Auditing
• Querying the state of an entity within a period
Event Sourcing/Logging
https://blog.couchbase.com/event-sourcing-event-logging-an-essential-microservice-pattern/
Other things to consider
• Auto Retries (after a 502 for instance)
• Circuit Breakers
• Authentication
• Observed Latency
• Bulkheads
• Consistent Metrics
• Logging
• … and more
Microservices allow you to independently deploy and scale parts of your system.
If you determine that just the User Profile part of your system is being used a lot, with a monolith you have to scale the whole thing.
With a microservice, you can just scale the user part.
And not only scale, but also develop and deploy independently.
BUT
Microservices are a distributed system, which brings a lot of challenges with it
Monoliths are much simpler to develop – a single application containing all the features
So if I'm in the payment part of the system, and I need to access something about a user, then it's just a method call. Easy.
But in a microservice system, you're making an HTTP call to the user service, and there's a chance that the user service is down.
Also, it's not easy to refactor. If I add a new field to the user service, I can't just go and update all the other services. Some other team might be responsible for the payment service.
Transactions are a problem too. How can I guarantee that an update applies atomically to two different services?
More expensive to develop (time and money).
If you're a small team and don't need to scale a lot, stick to a monolith.
If scale is a problem then microservices might help you. Not just talking about scaling servers—scaling your team, scaling your deployments, scaling your company.
Migrating from monolith to microservices
And what you might see is that you start to develop a microservice LIKE a monolith
It's the worst of both worlds
You have all the problems that you had before, but now they are distributed
Microservices make up a system. It may be a dozen services, but they all act within the same system.
Unlike a monolith, they need to be autonomous.
If you are relying on synchronous calls, you will have trouble. Because the network is a whole new problem. If the user service is offline, you can't get user data.
Think about microservices as if they were human beings. We rely on each other to commute to work, for instance. We rely on a bus driver to get us someplace, but most of the day, I don't need the bus driver to accomplish my job. I'm autonomous, but I rely on the bus driver at some level.
So what does it mean to be an autonomous microservice? 4 characteristics.
By being a microservice, that makes them easier to scale. That's the whole point. You slice up a monolith and can scale parts of it independently.
But just slicing up a monolith doesn't necessarily give you these other things.
So imagine a microservices architecture
All these services depend on other services
But say the user service goes down; then the whole system goes down
So now it's like a single point of failure
So how do we isolate this failure? If the user service is offline, the other services need to keep going
One way to improve isolation is caching
So let's say that order, delivery, payment all depend on the user service, the user data
But if the user service goes offline, I'd like to keep going
So set up some kind of async communication
Whenever a user is updated in user service, it will trigger some event for the other services to subscribe to
These services will store the data that they NEED locally, not necessarily the entire user data, just the parts they need
This usually happens with something like RabbitMQ or Kafka, that sort of thing
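That flow can be sketched with a toy in-memory bus. All names here (EventBus, OrderService, the "user.updated" topic) are made up for illustration; in production the bus would be RabbitMQ or Kafka and each subscriber would be a separate service:

```java
import java.util.*;
import java.util.function.Consumer;

// Toy in-memory stand-in for a message broker like RabbitMQ or Kafka.
class EventBus {
    private final Map<String, List<Consumer<Map<String, String>>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<Map<String, String>> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, Map<String, String> event) {
        for (Consumer<Map<String, String>> h : subscribers.getOrDefault(topic, List.of())) {
            h.accept(event);
        }
    }
}

// The order service keeps a local copy of just the user fields it needs,
// so it can keep working even if the user service is offline.
class OrderService {
    final Map<String, String> userNameCache = new HashMap<>();

    OrderService(EventBus bus) {
        bus.subscribe("user.updated", e -> userNameCache.put(e.get("id"), e.get("name")));
    }
}
```

The point of the local cache is that the order service reads from it instead of calling the user service on every request.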
The order service could still go to the user service and fall back to the cache
But if you are caching data, that gives you a window of time to get the user service fixed, for instance
There could be some eventual consistency problems here. The cache might be out of date for a short period of time (it's not synchronous)
But thinking back to the metaphor of humans: Real life is not consistent. If I change my address, I fill out a card, but it might not get entered into the post office's system for hours or days. In the meantime, stuff will still get sent to my old address.
Other types of communication
We as devs generally tend to think about synchronous, although that's changing
But most problems can be reframed as async
Streaming is a whole different approach; I generally don't see it that much, but it's an option
Async should be the norm between services
Think about placing an order on Amazon
You place an order and you get a confirmation email right away, it *seems* synchronous
But generally an inventory service is invoked, then a delivery service, then a payment service, and eventually you get another email that your order has been shipped and you've been charged.
This means that even if the payment service is offline, you can always place an order. They get queued up, and once the payment service is back online, it goes to work.
So if most communication between services is async and you have some caching, then you can tolerate some failure, maybe for minutes, hours, or a day
But, our microservices are not yet resilient
Checkbook analogy
"writing a check", "balancing a checkbook"
Event sourcing is a way of building the history of the state of some entity
So you can then build the current state by using the history
Kinda like an accounting ledger
So let's say that some other team pushes an update, which causes inconsistencies in the service or data
And that system sends a wrong message to my service
So now my service stores the inconsistency, and now I'll send inconsistent messages
And now we have a distributed bug. What's the source? How do we debug it?
Event sourcing allows you to track the history and changes in your service. You can see when the inconsistency was generated.
So you can add an offsetting record to fix it, you can reset the state and reprocess from some snapshot, etc
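The ledger idea can be made concrete with a toy sketch. The Account class and its shape are hypothetical, not a real framework: state is never overwritten, only appended, and the current value is a fold over the history.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal event-sourcing sketch: instead of storing the current balance,
// store every change and fold the history into the current state,
// like an accounting ledger.
class Account {
    private final List<Integer> events = new ArrayList<>(); // signed amounts

    void append(int delta) { events.add(delta); }

    // Rebuild the current state from the full history.
    int balance() { return events.stream().mapToInt(Integer::intValue).sum(); }

    List<Integer> history() { return List.copyOf(events); }
}
```

To fix a bad +50 credit you append a -50 offsetting entry rather than editing history, and the history still shows exactly when the inconsistency appeared.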
Who has heard of event sourcing before? Who has used it? If you haven't, you should definitely start researching, and check out this blog post about event sourcing. Very important in a microservice system.
In this case, service 2 is down, or having problems
So maybe a thread in the web app is locked, because it is waiting on service 2
The other threads aren't using service 2, so they are working fine.
But what will happen over time (this might be a short period or a long one, depending on how often service 2 is being used)
is that the entire thread pool in the web app will get consumed
And now web app is on fire, cascading failures
So any request that DOESN'T need service 2, but only needs services 3 and 4, won't be able to go through
Everything is being blocked by service 2
Solution is to put in some timeouts
This is a snippet of Java code that will trigger this kind of problem
There is no timeout specified
There is no DEFAULT READ timeout
So define a timeout for how long to wait
For both connect and for read
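I won't reproduce the slide's exact snippet, but a sketch of the fix with `HttpURLConnection` looks like this. The URL is a placeholder, and the 2- and 5-second values are arbitrary; tune them per service:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutExample {
    public static HttpURLConnection open(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Both defaults are 0, which means "wait forever": a hung
        // downstream service would pin this thread indefinitely.
        conn.setConnectTimeout(2000); // ms allowed to establish the connection
        conn.setReadTimeout(5000);    // ms allowed to wait for data once connected
        return conn;
    }
}
```

Nothing actually connects until you read from the connection, so setting these up front is cheap.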
There are some other things to consider
Basically, you don't want your service to blow up because some other service has blown up
Just a couple examples:
Circuit breaker: basically after a certain number of failures, any more attempts to get that service will fail immediately until some timeout period.
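A toy circuit breaker, just to make the mechanics concrete. Real implementations (Hystrix, resilience4j) also handle half-open states, concurrency, and metrics; all names here are made up:

```java
import java.util.function.Supplier;

// Minimal circuit breaker sketch: after `threshold` consecutive failures
// the breaker "opens" and calls fail fast for `cooldownMillis`
// without touching the remote service at all.
class CircuitBreaker {
    private final int threshold;
    private final long cooldownMillis;
    private int failures = 0;
    private long openedAt = -1;

    CircuitBreaker(int threshold, long cooldownMillis) {
        this.threshold = threshold;
        this.cooldownMillis = cooldownMillis;
    }

    <T> T call(Supplier<T> remote) {
        if (openedAt >= 0 && System.currentTimeMillis() - openedAt < cooldownMillis) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            T result = remote.get();
            failures = 0;          // success closes the circuit again
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            if (++failures >= threshold) openedAt = System.currentTimeMillis();
            throw e;
        }
    }
}
```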
Bulkheads: Suppose we have a vital service that needs the user service, and suppose a less vital service, like analytics, that also needs user service. So a bulkhead allows you to prioritize and assign a limited number of threads to the analytics service. So even if analytics wants to get a million records, it will only be allowed, say, 2 threads at a time to do that.
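A bulkhead can be as simple as a semaphore; this sketch is illustrative and the names are hypothetical:

```java
import java.util.concurrent.Semaphore;

// Minimal bulkhead sketch: a Semaphore caps how many concurrent calls a
// low-priority consumer (e.g. analytics) may make to the user service,
// so it can never exhaust the capacity that vital services rely on.
class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) { this.permits = new Semaphore(maxConcurrent); }

    boolean tryEnter() { return permits.tryAcquire(); } // false = rejected immediately
    void exit() { permits.release(); }
}
```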
This is a framework to help with many of these things
created by Netflix
Which is fine, but it pushes a lot of responsibility on the developer to handle these failure scenarios
And it's also Java only
And it's currently in maintenance mode
There's another pattern to deal with this called the Service Mesh pattern
Instead of the microservice itself handling network issues and the like, you deploy an application that takes care of it for you and acts as a proxy to the other services
This is a way to implement cross-cutting concerns across all of your microservices
So it has a circuit breaker built in to handle a lot of timeouts, for instance.
Here are some of the more well-known service mesh providers
Cross-cutting concerns they provide:
Externalized configuration - includes credentials, and network locations of external services such as databases and message brokers
Logging - configuring of a logging framework such as log4j or logback
Health checks - a url that a monitoring service can “ping” to determine the health of the application
Metrics - measurements that provide insight into what the application is doing and how it is performing
Distributed tracing - instrument services with code that assigns each external request a unique identifier that is passed between services.
These tools also come with monitoring
So, Hystrix is like a Java framework
But let's say you are using .NET or Node or whatever, so you'd have to use a different tool
Instead of Hystrix, you can run it through these service mesh providers no matter what language: it's language agnostic
And you can get analytics on them
So we've made our microservices resilient using event sourcing, timeouts, circuit breakers, etc
Elastic, meaning we scale out a service (by adding more servers, more nodes) to accommodate more load
The patterns to deal with this are Service Discovery and Load Balancing.
When a new instance of a service is deployed, it has to tell some service registry that it's online.
And when a service is needed, the service registry will be asked "hey, which instance can I call"
But this may not be as necessary anymore, because now we have Kubernetes
With Kubernetes you get the same thing using Services: a logical set of Pods and a policy for accessing them
So you define a service in kubernetes, and it will direct you to one of those instances
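As a hedged illustration (the names and ports are placeholders, not from the slides), such a Service definition might look like:

```yaml
# Hypothetical example: a Service that load-balances across all Pods
# labeled app=user-service, however many replicas exist at the moment.
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 8080
```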
Autoscaling is also available in Kubernetes
So we can specify a number of replicas
I want to scale up, I just change the number
Or I can use Pod Autoscaler to detect some situation and have Kubernetes launch another instance
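A sketch of that, with made-up names and thresholds: a HorizontalPodAutoscaler that lets Kubernetes grow a Deployment when average CPU gets high.

```yaml
# Hypothetical example: add Pods (up to 10) when average CPU
# across the user-service Pods goes above 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```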
This is something that Azure / AWS can do, but with Kubernetes, this is cloud agnostic.
So anywhere you can run Kubernetes, including Azure, AWS, your own data center, or wherever
So at this point we've got an autonomous microservice
We've checked all these boxes
So now my microservice is highly scalable, right!?
(pause)
What about the database? I'm forgetting a big part of my system
If we could just store everything in memory, no matter the language, everything would be blazing fast
But we have to store data, persist data, so it's important to take databases into consideration for the performance of the entire system
So we have all these autonomous services, but maybe they're all talking to the same database
So, naturally, we could just scale up the database, right?
Scaling a relational database is not that easy
Vertical scaling is "easy" but can be very expensive and will eventually hit a ceiling
Scaling horizontally with relational is very challenging
Microservices open up polyglot opportunities
We can use the right tool for the right job
Service 1 can be in Python, etc.
Maybe Python is good for certain cases that involve natural language processing
Java microservice leverages the business logic that we've built over the years
Some .NET services were brought over in acquisition
And there's that one team that just loves JavaScript no matter what you tell them
So we can do the same thing for persistence
Maybe we store some financial data in relational, all my Java code is fine with Oracle
We have another part of the system that stores user profiles in documents to better engage users and increase flexibility, so I'll use Couchbase
Yet another part uses a graph database to detect outliers and fraud, so I'll use Neo4j there
And yet another part uses full-text search to help users navigate the site; I can use Couchbase or Solr or whatever to accomplish that
etc
Anyone seen this?
This was an incident that occurred in Central Park
Pokémon Go players were trying to capture a Vaporeon
This is Pokémon Go, going from 0 users to almost 300,000,000 users in a few weeks
Imagine the costs of scaling during this short period of time
Not everyone is a Pokémon Go, scaling to 200 million users
But it's not just about that. I'm sure Oracle could deliver this kind of scaling and performance
But how much is it going to cost?
NoSQL databases will cost much less, not just because of licensing, but also because of the technology of HOW they scale
So with polyglot databases, sometimes it's about using the right database
You could use a relational database to work with graphs, but chances are a graph database will do it better.
but sometimes it's about other tradeoffs, like cost.
Of course I have to mention Couchbase
It's a "replacement" not in the sense that I would say always replace your RDBMS with Couchbase
But if you haven't used NoSQL before, a document database (like Couchbase) would be a good place to start
But as you start to move towards microservices, and slice up your monolith, it may make sense to use
Couchbase with some of those microservices
I find Couchbase in particular to be easy to scale, but how do we make it "elastic" like our microservices?
Anyone know what a Kubernetes Operator is?
An Operator is an application-specific controller that extends the Kubernetes API to manage instances of complex stateful applications
So you could have a Couchbase operator, a MySQL operator, etc.
You don't NEED an operator, but if an operator is available, you should almost definitely use it
So here is a Couchbase cluster defined in Kubernetes that uses the operator
Notice that there is a "size" of "2"
This defines how many nodes of Couchbase I want
To add a 3rd node, I change this number to a 3
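I'm not copying the slide's file verbatim; this is a rough sketch following the 1.x operator's CRD, and the field names and versions may differ in your operator release, so check the docs before using it:

```yaml
# Hedged sketch of the idea: the cluster size lives in one field,
# and scaling out is editing that number and re-applying.
apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  baseImage: couchbase/server
  version: enterprise-6.0.1
  servers:
    - name: all_services
      size: 2          # change to 3 and re-apply to add a node
      services:
        - data
        - index
        - query
```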
To do this WITHOUT an operator, you would need to do a lot more work: scripting, manual steps, etc
Here's the YAML file for the Couchbase operator itself
Operator is on version 1.2
This is what the operator gives you.
Again, you don't strictly *need* an operator. We have at least one customer I know of who has been using Kubernetes and Couchbase since before we offered an operator
You could use Couchbase's APIs to do this stuff yourself. But unless you have a really good reason to, I'd stick to the operator
[time permitting]
I'm going to show you a quick demo of Kubernetes and the Couchbase Kubernetes operator
I'm running on Azure with AKS, but everything I'm showing you can be done on any Kubernetes cluster, whether it's Amazon or Google or whatever
There are some operators for other databases out there.
Many of these aren't "official" yet; they are community-driven
But in the long run, I think Kubernetes operators will be the way to go with databases
If you want to check out more about Kubernetes and/or Couchbase, here are some free resources for you
If anything looks interesting to you, you have questions or feedback, come talk to me afterwards
I want to hear from you!
My boss says I have to listen to you, it's my job. So now's your chance :)