Tef con2016 (1)

Best Practices for Inter-process
Communication
Gustavo Garcia
@anarchyco

What happens when your application and/or team starts growing?

Disclaimer: I don’t like the word. I’m
not advocating the use of microservices.

Inter-process communication
Once you break a monolithic application into separate pieces – microservices –
the pieces need to speak to each other. And it turns out that you have many
options for inter-process communication.
               1-1                        1-many
SYNCHRONOUS    Request / Response         -
ASYNCHRONOUS   Notification               Publish / Subscribe
               Request / Async Response   Publish / Async Responses

Request / Response (RPC)
Discover -> Format -> Send

Discovery and Load Balancing
When you are writing some code that invokes a service, in order to make a
request, your code needs to know the network location (IP address and port) of
a service instance.
In a modern, cloud-based microservices application, however, this is a much
more difficult problem to solve.
Service instances have dynamically assigned network locations and the set of
service instances changes dynamically because of autoscaling, failures, and
upgrades.

Discovery and Load Balancing
At a high level there are two different approaches:
Client-Side Discovery Pattern: The calling service needs to find the
network location of a service instance itself.
Server-Side Discovery Pattern: The calling service sends the request to an
intermediary (router/proxy) that is responsible for locating a service instance.
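
As a rough illustration of the client-side pattern, the sketch below keeps a registry in memory and lets the caller pick an instance in round-robin order. The names ServiceRegistry and RoundRobinClient, the "billing" service and the addresses are all hypothetical; a real system would query an external registry, and this is not how Ribbon or Finagle are implemented.

```python
import itertools
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (IP address, port)


class ServiceRegistry:
    """In-memory stand-in for a service registry (hypothetical)."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Address]] = {}

    def register(self, service: str, address: Address) -> None:
        self._instances.setdefault(service, []).append(address)

    def lookup(self, service: str) -> List[Address]:
        # Return a copy so callers cannot mutate the registry.
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """Client-side discovery: the caller looks up instances and chooses one."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self._registry = registry
        self._service = service
        self._counter = itertools.count()

    def next_address(self) -> Address:
        # Look up on every call because the instance set changes dynamically.
        instances = self._registry.lookup(self._service)
        if not instances:
            raise LookupError(f"no instances registered for {self._service!r}")
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
registry.register("billing", ("10.0.0.1", 8080))
registry.register("billing", ("10.0.0.2", 8080))
client = RoundRobinClient(registry, "billing")
picked = [client.next_address() for _ in range(3)]
```

In the server-side pattern the same lookup and selection would live in the router/proxy instead of in each caller.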

Ribbon is an Inter Process Communication (remote procedure calls) library
with built in software load balancers. The primary usage model involves
REST calls with various serialization scheme support. It is heavily used in
production by Netflix.
Finagle clients come equipped with a load balancer, a pivotal
component in the client stack, whose responsibility is to dynamically
distribute load across a collection of interchangeable endpoints.
Finagle is the core component of the Twitter microservices architecture
and it is used by FourSquare, Tumblr, ING...
“A common anti-pattern used for HTTP microservices is to have a load
balancing service fronting each stateless microservice.” Joyent.
“Generally, the Proxy Model is workable for simple to moderately complex
applications. It’s not the most efficient approach/Model for load
balancing, especially at scale.” Nginx.

Serialization / Formats
Different ways to serialize the information for sending:
- Interface Definition Language (protobuf, thrift, json schema ...)
- Schema-free or “Documentation” based
IDL-based formats are usually binary (but not necessarily) and usually include
the option of auto-generating code.

Serialization / Formats
                          Binary / Schema   Text / Schema free
Efficiency                High              Lower
Development speed         Low?              High
Debugging / Readability   Low               High
Robustness                High              Low
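
To make the efficiency and readability rows concrete, here is a small sketch that encodes the same record as schema-free JSON and as a fixed binary layout. The record, its field names and the layout are made up for illustration; the binary side uses Python's struct module as a stand-in for an IDL-based format such as protobuf or thrift, which work differently in detail.

```python
import json
import struct

# Hypothetical record to send between two services.
reading = {"sensor_id": 42, "temperature": 21.5, "ok": True}

# Text / schema free: self-describing and readable, but larger on the wire.
text_payload = json.dumps(reading).encode("utf-8")

# Binary / schema: both sides must agree on this layout in advance.
# "!" = network byte order, "I" = unsigned 32-bit int, "d" = 64-bit float, "?" = bool.
SCHEMA = struct.Struct("!Id?")
binary_payload = SCHEMA.pack(
    reading["sensor_id"], reading["temperature"], reading["ok"]
)

# Decoding the binary payload yields positional values, not named fields.
decoded = SCHEMA.unpack(binary_payload)

print(len(text_payload), len(binary_payload))
```

The binary payload is smaller, but it cannot be read without the schema, which is the debugging cost listed in the table.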

Transport
Protocol: HTTP, TCP
Security: SSL, non-SSL
Reusing connections: No reuse, Reusing, Multiplexing

Transport
Good News: HTTP/2
● Efficient, SSL, Multiplexed
● Supported by major libraries: gRPC, Finagle ...

Failures
Applications in complex distributed architectures have dozens of dependencies, each of
which will inevitably fail at some point. If the host application is not isolated from these
external failures, it risks being taken down with them.
For example, for an application that depends on 30 services where each service has
99.99% uptime, here is what you can expect: 99.99%^30 = 99.7% uptime
2+ hours downtime/month even if all dependencies have excellent uptime.
Reality is generally worse.
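
The arithmetic above can be checked with a few lines. This sketch assumes the dependencies fail independently and that a month has 30 days; the function name is just for illustration.

```python
def combined_availability(per_service: float, dependencies: int) -> float:
    """Availability of a request that needs every dependency to be up,
    assuming failures are independent."""
    return per_service ** dependencies


availability = combined_availability(0.9999, 30)  # roughly 0.997
downtime_hours = (1 - availability) * 30 * 24     # a little over 2 hours per month

print(f"{availability:.4%} uptime, {downtime_hours:.2f} hours of downtime per month")
```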

Engineering for Failure
Detect: How and when to mark a request as a failure
React: What do you do when you detect a failure
Isolate: Minimize the impact in the whole system

Detecting failures
What is the definition of failure?
Connection failures vs HTTP Response Status
Timeouts:
Sometimes it is more difficult than it looks.
Fail Fast
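
One possible answer to "what is a failure?" is sketched below. The policy is an assumption made for illustration, not a rule from the talk: no response at all, a response slower than the timeout, or a 5xx status count as failures, while 4xx responses are treated as the caller's own mistake.

```python
from typing import Optional


def is_failure(status: Optional[int], elapsed_s: float, timeout_s: float = 1.0) -> bool:
    """Decide whether a finished request counts as a failure (hypothetical policy).

    status is None when the connection failed and no HTTP response arrived.
    """
    if status is None:
        return True   # connection failure
    if elapsed_s > timeout_s:
        return True   # too slow: fail fast instead of waiting
    return status >= 500  # server-side errors; 4xx is not the service's fault
```

Whatever policy is chosen, it helps to keep it in one place so every client marks failures the same way.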

Reacting to failures
Possible ways to react to failures:
Retry the request, but only if it is idempotent
Cache the results and return them when the next request fails (or always)
Fall back to returning something else, or change the logic, when one of the
requests fails (for example by sending a predefined value)
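
The three reactions can be combined in one helper. This is a minimal sketch with made-up names (call_with_recovery, flaky, always_down); it assumes the request is idempotent and signals failure by raising ConnectionError.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def call_with_recovery(
    key: str,
    request: Callable[[], T],
    cache: Dict[str, T],
    fallback: T,
    retries: int = 2,
) -> T:
    """Retry an idempotent request, then use the cache, then a predefined value."""
    for _ in range(retries + 1):
        try:
            result = request()
        except ConnectionError:
            continue  # retry: only safe because the request is idempotent
        cache[key] = result  # remember the last good answer
        return result
    # All attempts failed: serve the cached result, or the fallback if none exists.
    return cache.get(key, fallback)


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise ConnectionError("temporary failure")
    return "fresh"


def always_down() -> str:
    raise ConnectionError("service down")


cache: Dict[str, str] = {}
first = call_with_recovery("profile", flaky, cache, fallback="default")
second = call_with_recovery("profile", always_down, cache, fallback="default")
third = call_with_recovery("other", always_down, cache, fallback="default")
```

Here the first call succeeds on its retry, the second is served from the cache, and the third has nothing cached and returns the predefined value.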

Circuit Breaker
If something is not working, stop trying for a while,
because retrying could make things worse for you or for
them.
It can be a local Circuit Breaker or a global one
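
A local circuit breaker can be sketched in a few lines. This is an illustration only, not Hystrix's implementation; the class names, the thresholds and the injectable clock are assumptions made so the example is easy to follow and test.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the service."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_failures = max_failures
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, request: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                # Open: stop trying for a while.
                raise CircuitOpenError("circuit open, not calling the service")
            # Half-open: allow one trial call; a single failure re-opens the circuit.
            self._opened_at = None
            self._failures = self._max_failures - 1
        try:
            result = request()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # success closes the circuit again
        return result
```

A global breaker would share the failure count and the open/closed state between processes instead of keeping them in one object.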

Example of logic
https://github.com/Netflix/Hystrix

Bulkhead pattern
A misbehaving service shouldn’t affect the rest of the
services.
Limit the resources that the client for a specific
service is allowed to use.
Make sure the client for a specific service cannot
block the whole process.
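
One simple way to limit a client's resources is to cap the number of in-flight calls per service. The sketch below uses a semaphore for that; the Bulkhead class and its error type are hypothetical names, and thread pools or connection limits are equally valid ways to get the same isolation.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(RuntimeError):
    """Raised when the per-service concurrency limit is reached."""


class Bulkhead:
    """Caps concurrent calls to one service so it cannot exhaust the process."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, request: Callable[[], T]) -> T:
        # Reject immediately instead of queueing behind a slow service.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("too many in-flight calls to this service")
        try:
            return request()
        finally:
            self._slots.release()
```

Each downstream service gets its own Bulkhead instance, so a slow service can only use up its own slots.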

Swimlane pattern
Maintain independent full stacks so that even if one of them has a problem
there is no full outage.

Back Pressure or Flow Control
When your server is under pressure you should use some counter-measures to
avoid making it worse.
For example: delay accepting new connections, throttle messages, return 503...
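
A bounded queue is one way to apply this idea. In the sketch below, work that does not fit in the queue is rejected with a 503 instead of piling up; the BoundedInbox class, the capacity and the 202 status for accepted work are assumptions for illustration.

```python
import queue
from typing import Any


class BoundedInbox:
    """Accepts work up to a fixed capacity and sheds load beyond it."""

    def __init__(self, capacity: int) -> None:
        self._pending: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)

    def submit(self, message: Any) -> int:
        """Return an HTTP-style status: 202 if queued, 503 if overloaded."""
        try:
            self._pending.put_nowait(message)
        except queue.Full:
            return 503  # tell the caller to back off
        return 202

    def take(self) -> Any:
        """Remove the oldest pending message (raises queue.Empty if none)."""
        return self._pending.get_nowait()


inbox = BoundedInbox(capacity=2)
statuses = [inbox.submit(n) for n in range(3)]
```

The third submission is rejected because the first two are still pending; once a worker takes a message, new work is accepted again.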

Monitoring and Debugging
Knowing what’s happening in your service, and why latency or failures
increase, is harder when you are calling 30 services to process the request.
Monitoring
Debugging

Monitoring
You need to know if any of your requests is taking longer than expected, how
many are failing, queue sizes...
33% HTTP EndPoint
33% Logs
33% No stats
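
A minimal sketch of the counters mentioned above, kept in memory. The RequestStats class, its field names and the 0.5 second threshold are hypothetical; a real service would expose such a snapshot through an HTTP endpoint or write it to logs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestStats:
    slow_threshold_s: float = 0.5  # what "longer than expected" means here
    latencies_s: List[float] = field(default_factory=list)
    failures: int = 0

    def record(self, latency_s: float, ok: bool) -> None:
        self.latencies_s.append(latency_s)
        if not ok:
            self.failures += 1

    def snapshot(self) -> Dict[str, float]:
        return {
            "requests": len(self.latencies_s),
            "failures": self.failures,
            "slow": sum(1 for s in self.latencies_s if s > self.slow_threshold_s),
            "max_latency_s": max(self.latencies_s, default=0.0),
        }


stats = RequestStats()
stats.record(0.1, ok=True)
stats.record(0.9, ok=False)
summary = stats.snapshot()
```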

Debugging
Consistency:
It has to be automatic
There have to be guidelines and you have to be very strict about them
Traceability:
● Easily find all the requests belonging to the same call flow
● Identify the hierarchy (who is calling whom)
sessionId == X OR sessionid == X OR session_id == X
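
The query above shows what happens when every service spells the field differently. A sketch of the alternative: one shared helper that propagates the identifier and names the log field the same way everywhere. The header names, the field names and the "gateway" and "orders" services are hypothetical.

```python
import uuid
from typing import Dict, Optional

SESSION_HEADER = "X-Session-Id"  # hypothetical header carrying the call-flow id
CALLER_HEADER = "X-Caller"       # hypothetical header naming the calling service


def outgoing_headers(incoming: Dict[str, str], service: str) -> Dict[str, str]:
    """Headers to attach to every downstream call made by `service`."""
    return {
        # Reuse the id of the current call flow, or start a new flow at the edge.
        SESSION_HEADER: incoming.get(SESSION_HEADER) or uuid.uuid4().hex,
        CALLER_HEADER: service,
    }


def log_record(incoming: Dict[str, str], service: str, message: str) -> Dict[str, Optional[str]]:
    """Log entry with one agreed spelling for the session field."""
    return {
        "session_id": incoming.get(SESSION_HEADER),
        "caller": incoming.get(CALLER_HEADER),
        "service": service,
        "message": message,
    }


edge = outgoing_headers({}, "gateway")         # gateway starts the flow
downstream = outgoing_headers(edge, "orders")  # orders passes the same id along
entry = log_record(edge, "orders", "order created")
```

Because every service uses the same helper, searching for one session_id finds the whole call flow, and the caller field gives the hierarchy.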

Frameworks, Frameworks, Frameworks
DDIY
Boring is Good
Microservices Chassis
“If I have to eat someone’s crap, I’d rather eat my own”

Wrap Up
“When you move to a microservices architecture, it
comes with this constant tax on your development cycle
that’s going to slow you down from that point on”

Acknowledgements
All the projects collaborating in the survey

References
HOW TO ADOPT MICROSERVICES
https://www.nginx.com/resources/library/oreilly-building-microservices/
Microservices Architecture: The Good, The Bad, and What You Could Be Doing Better
http://nordicapis.com/microservices-architecture-the-good-the-bad-and-what-you-could-be-doing-better/
