Florent Rivoire
DevOps engineer
Team « Instances » (previously "Compute")
Likes cycling, goldfish and « Plus Belle La Vie S5E17 » (subtitled original version only!)
Has always dreamed of meeting Mr. Clean (« M. Propre ») in person!
[Diagram: « Start my instance » → « website » instance running]
Inside the cloud
Inside the Instances control plane
Lots of components communicating
Including 1000s of hypervisors & VMs
[Diagram: many hypervisors (HV), each hosting VMs]
A lot of messages to route
[Diagram, built up over several slides: on the client side, the web console calls the APIs of the Instances control plane, which rely on DBs & auth and on the worker « dispatcher »; example request: « Start instance XYZ »]
Queuing 101
• Asynchronous processing
  → « instant » API response
• Decoupling into smaller components
  → maintainability, testability
• Each layer can be scaled independently
  → performance, monitoring
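A minimal sketch of the first point (« instant » API response), using Python's standard `queue` and `threading` modules as an in-process stand-in for the broker. `api_start_instance` and `worker` are hypothetical names; the point is only that the API returns as soon as the task is queued, while the work happens elsewhere.

```python
import queue
import threading
import time
import uuid

# In-process stand-in for the broker: a thread-safe FIFO of (task id, instance id).
tasks: "queue.Queue[tuple[str, str]]" = queue.Queue()
results: dict[str, str] = {}


def api_start_instance(instance_id: str) -> str:
    """API handler (hypothetical): enqueue the work and answer at once with a task id."""
    task_id = str(uuid.uuid4())
    tasks.put((task_id, instance_id))
    return task_id  # returned before the work is done


def worker() -> None:
    """Worker (hypothetical): consumes tasks at its own pace, decoupled from the API."""
    while True:
        task_id, instance_id = tasks.get()
        time.sleep(0.05)  # simulate slow work (booting a VM, ...)
        results[task_id] = f"instance {instance_id} running"
        tasks.task_done()


threading.Thread(target=worker, daemon=True).start()

start = time.perf_counter()
task_id = api_start_instance("XYZ")
print(f"API answered in {time.perf_counter() - start:.3f}s")  # far less than the work itself
tasks.join()  # demo only: wait until the worker has processed the queue
print(results[task_id])  # instance XYZ running
```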
Our queuing stack
• Celery: Python library, open source
• RabbitMQ: daemon (written in Erlang), open source
RabbitMQ, overview
• message broker
• accepts, stores and forwards messages
• « like a post office »
[Diagram: a producer sends messages to a RabbitMQ cluster (several nodes), where messages are sorted in « queues » and then delivered to a consumer]
Celery
• Asynchronous task queue model
• Similar to RPC: runs a function (a « task ») on a remote worker
• Based on distributed message passing (through a broker such as RabbitMQ)
• Can build complex task graphs (sequential and/or parallel tasks, error callbacks, retries, etc.)
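To make the « similar to RPC » point concrete, here is a toy model in plain Python. It is not Celery's real wire protocol: the `task` decorator, `send_task`, `handle_message` and the `resize_volume` task are all hypothetical. It only shows the principle: the caller sends a task *name* plus arguments as a message, and the worker looks the name up in its registry and calls the function.

```python
import json
from typing import Any, Callable

# Task registry: task name -> function (what this worker is able to run).
TASKS: dict[str, Callable[..., Any]] = {}


def task(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator registering a function as a task under its own name."""
    TASKS[func.__name__] = func
    return func


def send_task(name: str, kwargs: dict[str, Any]) -> bytes:
    """Caller side: only the task name and its arguments travel, as JSON."""
    return json.dumps({"task": name, "kwargs": kwargs}).encode()


def handle_message(body: bytes) -> Any:
    """Worker side: decode the message and run the registered function."""
    msg = json.loads(body)
    return TASKS[msg["task"]](**msg["kwargs"])


@task
def resize_volume(volume_id: int, size_gb: int) -> str:  # hypothetical task
    return f"volume {volume_id} resized to {size_gb} GB"


body = send_task("resize_volume", {"volume_id": 56789, "size_gb": 50})
print(handle_message(body))  # volume 56789 resized to 50 GB
```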
Worker "dispatcher"
• Handles instance allocation and task orchestration
• 1 logical worker, but several "physical" copies (for HA and scaling)
• Processes one « big » pool of tasks for the whole AZ (availability zone)
→ simple queuing: 1 queue « dispatcher »
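Example: simple task
Worker side:
from celery import Celery

# [config of celery: broker URL, etc.]
celery = Celery("dispatcher")

# Explicit task name, so that send_task("poweron_server") on the API side
# finds it (by default Celery registers "<module>.poweron_server")
@celery.task(name="poweron_server")
def poweron_server(server_id, model):
    physical_node = allocate(model)    # allocate() and prepare() are
    prepare(server_id, physical_node)  # defined elsewhere in the worker

if __name__ == "__main__":
    celery.start()  # start only once all tasks are registered

API side:
celery.send_task("poweron_server",
                 kwargs={'server_id': 12345, 'model': 'GP1-S'})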
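Why a single queue is enough for several dispatcher copies: a queue with several consumers hands each message to only one of them (the « competing consumers » pattern; RabbitMQ round-robins deliveries among the consumers of a queue by default). A minimal in-process sketch, with threads standing in for the physical copies and Python's `queue.Queue` standing in for the « dispatcher » queue; all names are illustrative.

```python
import queue
import threading
from collections import Counter

dispatcher_queue: "queue.Queue[int | None]" = queue.Queue()
processed: list[tuple[int, str]] = []  # (task id, copy that handled it)
lock = threading.Lock()


def dispatcher_copy(name: str) -> None:
    """One 'physical' copy of the single logical dispatcher worker."""
    while True:
        task_id = dispatcher_queue.get()
        if task_id is None:  # stop sentinel
            return
        # a real copy would call allocate(), prepare(), ... here
        with lock:
            processed.append((task_id, name))


copies = [threading.Thread(target=dispatcher_copy, args=(f"copy-{i}",)) for i in range(3)]
for t in copies:
    t.start()
for task_id in range(100):
    dispatcher_queue.put(task_id)
for _ in copies:
    dispatcher_queue.put(None)  # one sentinel per copy, queued after all tasks
for t in copies:
    t.join()

print(len(processed))  # 100
print(sorted(t for t, _ in processed) == list(range(100)))  # True: each task handled exactly once
print(Counter(name for _, name in processed))  # split between copies; varies per run
```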
[Diagram, built up over several slides: the Instances control plane (APIs, DBs & auth, worker « dispatcher ») manages Hypervisor A and Hypervisor B; each hypervisor hosts VMs and volumes and runs its own worker(s)]
Hypervisor's workers
• Two workers per hypervisor:
  1. one to manage volumes (local storage)
  2. one to manage VMs (start, stop, etc.)
• Each one processes only the tasks targeting its own HV
→ a less common routing pattern
[Diagram, built up over several slides: tasks such as « download volume 5643 » or « poweron VM 431 » must reach the worker(s) of one specific hypervisor; RabbitMQ sits between the control plane and all the workers]
RabbitMQ routing to queues
[Diagram, built up over several slides: the APIs and the worker « dispatcher » publish to the RabbitMQ cluster, which must route each message to the right queue, consumed by the worker « dispatcher », the « vm » workers of HV-A and HV-B, the « storage » worker of HV X, …]
Logical routing tree (criterion → value → queue)
• Which worker? dispatcher | storage | vm
  - dispatcher → queue « dispatcher »
  - storage, vm → which cluster? 53 | 54 | 55 | …
    → which HV? 11 | 12 | 98 | 99 | …
      → one queue per worker type, cluster and HV, e.g. « vm-53-11 », « vm-53-12 », « vm-54-98 », « vm-53-99 », …
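The tree can be read as a function from message criteria to a queue name. A hypothetical sketch, assuming header names `daemon`, `cluster` and `hypervisor` and the `<daemon>-<cluster>-<HV>` queue naming seen in the examples. Running this walk in every publisher is exactly what the talk's approach avoids, by letting RabbitMQ perform it with exchanges and bindings.

```python
def queue_for(headers: dict[str, str]) -> str:
    """Walk the logical routing tree: which worker? which cluster? which HV?"""
    daemon = headers["daemon"]
    if daemon == "dispatcher":
        return "dispatcher"  # a single queue for the whole AZ
    if daemon not in ("vm", "storage"):
        raise ValueError(f"unknown daemon: {daemon!r}")
    # per-HV workers: one queue per (worker type, cluster, hypervisor)
    return f"{daemon}-{headers['cluster']}-{headers['hypervisor']}"


print(queue_for({"daemon": "dispatcher"}))                                   # dispatcher
print(queue_for({"daemon": "vm", "cluster": "53", "hypervisor": "11"}))      # vm-53-11
print(queue_for({"daemon": "storage", "cluster": "54", "hypervisor": "98"}))  # storage-54-98
```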
Routing challenge
• Route each message to the proper destination
• Simple to use
• Reasonable speed (messages/second)
• But a lot of queues/connections:
  - 1000s
  - growing with the number of HVs
Our solution
Use RabbitMQ features to the maximum:
• « bindings »: routing rules
• « exchanges »: routers, which do not store messages
→ to route Celery messages
RabbitMQ, inside the cluster
[Diagram: messages (M) are published to exchanges; bindings connect exchanges A, B and C to each other and to queues X, Y and Z, where the messages are stored]
RabbitMQ, message structure
• routing-key: scalar value (ASCII); example: celery-task [not used]
• headers: key/value pairs; example:
  - daemon => storage
  - cluster => 54
  - hypervisor => 98
• content_type: format of the body; example: application/json
• body: free format (never read by RabbitMQ); example:
  {
    "id": "1326d7f2-fc36-4271-…",
    "task": "download_volume",
    "kwargs": {"volume_id": 56789, "verify_chksum": true}
  }
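A sketch of how such a message could be assembled in Python, using the field values of the example. The `build_message` helper is hypothetical and its `properties` dict only mirrors the fields listed; it is not a client-library object. Header values are kept as strings here, an assumption of this sketch.

```python
import json
import uuid
from typing import Any


def build_message(task: str, kwargs: dict[str, Any],
                  headers: dict[str, str]) -> tuple[dict[str, Any], bytes]:
    """Return (properties, body) for one task message (hypothetical helper).

    RabbitMQ can route on the properties; the body is opaque bytes for it.
    """
    properties = {
        "routing_key": "celery-task",  # set, but not used for routing here
        "headers": headers,            # what the bindings will test
        "content_type": "application/json",
    }
    body = json.dumps({"id": str(uuid.uuid4()), "task": task, "kwargs": kwargs}).encode()
    return properties, body


props, body = build_message(
    "download_volume",
    {"volume_id": 56789, "verify_chksum": True},
    {"daemon": "storage", "cluster": "54", "hypervisor": "98"},
)
print(props["headers"]["daemon"])        # storage
print(b'"verify_chksum": true' in body)  # True: Python True is encoded as JSON true
```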
RabbitMQ, binding
• 1 source exchange
• a condition, either:
  - a routing-key match, or
  - a match on one header key/value pair
• 1 target (exchange or queue)
Multiple bindings are possible (« many to many »).
Example:
• message in exchange « global »
• if it matches:
  - header « daemon »
  - is equal to « dispatcher »
• route it to queue « dispatcher »
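A toy in-memory model of these rules (not RabbitMQ code): exchanges only follow bindings, queues store, each binding tests one header, and a binding target may itself be an exchange. Class and method names are hypothetical; the example reproduces the slide's « daemon = dispatcher » binding plus one storage path.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: distinct queues never compare equal
class Queue:
    """Stores messages until a consumer takes them."""
    name: str
    messages: list[dict[str, str]] = field(default_factory=list)


@dataclass(eq=False)
class Exchange:
    """Routes messages along its bindings; stores nothing itself."""
    name: str
    # each binding: (header key, expected value, target exchange or queue)
    bindings: list[tuple[str, str, Exchange | Queue]] = field(default_factory=list)

    def bind(self, key: str, value: str, target: Exchange | Queue) -> None:
        self.bindings.append((key, value, target))

    def route(self, headers: dict[str, str]) -> list[Queue]:
        """Queues reached by following every matching binding, recursively."""
        reached: list[Queue] = []
        for key, value, target in self.bindings:
            if headers.get(key) != value:
                continue
            for q in [target] if isinstance(target, Queue) else target.route(headers):
                if q not in reached:  # a queue gets at most one copy of a message
                    reached.append(q)
        return reached

    def publish(self, headers: dict[str, str]) -> None:
        # Unmatched messages are dropped here; RabbitMQ likewise discards unroutable
        # messages unless `mandatory` is set or an alternate exchange is configured.
        for q in self.route(headers):
            q.messages.append(headers)


global_ex, storage_ex, storage_54 = Exchange("global"), Exchange("storage"), Exchange("storage-54")
dispatcher, storage_54_98 = Queue("dispatcher"), Queue("storage-54-98")

global_ex.bind("daemon", "dispatcher", dispatcher)  # the slide's example
global_ex.bind("daemon", "storage", storage_ex)     # exchange -> exchange
storage_ex.bind("cluster", "54", storage_54)
storage_54.bind("hypervisor", "98", storage_54_98)

global_ex.publish({"daemon": "dispatcher"})
global_ex.publish({"daemon": "storage", "cluster": "54", "hypervisor": "98"})
global_ex.publish({"daemon": "storage", "cluster": "53", "hypervisor": "11"})  # no match: dropped

print(len(dispatcher.messages), len(storage_54_98.messages))  # 1 1
```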
Code example
Using our "scw-routed-rpc" library
• simple wrapper around Celery
• simplifies routing configuration:
  - exchanges
  - bindings
  - queues
Example: send a task
# [import+config of scw_routed_rpc]
scw_routed_rpc.send_task(
    "download_volume",
    kwargs={'volume_id': 56789, 'verify_chksum': True},
    headers={
        'daemon': 'storage',
        'cluster_id': 54,
        'hypervisor_id': 98,
    },
)
Example: worker storage
# [import+config of scw_routed_rpc]
scw_routed_rpc.start({'daemon': 'storage',
                      'cluster_id': 54,
                      'hypervisor_id': 98})

@scw_routed_rpc.task()
def download_volume(volume_id, verify_chksum):
    transfer_file(volume_id)
    finalize_volume(volume_id, verify_chksum)

@scw_routed_rpc.task()
def upload_volume(volume_id, foo):
    [...]
RabbitMQ routing to queues
[Diagram, built up over several slides: a cascade of exchanges and bindings inside the RabbitMQ cluster]
• Exchange « global »
  - binding daemon="dispatcher" → queue « dispatcher »
  - binding daemon="vm" → exchange « vm » → …
  - binding daemon="storage" → exchange « storage »
    - binding cluster="53" → exchange « storage-53 » → …
    - binding cluster="54" → exchange « storage-54 »
      - binding hv="98" → queue « storage-54-98 »
      - binding hv="99" → queue « storage-54-99 »
The example message (daemon => storage, cluster => 54, hypervisor => 98) thus goes global → storage → storage-54 → queue « storage-54-98 ».
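A sketch of how this cascade could be declared with the pika client, using the `BlockingChannel` methods `exchange_declare`, `exchange_bind`, `queue_declare` and `queue_bind`, with « headers » exchanges and one header tested per binding. Exchange-to-exchange bindings are a RabbitMQ extension to AMQP 0-9-1. This is not the scw-routed-rpc implementation; assumptions: header names `daemon`, `cluster`, `hypervisor` (the slides abbreviate the last binding as `hv`; the name only has to be identical in bindings and messages), string values, default durability options. A recording test double allows a dry run without a broker.

```python
from typing import Any


def declare_topology(channel: Any, clusters: dict[str, list[str]]) -> None:
    """Declare the exchange/binding cascade on a pika-style channel.

    `clusters` maps a cluster id to its hypervisor ids, e.g. {"54": ["98", "99"]}.
    """
    def match(key: str, value: str) -> dict[str, str]:
        return {"x-match": "all", key: value}  # headers-exchange binding on one header

    channel.exchange_declare(exchange="global", exchange_type="headers")
    channel.queue_declare(queue="dispatcher")
    channel.queue_bind(queue="dispatcher", exchange="global", arguments=match("daemon", "dispatcher"))

    for daemon in ("vm", "storage"):
        channel.exchange_declare(exchange=daemon, exchange_type="headers")
        channel.exchange_bind(destination=daemon, source="global", arguments=match("daemon", daemon))
        for cluster, hypervisors in clusters.items():
            cluster_ex = f"{daemon}-{cluster}"
            channel.exchange_declare(exchange=cluster_ex, exchange_type="headers")
            channel.exchange_bind(destination=cluster_ex, source=daemon, arguments=match("cluster", cluster))
            for hv in hypervisors:
                queue_name = f"{daemon}-{cluster}-{hv}"
                channel.queue_declare(queue=queue_name)
                channel.queue_bind(queue=queue_name, exchange=cluster_ex, arguments=match("hypervisor", hv))


class RecordingChannel:
    """Test double that records calls instead of talking to RabbitMQ."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def __getattr__(self, method: str):
        return lambda **kwargs: self.calls.append((method, kwargs))


# Dry run, no broker needed:
channel = RecordingChannel()
declare_topology(channel, {"54": ["98", "99"]})
print(len(channel.calls))  # 19
# Against a real broker (requires the pika package):
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   declare_topology(connection.channel(), {"54": ["98", "99"]})
```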
Experience (1)
• Tuning Celery parameters
  → number of processes, tasks per process, prefetch, etc.
• Complex configuration (library over-engineered at first)
  → features removed: broadcast, namespace, versioning
• End-to-end tracing (1 API call, N tasks)
  → generate a « req-id » in the API
  → correlate all tasks with that req-id
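A sketch of the req-id correlation, assuming the id travels in each task message's headers. The header name `req_id`, the `LOG` list standing in for a central log store, and the helpers are hypothetical; the standard `contextvars` module keeps the current id available to any code running for that request or task.

```python
import contextvars
import uuid
from typing import Any

# Request id of the work currently being processed ("-" when none).
req_id_var = contextvars.ContextVar("req_id", default="-")
LOG: list[str] = []  # stand-in for a central log store


def log(message: str) -> None:
    LOG.append(f"[req-id={req_id_var.get()}] {message}")


def api_call(action: str, task_names: list[str]) -> list[dict[str, Any]]:
    """API side: generate one req-id and copy it into every task message."""
    req_id = str(uuid.uuid4())
    token = req_id_var.set(req_id)
    try:
        log(f"API call: {action}")
        return [{"task": name, "headers": {"req_id": req_id}} for name in task_names]
    finally:
        req_id_var.reset(token)


def run_task(message: dict[str, Any]) -> None:
    """Worker side: restore the req-id from the message, then do the work."""
    token = req_id_var.set(message["headers"]["req_id"])
    try:
        log(f"running {message['task']}")
    finally:
        req_id_var.reset(token)


for msg in api_call("start instance XYZ", ["poweron_server", "download_volume"]):
    run_task(msg)  # in production: on different workers and hosts
print(len(LOG), len({line.split("]")[0] for line in LOG}))  # 3 1
```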
Experience (2)
• Too many connections to RabbitMQ
  → enough RAM + proper limits (Erlang processes, file descriptors)
• High-availability control plane
  → RabbitMQ clustering, multiple copies of the workers
• RabbitMQ management web interface
  → powerful, use it!!
Conclusion
+
THANK YOU
Follow our news, brand-new tutorials and cloud info on Twitter and LinkedIn @Scaleway
And find all the Scaleday presentations on SlideShare:

Large-scale request routing via RabbitMQ