Coordination in
Distributed Systems
Andrea Monacchi
Consensus, Configuration, Reliability
Agenda
1. Consensus
2. Apache Zookeeper
3. ETCD
Consensus
● Multiple processes must agree on a value (e.g. time, state, price)
○ https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/
● Synchronization in systems is a very general topic with many applications
○ clock synchronization - e.g. firefly synchronization
○ smart power grids - e.g. phase-locked loop (PLL)
○ load balancing
○ state-machine replication and distributed replicas
○ distributed-lock and distributed-transaction algorithms
● In practice
○ the environment is noisy - i.e. faults may arise endogenously or exogenously
■ Crash failures (process stops) - caused by the process itself or by the network
■ Byzantine failures (most general failure) - a process may behave arbitrarily; treated in probabilistic terms, with identification driven by the strategy/protocol
○ consensus protocols must be fault tolerant (i.e. keep working despite failures)
Consensus
Protocol requirements:
● Agreement - all correct processes decide on one common value
● Termination - the consensus process eventually converges, i.e. every correct process eventually decides
● Integrity - if all correct processes proposed v, then v must be the value selected by the consensus process
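The three requirements can be read as checks on the outcome of a single consensus run. The following is only a minimal sketch: the function name `check_consensus` and the dictionary-based representation of proposals and decisions are assumptions made for illustration, not part of any protocol.

```python
from typing import Dict, Optional


def check_consensus(proposed: Dict[str, int],
                    decided: Dict[str, Optional[int]]) -> Dict[str, bool]:
    """Check the three requirements on the outcome of one consensus run.

    proposed: value proposed by each correct process
    decided:  value decided by each correct process (None = not decided yet)
    """
    decisions = list(decided.values())

    # Termination: every correct process has decided.
    termination = all(d is not None for d in decisions)

    # Agreement: no two correct processes decided differently.
    agreement = len({d for d in decisions if d is not None}) <= 1

    # Integrity: if all correct processes proposed the same v,
    # then v is the only value that may be decided.
    proposals = set(proposed.values())
    if len(proposals) == 1:
        (v,) = proposals
        integrity = all(d is None or d == v for d in decisions)
    else:
        integrity = True

    return {"agreement": agreement,
            "termination": termination,
            "integrity": integrity}
```

For example, `check_consensus({"a": 5, "b": 5}, {"a": 5, "b": 7})` reports a violation of both agreement and integrity.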

Consensus
Single-valued
● agreeing on a single integer value
● agreeing on a binary value (binary consensus)
Reference algorithm: Paxos (by Leslie Lamport)
● difficult to understand and implement
● even harder in the Multi-Paxos variant
● implementations are far from the theory (multiple flavours derived)
Multi-valued
● agreement on a sequence of values
● can be decomposed into multiple single-valued instances
Reference algorithms: Multi-Paxos, Raft
● Raft has understandability at its core
● consensus decomposed into 3 sub-problems
Raft
Consensus process decomposed into 3 sub-problems:
● leader election - a new leader is elected when the current one fails (only 1 leader at a time)
● log replication - the leader keeps the logs of all other servers in sync with its own via replication (identical append-only logs)
● safety - if any server has applied a log entry at a certain index, no other server can apply a different entry for that index
Peculiarities:
● strong leadership - log entries flow only from the leader to the followers; the leader receives commands from clients, then propagates them to the followers, who apply them once they are considered safe
● leader election - uses randomized timers and heartbeats from leader to followers
● membership changes - uses a joint consensus approach to keep the cluster operating during configuration changes
https://raft.github.io/raft.pdf
Raft: leader election
Each node is a state machine in 1 of these states: follower, candidate, leader
● Each follower runs a randomized election timer, which is reset by heartbeats from the current leader;
● Upon timer expiration, the node becomes a candidate and asks all nodes for votes; votes are granted first-come-first-served, one per term;
● If a majority of votes arrives, the node confirms its leadership by sending a heartbeat to all nodes to establish authority;
● No global notion of time; time is instead divided into terms of arbitrary length, each with at most 1 leader;
● Each term is identified by a monotonically increasing ID, which is attached to every message;
● If a candidate receives AppendEntries (from another node claiming leadership) with the same or a higher Term ID, it becomes a follower and updates its local Term ID;
● If several candidates exist at the same time, the vote may yield no majority (split vote), and a new election begins;
● Random election timeouts are used by each node to make split votes unlikely; the first node whose timer expires asks for a new election;
2 remote-procedure calls (RPCs):
● RequestVote - from candidate to all nodes
● AppendEntries - from leader to followers, to replicate log entries and/or send a heartbeat
Source: original paper
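The election rules above can be sketched in a few lines. This is a simplified, single-process illustration and not a full Raft implementation: the class `Node`, the helper `run_election` and the direct method calls standing in for RPCs are assumptions, the 150-300 ms timeout range is merely the example range suggested in the Raft paper, and the log comparison performed when granting votes is left out here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    state: str = "follower"  # follower | candidate | leader

    def election_timeout(self, low_ms: float = 150, high_ms: float = 300) -> float:
        # Randomized timeout, so that nodes rarely become candidates together.
        return random.uniform(low_ms, high_ms)

    def start_election(self) -> None:
        # Timer expired: open a new term and vote for ourselves.
        self.state = "candidate"
        self.current_term += 1
        self.voted_for = self.node_id

    def on_request_vote(self, term: int, candidate_id: str) -> bool:
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # Newer term seen: adopt it and forget the old vote.
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        # First-come-first-served: at most one vote per term.
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

    def on_append_entries(self, term: int) -> bool:
        if term < self.current_term:
            return False  # message from a deposed leader
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # A leader with the same or a higher term exists: step down.
        self.state = "follower"
        return True


def run_election(candidate: Node, peers: List[Node]) -> int:
    """Let `candidate` ask all peers for votes; return the votes received."""
    candidate.start_election()
    votes = 1 + sum(p.on_request_vote(candidate.current_term, candidate.node_id)
                    for p in peers)
    if votes > (len(peers) + 1) // 2:  # strict majority of the cluster
        candidate.state = "leader"
    return votes
```

In a 3-node cluster, `run_election(a, [b, c])` makes `a` leader of term 1; a later `b.on_request_vote(1, "c")` is refused because `b` already voted in that term.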
Raft: log replication
● The leader receives a command for its FSM from a client
● The leader appends the command to its log
● The leader calls AppendEntries on all followers in parallel to replicate the entry;
● If the log entry was safely replicated on a majority of nodes, then it is safe to commit it: the leader runs the command and returns the result;
● If not (e.g. a follower crashed or is too slow), the leader retries AppendEntries indefinitely until all followers have their logs in sync;
● The Term ID is stored with the command in each log entry to detect inconsistencies;
● Followers apply a command locally once they learn it is committed;
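The "replicated by a majority" rule can be sketched as a small computation on the leader. The function name `majority_commit_index` and the `match_index` dictionary (highest log index known to be stored on each follower) are assumptions made for illustration; the sketch ignores the additional Raft rule that only entries of the current term are committed by counting replicas.

```python
from typing import Dict


def majority_commit_index(leader_last_index: int,
                          match_index: Dict[str, int]) -> int:
    """Highest log index stored on a majority of nodes (leader included)."""
    # Sort replicated indexes from highest to lowest.
    indexes = sorted([leader_last_index] + list(match_index.values()),
                     reverse=True)
    majority = len(indexes) // 2 + 1
    # The majority-th highest index is present on at least `majority` nodes.
    return indexes[majority - 1]
```

With a leader at index 7 and followers at 5, 7, 3 and 2, the result is 5: entries up to index 5 are stored on three of five nodes.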
Notes:
● Followers’ crashes can be easily recovered
● Leaders’ crashes may leave logs inconsistent
○ not all log entries may have been replicated
Solution - Log Matching Property:
● AppendEntries includes a consistency check, i.e. a reference to the index and Term ID of the log entry preceding the new ones;
● The follower refuses the new entries if its own previous log entry does not match that of the leader;
● Conflicting follower entries are overwritten with those of the leader (appended after the last entry common to both);
● Leaders never overwrite their own log;
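A follower-side sketch of the consistency check may help. It assumes logs are Python lists of `(term, command)` tuples with 1-based indexes, and the function name `append_entries` and its parameters are illustrative; the real RPC carries further fields (e.g. leader ID and commit index) that are omitted here.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Apply an AppendEntries request to a follower's log.

    prev_index/prev_term describe the entry preceding `entries` in the
    leader's log (1-based; prev_index 0 means the new entries start the log).
    Returns False if the consistency check fails.
    """
    # Consistency check: the follower must hold the preceding entry ...
    if prev_index > len(log):
        return False
    # ... and it must have been written in the same term.
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False

    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based position of this entry
        if pos < len(log):
            if log[pos][0] != entry[0]:
                # Conflict: drop this entry and all that follow it,
                # then take the leader's version.
                del log[pos:]
                log.append(entry)
        else:
            log.append(entry)
    return True
```

When the check fails, the leader would retry with an earlier `prev_index` until it reaches the last entry common to both logs.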

Raft: Safety
Problem:
A follower may be unavailable while the leader commits certain entries, and then become leader itself, thus overwriting the entries committed by the previous leader;
Solution:
Restrict access to leadership by ensuring that the leader for a term contains all entries committed in previous terms (completeness);
Implementation:
● The RequestVote RPC includes info on the candidate's log, so that the voter can decline if its local log is more up to date than the candidate's;
● Logs of candidate and voter are compared by i) the Term ID of their last entries and, if equal, ii) their length;
● A leader may crash after only partially replicating an entry, i.e. before committing it; therefore a leader only commits (by counting replicas) entries that have the current Term ID; by the properties of log replication, additional entries can be added only if the consistency check is passed (last terms match), so committing a current-term entry also commits all preceding entries, resulting in correct log merging;
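The voter's "more up to date" comparison can be sketched as follows, again assuming logs are lists of `(term, command)` tuples; the function names are illustrative and not taken from any Raft library.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


def last_index_and_term(log: List[Entry]) -> Tuple[int, int]:
    """1-based index and term of the last entry; (0, 0) for an empty log."""
    return (len(log), log[-1][0]) if log else (0, 0)


def candidate_is_up_to_date(cand_last_term: int, cand_last_index: int,
                            voter_log: List[Entry]) -> bool:
    """True if the voter may grant its vote as far as the logs are concerned."""
    voter_last_index, voter_last_term = last_index_and_term(voter_log)
    # i) The log whose last entry has the later term is more up to date.
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term
    # ii) With equal last terms, the longer log is more up to date.
    return cand_last_index >= voter_last_index
```

A candidate with a long log of old-term entries, e.g. `candidate_is_up_to_date(1, 10, voter_log)` against a voter whose last entry is from term 2, is rejected despite its length.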
Apache Zookeeper
● Coordination using a shared file system-like namespace
○ CRUD operations on tiny files called ZNodes, having stat-like versioning and ACLs
○ ZNodes can have associated data and children nodes
○ Event-driven communication - clients can watch for changes on ZNodes
○ Ephemeral nodes - can’t have children and are removed when the session of the creating process ends
○ TTL nodes - automatic removal upon timer expiration
● Used for service discovery, metadata management, synchronization, leader election
○ e.g. Hadoop/YARN and Kafka, many more https://zookeeper.apache.org/doc/r3.6.2/zookeeperUseCases.html
○ Using a barrier to synchronize distributed processes (fork/join model)
■ use node b1/ as barrier; each process adds a child node in b1, of kind b1/p1, b1/p2, and so on
■ when enough processes have created their nodes (as many as configured for the barrier), the join can start
■ the example can be generalized to other synchronization models, e.g. locks, 2-phase commit, leader election
○ Producer-consumer model by simply adding child nodes (using sequential IDs) to a queue ZNode
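The barrier recipe can be illustrated without a running Zookeeper ensemble. The sketch below uses a purely in-memory stand-in for the ZNode tree; `FakeZNodeTree` and `Barrier` are hypothetical names and deliberately not the Zookeeper client API. With a real ensemble, processes would typically set a watch on the barrier node instead of re-checking the children themselves.

```python
from typing import Dict, List


class FakeZNodeTree:
    """In-memory stand-in for a ZNode tree (NOT the Zookeeper client API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}

    def create(self, path: str, data: bytes = b"") -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get_children(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        # Direct children only: no further "/" after the prefix.
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])


class Barrier:
    """Barrier node whose children are the processes that have arrived."""

    def __init__(self, tree: FakeZNodeTree, path: str, size: int) -> None:
        self.tree, self.path, self.size = tree, path, size
        if path not in tree.nodes:
            tree.create(path)

    def enter(self, name: str) -> bool:
        """Register a process; return True once enough processes joined."""
        self.tree.create(f"{self.path}/{name}")
        return self.is_open()

    def is_open(self) -> bool:
        return len(self.tree.get_children(self.path)) >= self.size
```

For a barrier of size 3 on `/b1`, the calls `enter("p1")` and `enter("p2")` return `False`, and `enter("p3")` returns `True`, at which point the join can start.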
Apache Zookeeper: Example
Apache Kafka Cluster
git clone https://github.com/simplesteph/kafka-stack-docker-compose.git
cd kafka-stack-docker-compose
# single Zookeeper node, single Kafka broker
docker-compose -f zk-single-kafka-single.yml up -d
# alternatively: multiple Zookeeper nodes, multiple Kafka brokers
docker-compose -f zk-multiple-kafka-multiple.yml up -d
ZooNavigator
docker run \
  -d --network host \
  -e HTTP_PORT=9000 \
  --name zoonavigator \
  --restart unless-stopped \
  elkozmon/zoonavigator:latest
Apache Zookeeper: Example
$ kaf topic create example_topic_folks
✅ Created topic!
Topic Name: example_topic_folks
Partitions: 1
Replication Factor: 1
Cleanup Policy: delete

ETCD
● Very similar to Zookeeper
○ file system-like structure with similar functionalities for nodes (e.g. event-reaction on changes, TTL)
● Uses the Raft protocol to keep replicas consistent across the ETCD cluster
● Accessed either via REST/JSON (e.g. by UIs) or gRPC (by other services) - see API v2, v3
● Very lightweight and performant (written in Go) - core component of Kubernetes

More Related Content

What's hot

CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
J Singh
 
SCP
SCPSCP
management of distributed transactions
management of distributed transactionsmanagement of distributed transactions
management of distributed transactions
Nilu Desai
 
Distributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock DetectionDistributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock Detection
SHIKHA GAUTAM
 
Sistemas Distribuidos
Sistemas DistribuidosSistemas Distribuidos
Sistemas Distribuidos
Locaweb
 
Reader/writer problem
Reader/writer problemReader/writer problem
Reader/writer problem
RinkuMonani
 
Grokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applications
Grokking VN
 
Unit 5 dbms
Unit 5 dbmsUnit 5 dbms
Unit 5 dbms
Sweta Singh
 
Task migration in os
Task migration in osTask migration in os
Task migration in os
uos lahore pakistan
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
Kavya Barnadhya Hazarika
 
Trafodion Distributed Transaction Management
Trafodion Distributed Transaction ManagementTrafodion Distributed Transaction Management
Trafodion Distributed Transaction Management
Rohit Jain
 
Operating system 27 semaphores
Operating system 27 semaphoresOperating system 27 semaphores
Operating system 27 semaphores
Vaibhav Khanna
 

What's hot (12)

CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
 
SCP
SCPSCP
SCP
 
management of distributed transactions
management of distributed transactionsmanagement of distributed transactions
management of distributed transactions
 
Distributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock DetectionDistributed Mutual Exclusion and Distributed Deadlock Detection
Distributed Mutual Exclusion and Distributed Deadlock Detection
 
Sistemas Distribuidos
Sistemas DistribuidosSistemas Distribuidos
Sistemas Distribuidos
 
Reader/writer problem
Reader/writer problemReader/writer problem
Reader/writer problem
 
Grokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applications
 
Unit 5 dbms
Unit 5 dbmsUnit 5 dbms
Unit 5 dbms
 
Task migration in os
Task migration in osTask migration in os
Task migration in os
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
Trafodion Distributed Transaction Management
Trafodion Distributed Transaction ManagementTrafodion Distributed Transaction Management
Trafodion Distributed Transaction Management
 
Operating system 27 semaphores
Operating system 27 semaphoresOperating system 27 semaphores
Operating system 27 semaphores
 

Similar to Coordination in distributed systems

Distributed fun with etcd
Distributed fun with etcdDistributed fun with etcd
Distributed fun with etcd
Abdulaziz AlMalki
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world data
Athira Mukundan
 
Consensus algo with_distributed_key_value_store_in_distributed_system
Consensus algo with_distributed_key_value_store_in_distributed_systemConsensus algo with_distributed_key_value_store_in_distributed_system
Consensus algo with_distributed_key_value_store_in_distributed_system
Atin Mukherjee
 
Square's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with RaftSquare's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with Raft
ScyllaDB
 
The journey to container adoption in enterprise
The journey to container adoption in enterpriseThe journey to container adoption in enterprise
The journey to container adoption in enterprise
Igor Moochnick
 
Manging scalability of distributed system
Manging scalability of distributed systemManging scalability of distributed system
Manging scalability of distributed system
Atin Mukherjee
 
UNIT-2-PROCESS MANAGEMENT in opeartive system.pptx
UNIT-2-PROCESS MANAGEMENT in opeartive system.pptxUNIT-2-PROCESS MANAGEMENT in opeartive system.pptx
UNIT-2-PROCESS MANAGEMENT in opeartive system.pptx
nagarajans87
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
Ankita Kapratwar
 
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
javier ramirez
 
Unit 2 part 2(Process)
Unit 2 part 2(Process)Unit 2 part 2(Process)
Unit 2 part 2(Process)
WajeehaBaig
 
KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...
KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...
KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...
confluent
 
Why Concurrency is hard ?
Why Concurrency is hard ?Why Concurrency is hard ?
Why Concurrency is hard ?
Ramith Jayasinghe
 
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
CODE BLUE
 
Operating System Notes (1).pdf
Operating System Notes (1).pdfOperating System Notes (1).pdf
Operating System Notes (1).pdf
shriyashpatil7
 
Operating System Notes.pdf
Operating System Notes.pdfOperating System Notes.pdf
Operating System Notes.pdf
AminaArshad42
 
Operating System Notes help for interview pripration
Operating System Notes  help for interview priprationOperating System Notes  help for interview pripration
Operating System Notes help for interview pripration
ajaybiradar99999
 
Introduction to concurrent programming with Akka actors
Introduction to concurrent programming with Akka actorsIntroduction to concurrent programming with Akka actors
Introduction to concurrent programming with Akka actors
Shashank L
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
datamantra
 
Hands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersHands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm Users
Weaveworks
 
Zookeeper big sonata
Zookeeper  big sonataZookeeper  big sonata
Zookeeper big sonata
Anh Le
 

Similar to Coordination in distributed systems (20)

Distributed fun with etcd
Distributed fun with etcdDistributed fun with etcd
Distributed fun with etcd
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world data
 
Consensus algo with_distributed_key_value_store_in_distributed_system
Consensus algo with_distributed_key_value_store_in_distributed_systemConsensus algo with_distributed_key_value_store_in_distributed_system
Consensus algo with_distributed_key_value_store_in_distributed_system
 
Square's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with RaftSquare's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with Raft
 
The journey to container adoption in enterprise
The journey to container adoption in enterpriseThe journey to container adoption in enterprise
The journey to container adoption in enterprise
 
Manging scalability of distributed system
Manging scalability of distributed systemManging scalability of distributed system
Manging scalability of distributed system
 
UNIT-2-PROCESS MANAGEMENT in opeartive system.pptx
UNIT-2-PROCESS MANAGEMENT in opeartive system.pptxUNIT-2-PROCESS MANAGEMENT in opeartive system.pptx
UNIT-2-PROCESS MANAGEMENT in opeartive system.pptx
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
 
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
 
Unit 2 part 2(Process)
Unit 2 part 2(Process)Unit 2 part 2(Process)
Unit 2 part 2(Process)
 
KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...
KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...
KafkaConsumer - Decoupling Consumption and Processing for Better Resource Uti...
 
Why Concurrency is hard ?
Why Concurrency is hard ?Why Concurrency is hard ?
Why Concurrency is hard ?
 
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
[CB16] COFI break – Breaking exploits with Processor trace and Practical cont...
 
Operating System Notes (1).pdf
Operating System Notes (1).pdfOperating System Notes (1).pdf
Operating System Notes (1).pdf
 
Operating System Notes.pdf
Operating System Notes.pdfOperating System Notes.pdf
Operating System Notes.pdf
 
Operating System Notes help for interview pripration
Operating System Notes  help for interview priprationOperating System Notes  help for interview pripration
Operating System Notes help for interview pripration
 
Introduction to concurrent programming with Akka actors
Introduction to concurrent programming with Akka actorsIntroduction to concurrent programming with Akka actors
Introduction to concurrent programming with Akka actors
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Hands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersHands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm Users
 
Zookeeper big sonata
Zookeeper  big sonataZookeeper  big sonata
Zookeeper big sonata
 

More from Andrea Monacchi

Mastro
MastroMastro
Mastro
MastroMastro
Introduction to istio
Introduction to istioIntroduction to istio
Introduction to istio
Andrea Monacchi
 
Anomaly detection on wind turbine data
Anomaly detection on wind turbine dataAnomaly detection on wind turbine data
Anomaly detection on wind turbine data
Andrea Monacchi
 
Towards Data Operations
Towards Data OperationsTowards Data Operations
Towards Data Operations
Andrea Monacchi
 
Welcome to Load Disaggregation and Building Energy Management
Welcome to Load Disaggregation and Building Energy ManagementWelcome to Load Disaggregation and Building Energy Management
Welcome to Load Disaggregation and Building Energy Management
Andrea Monacchi
 
An Early Warning System for Ambient Assisted Living
An Early Warning System for Ambient Assisted LivingAn Early Warning System for Ambient Assisted Living
An Early Warning System for Ambient Assisted Living
Andrea Monacchi
 
Assisting Energy Management in Smart Buildings and Microgrids
Assisting Energy Management in Smart Buildings and MicrogridsAssisting Energy Management in Smart Buildings and Microgrids
Assisting Energy Management in Smart Buildings and Microgrids
Andrea Monacchi
 
Analytics as value added service for energy utilities
Analytics as value added service for energy utilitiesAnalytics as value added service for energy utilities
Analytics as value added service for energy utilities
Andrea Monacchi
 
HEMS: A Home Energy Market Simulator
HEMS: A Home Energy Market SimulatorHEMS: A Home Energy Market Simulator
HEMS: A Home Energy Market Simulator
Andrea Monacchi
 
GREEND: An energy consumption dataset of households in Austria and Italy
GREEND: An energy consumption dataset of households in Austria and ItalyGREEND: An energy consumption dataset of households in Austria and Italy
GREEND: An energy consumption dataset of households in Austria and Italy
Andrea Monacchi
 

More from Andrea Monacchi (11)

Mastro
MastroMastro
Mastro
 
Mastro
MastroMastro
Mastro
 
Introduction to istio
Introduction to istioIntroduction to istio
Introduction to istio
 
Anomaly detection on wind turbine data
Anomaly detection on wind turbine dataAnomaly detection on wind turbine data
Anomaly detection on wind turbine data
 
Towards Data Operations
Towards Data OperationsTowards Data Operations
Towards Data Operations
 
Welcome to Load Disaggregation and Building Energy Management
Welcome to Load Disaggregation and Building Energy ManagementWelcome to Load Disaggregation and Building Energy Management
Welcome to Load Disaggregation and Building Energy Management
 
An Early Warning System for Ambient Assisted Living
An Early Warning System for Ambient Assisted LivingAn Early Warning System for Ambient Assisted Living
An Early Warning System for Ambient Assisted Living
 
Assisting Energy Management in Smart Buildings and Microgrids
Assisting Energy Management in Smart Buildings and MicrogridsAssisting Energy Management in Smart Buildings and Microgrids
Assisting Energy Management in Smart Buildings and Microgrids
 
Analytics as value added service for energy utilities
Analytics as value added service for energy utilitiesAnalytics as value added service for energy utilities
Analytics as value added service for energy utilities
 
HEMS: A Home Energy Market Simulator
HEMS: A Home Energy Market SimulatorHEMS: A Home Energy Market Simulator
HEMS: A Home Energy Market Simulator
 
GREEND: An energy consumption dataset of households in Austria and Italy
GREEND: An energy consumption dataset of households in Austria and ItalyGREEND: An energy consumption dataset of households in Austria and Italy
GREEND: An energy consumption dataset of households in Austria and Italy
 

Recently uploaded

1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
Mani Krishna Sarkar
 
GUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdf
GUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdfGUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdf
GUIA_LEGAL_CHAPTER-9_COLOMBIAN ELECTRICITY (1).pdf
ProexportColombia1
 
Development of Chatbot Using AI/ML Technologies
Development of  Chatbot Using AI/ML TechnologiesDevelopment of  Chatbot Using AI/ML Technologies
Development of Chatbot Using AI/ML Technologies
maisnampibarel
 
UNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-ID
UNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-IDUNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-ID
UNIT I INCEPTION OF INFORMATION DESIGN 20CDE09-ID
GOWSIKRAJA PALANISAMY
 
MSBTE K Scheme MSBTE K Scheme MSBTE K Scheme MSBTE K Scheme
MSBTE K Scheme MSBTE K Scheme MSBTE K Scheme MSBTE K SchemeMSBTE K Scheme MSBTE K Scheme MSBTE K Scheme MSBTE K Scheme
MSBTE K Scheme MSBTE K Scheme MSBTE K Scheme MSBTE K Scheme
Anwar Patel
 


Coordination in distributed systems

  • 1. Coordination in Distributed Systems Andrea Monacchi Consensus, Configuration, Reliability
  • 2. Agenda 1. Consensus 2. Apache Zookeeper 3. ETCD
  • 3. Consensus
    ● Multiple processes must agree on a value (e.g. time, state, price)
      ○ https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/
    ● Synchronization in systems is a very general topic with many applications
      ○ clock synchronization - e.g. firefly synchronization
      ○ smart power grids - e.g. phase-locked loop (PLL)
      ○ load balancing
      ○ state-machine replication and distributed replicas
      ○ distributed-lock and distributed-transaction algorithms
    ● In practice
      ○ the environment is noisy - i.e. faults may arise endogenously or exogenously
        ■ Crash failures (process stops) - caused by the process or by the network
        ■ Byzantine failures (most general failure) - probabilistic terms, identification is strategy/protocol driven
      ○ consensus protocols must be fault tolerant (accept failure)
  • 4. Consensus
    Protocol requirements:
    ● Agreement - consensus should result in one common value
    ● Termination - the consensus process eventually converges
    ● Integrity - if all correct processes proposed the same value v, then v should be the value selected by the consensus process
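The three requirements can be read as checks on the outcome of a single run. The sketch below is only an illustration: the function name `check_outcome`, the process names and the use of `None` for "not decided yet" are assumptions, and termination (a liveness property) is approximated as "every correct process has decided by the time we look".

```python
from typing import Dict, Hashable, Optional


def check_outcome(
    proposed: Dict[str, Hashable],
    decided: Dict[str, Optional[Hashable]],
) -> Dict[str, bool]:
    """Check one consensus run, considering correct processes only.

    proposed: value each correct process proposed.
    decided:  value each correct process decided (None = undecided).
    """
    decisions = list(decided.values())

    # Termination: every correct process has decided some value.
    termination = all(d is not None for d in decisions)

    # Agreement: no two correct processes decided different values.
    agreement = len({d for d in decisions if d is not None}) <= 1

    # Integrity: if all correct processes proposed the same v,
    # then v is the only value that may be decided.
    integrity = True
    proposals = set(proposed.values())
    if len(proposals) == 1:
        (v,) = proposals
        integrity = all(d is None or d == v for d in decisions)

    return {
        "agreement": agreement,
        "termination": termination,
        "integrity": integrity,
    }
```

For example, a run in which two processes both propose 5 but decide 7 satisfies agreement and termination while violating integrity.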
  • 5. Consensus
    Single-valued
    ● agreeing on a single integer value
    ● agreeing on a binary value (binary consensus)
    Reference algorithm: Paxos (by Leslie Lamport)
    ● difficult to understand and implement
    ● even worse in the Multi-Paxos variant
    ● implementations are far from theory (multiple flavours derived)
    Multi-valued
    ● agreement on a sequence of values
    ● can be decomposed into multiple single-valued instances
    Reference algorithms: Multi-Paxos, Raft
    ● Raft has understandability at its core
    ● consensus decomposed into 3 sub-problems
  • 6. Raft
    Consensus process decomposed into 3 sub-problems:
    ● leader election - upon failure of the current leader (only 1 leader at a time)
    ● log replication - the leader keeps the logs of all other servers in sync with its own via replication (identical append-only logs)
    ● safety - if any server has applied a log entry at a certain index, no other server can apply a different entry for that index
    Peculiarities:
    ● strong leadership - log entries flow only from leader to followers; the leader receives entries from clients, then propagates them to the followers, who apply them once they are considered safe
    ● leader election - using randomized timers and heartbeats from leader to followers
    ● membership changes - using a joint consensus approach to keep the cluster operating during configuration changes
    https://raft.github.io/raft.pdf
  • 7. Raft: leader election
    Each node is a state machine in 1 of these states: follower, candidate, leader (state diagram; source: original paper)
    ● Each follower runs a randomized election timer, which is reset by heartbeats from the current leader;
    ● Upon timer expiration, the node becomes candidate and asks all nodes for votes; votes are granted first-come-first-served, at most one per node and term;
    ● If a majority of votes arrives, the node confirms its leadership by sending heartbeats to all nodes to establish authority;
    ● No concept of wall-clock time; instead, time is divided into terms of arbitrary length, each with at most 1 leader;
    ● Each term is identified by a monotonically increasing ID, which is used to mark each communication;
    ● If a candidate receives AppendEntries (from another node that has become leader) with the same or a higher Term ID, it becomes follower and updates its local Term ID;
    ● If several candidates exist at the same time, the vote may lead to no majority (split vote); a new election then begins;
    ● Random election timeouts are used by each node to make split votes unlikely; the first node whose timer expires asks for a new election;
    2 remote-procedure calls (RPCs):
    ● RequestVote - from candidate to all
    ● AppendEntries - from leader to followers to replicate log entries and/or send heartbeat
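The voting rules above can be sketched in a few lines. This is a simplified, single-process illustration, not a Raft implementation: the names `Node`, `on_request_vote`, `run_election` and `random_election_timeout` are hypothetical, the 150-300 ms range is an assumed default, network and timers are not modelled, and the log comparison that a real RequestVote also performs is left out.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    term: int = 0
    state: str = "follower"  # "follower", "candidate" or "leader"
    voted_for: Optional[int] = None

    def on_request_vote(self, candidate_id: int, candidate_term: int) -> bool:
        """Grant at most one vote per term, first come first served."""
        if candidate_term < self.term:
            return False  # stale candidate
        if candidate_term > self.term:
            # A newer term was observed: adopt it and step down.
            self.term = candidate_term
            self.state = "follower"
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def random_election_timeout(rng: random.Random,
                            low_ms: float = 150.0,
                            high_ms: float = 300.0) -> float:
    """Randomized timeout so that nodes rarely become candidates together."""
    return rng.uniform(low_ms, high_ms)


def run_election(candidate: Node, cluster: List[Node]) -> bool:
    """Candidate starts a new term, votes for itself and asks all peers."""
    candidate.term += 1
    candidate.state = "candidate"
    candidate.voted_for = candidate.node_id
    votes = 1
    for peer in cluster:
        if peer is not candidate and peer.on_request_vote(
                candidate.node_id, candidate.term):
            votes += 1
    if votes > len(cluster) // 2:  # strict majority of the whole cluster
        candidate.state = "leader"
        return True
    return False
```

Because a node grants only one vote per term, two candidates in the same term cannot both collect a majority; at worst neither does, which is the split-vote case.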
  • 8. Raft: log replication
    ● Leader receives a command for its FSM from a client
    ● Leader appends the command to its log
    ● Leader calls AppendEntries in parallel on all nodes to replicate the entry;
    ● If the log entry was safely replicated on a majority of nodes, then it is safe to commit it: run the command and return the result;
    ● If not (e.g. a follower crashed or is too slow), the leader retries AppendEntries indefinitely until all followers have their log in sync;
    ● The Term ID is stored with the command in the log entry to detect inconsistencies;
    ● Followers can locally apply committed commands;
    Notes:
    ● Followers' crashes can be easily recovered
    ● Leaders' crashes may leave logs inconsistent
      ○ not all log entries may have been replicated
    Solution - Log Matching Property:
    ● AppendEntries includes a consistency check, i.e. a reference to the index and Term ID of the previous log entry;
    ● The log entry is refused if the follower's previous log entry does not match that of the leader;
    ● Conflicting entries are overwritten with those of the leader (appended after the last entry common to both);
    ● Leaders never overwrite their own log;
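A minimal sketch of the consistency check and of the leader's retry loop, under simplifying assumptions: both logs live in one process, indices are 1-based with 0 meaning "before the first entry", and the names `Entry`, `FollowerLog` and `replicate` are illustrative rather than taken from any Raft library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entry:
    term: int
    command: str


@dataclass
class FollowerLog:
    entries: List[Entry] = field(default_factory=list)

    def append_entries(self, prev_index: int, prev_term: int,
                       new_entries: List[Entry]) -> bool:
        """Accept new entries only if the previous entry matches the leader's."""
        if prev_index > len(self.entries):
            return False  # follower log is too short
        if prev_index > 0 and self.entries[prev_index - 1].term != prev_term:
            return False  # previous entry differs from the leader's
        for offset, entry in enumerate(new_entries):
            pos = prev_index + offset  # 0-based slot of this entry
            if pos < len(self.entries):
                if self.entries[pos].term != entry.term:
                    # Conflict: drop this entry and everything after it.
                    del self.entries[pos:]
                    self.entries.append(entry)
            else:
                self.entries.append(entry)
        return True


def replicate(leader_log: List[Entry], follower: FollowerLog) -> None:
    """Leader side: step back one entry at a time until the check passes."""
    next_index = len(leader_log) + 1  # 1-based index of first entry to send
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1].term if prev_index > 0 else 0
        if follower.append_entries(prev_index, prev_term,
                                   leader_log[prev_index:]):
            return
        next_index -= 1  # refused: retry from an earlier point
```

The loop always terminates, since a request with `prev_index` 0 has nothing to compare against and is accepted; in that worst case the follower's log is rebuilt entirely from the leader's.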
  • 9. Raft: Safety
    Problem: A follower may be unavailable while the leader commits certain entries and then become leader itself, thus overwriting the previous leader's entries;
    Solution: Restrict access to leadership by ensuring that the leader for a term contains all entries committed in previous terms (completeness);
    Implementation:
    ● The RequestVote RPC includes info on the candidate's log, so that the voter can decline if its local log is more up to date than the candidate's;
    ● The logs of candidate and voter are compared by i) the Term ID of their last entries and, if these are equal, ii) their length;
    ● To cope with a leader crashing after having only partially replicated an entry, a leader is only allowed to commit entries that carry its current Term ID;
    By the properties of log replication, additional entries can be added only if the consistency check is passed (last terms match), thus resulting in correct log merging;
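The two restrictions can be written as small predicates. This is a sketch under assumptions: a log is modelled as a list of `(term, command)` tuples, an empty log counts as term 0 and length 0, and the function names are hypothetical.

```python
from typing import List, Tuple

LogEntry = Tuple[int, str]  # (term, command)


def last_term_and_length(log: List[LogEntry]) -> Tuple[int, int]:
    """Term of the last entry and log length; (0, 0) for an empty log."""
    if not log:
        return (0, 0)
    return (log[-1][0], len(log))


def candidate_log_is_up_to_date(candidate_log: List[LogEntry],
                                voter_log: List[LogEntry]) -> bool:
    """Voter grants its vote only if the candidate's log is at least as
    up to date: compare last terms first, then log lengths."""
    return last_term_and_length(candidate_log) >= last_term_and_length(voter_log)


def leader_may_commit(entry_term: int, current_term: int,
                      replicas: int, cluster_size: int) -> bool:
    """A leader directly commits only entries of its current term that are
    stored on a strict majority of the cluster (leader included)."""
    return entry_term == current_term and replicas > cluster_size // 2
```

Tuple comparison in Python is lexicographic, which matches the rule: a later last term wins regardless of length, and length only breaks ties.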
  • 10. Apache Zookeeper
    ● Coordination using a shared, file system-like structure
      ○ CRUD operations on tiny files called ZNodes, having stat-like versioning and ACLs
      ○ ZNodes can have associated data and children nodes
      ○ Event-driven communication - clients can watch for changes on ZNodes
      ○ Ephemeral Nodes - can't have children and are removed when the session of the creating process ends
      ○ TTL Nodes - automatic removal upon timer expiration
    ● Used for Service Discovery, Metadata Management, Synchronization, Leader Election
      ○ e.g. Hadoop/Yarn and Kafka, many more https://zookeeper.apache.org/doc/r3.6.2/zookeeperUseCases.html
      ○ Using a barrier to synchronize distributed processes (fork/join model)
        ■ use node b1/ as barrier; each process adds a child node in b1, of kind b1/p1, b1/p2, and so on
        ■ when enough processes have created their nodes (as many as the size given to the barrier), the join can start
        ■ the example can be generalized to other synchronization models, e.g. locks, 2-phase commit, leader election
      ○ Producer-consumer model by simply adding child nodes (using sequential IDs) to a ZNode acting as queue
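The barrier recipe can be illustrated without a running ensemble. The sketch below is a toy in-memory stand-in, not the ZooKeeper client API: `InMemoryTree`, `Barrier` and their methods are hypothetical names, everything runs in one process, and the watch stays registered after firing, whereas standard ZooKeeper watches are one-time triggers that a client has to set again.

```python
from typing import Callable, Dict, List, Set


class InMemoryTree:
    """Toy znode tree: maps a parent path to the names of its children."""

    def __init__(self) -> None:
        self._children: Dict[str, Set[str]] = {}
        self._watchers: Dict[str, List[Callable[[Set[str]], None]]] = {}

    def watch_children(self, path: str,
                       callback: Callable[[Set[str]], None]) -> None:
        """Register a callback invoked whenever the children of path change."""
        self._watchers.setdefault(path, []).append(callback)

    def create_child(self, path: str, name: str) -> None:
        """Add a child node under path and notify all watchers of path."""
        kids = self._children.setdefault(path, set())
        kids.add(name)
        for callback in list(self._watchers.get(path, [])):
            callback(set(kids))


class Barrier:
    """Releases once `size` processes have created a child under `path`."""

    def __init__(self, tree: InMemoryTree, path: str, size: int) -> None:
        self.tree = tree
        self.path = path
        self.size = size
        self.released = False
        tree.watch_children(path, self._on_change)

    def _on_change(self, children: Set[str]) -> None:
        if len(children) >= self.size:
            self.released = True

    def enter(self, process_name: str) -> bool:
        """Create e.g. b1/p1 and report whether the join can start."""
        self.tree.create_child(self.path, process_name)
        return self.released
```

With a barrier of size 3 on `/b1`, the calls `enter("p1")` and `enter("p2")` report that the barrier is still closed, and `enter("p3")` reports that the join can start.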
  • 11. Apache Zookeeper: Example
    Apache Kafka Cluster
    git clone https://github.com/simplesteph/kafka-stack-docker-compose.git
    docker-compose -f zk-single-kafka-single.yml up -d
    docker-compose -f zk-multiple-kafka-multiple.yml up -d
    ZooNavigator
    docker run -d --network host -e HTTP_PORT=9000 --name zoonavigator --restart unless-stopped elkozmon/zoonavigator:latest
  • 12. Apache Zookeeper: Example
    $ kaf topic create example_topic_folks
    ✅ Created topic!
    Topic Name: example_topic_folks
    Partitions: 1
    Replication Factor: 1
    Cleanup Policy: delete
  • 13. ETCD
    ● Very similar to Zookeeper
      ○ file system-like structure with similar functionalities for nodes (e.g. event reaction on changes, TTL)
    ● Uses the Raft protocol to keep replicas consistent across the ETCD cluster
    ● Addressed either via REST/JSON (e.g. by UIs) or via gRPC (by other services) - See API v2, v3
    ● Very lightweight and performant (written in Go) - core component of Kubernetes