SRE vs DevOps
Feel the difference
Levon Avakyan / Competitive Gaming / l_avakyan@wargaming.net
Content
What I will speak about
• Definitions – to be on the same page
• SRE vs DevOps – a little bit of philosophy
• Approach – how to do it well
• Cases – how we are doing in Competitive Gaming
Definitions
To be on the same page
Reliability
A little bit of theory
Reliability is theoretically defined as the probability of success (Reliability = 1 − Probability of Failure), as the frequency of failures, or, in terms of availability, as a probability derived from reliability, testability and maintainability. Reliability plays a key role in the cost-effectiveness of systems.
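As a worked illustration of this definition, the sketch below computes reliability as 1 − P(failure) from an observed failure count, and steady-state availability from mean time between failures (MTBF, the reliability side) and mean time to repair (MTTR, the maintainability side) using the standard formula A = MTBF / (MTBF + MTTR). The function names and sample numbers are hypothetical and do not come from the talk.

```python
def reliability(failures: int, attempts: int) -> float:
    """Empirical reliability: 1 - observed probability of failure."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 1.0 - failures / attempts


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the share of time the system is up.

    MTBF reflects how rarely the system fails (reliability),
    MTTR how quickly it is restored (maintainability).
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers: 3 failed battles out of 10,000 processed.
    print(f"reliability  = {reliability(3, 10_000):.4f}")  # 0.9997
    # Hypothetical: one failure every 500 h on average, 0.5 h to recover.
    print(f"availability = {availability(500, 0.5):.4f}")  # 0.9990
```

Note how availability depends on both terms: halving MTTR improves it as much as doubling MTBF, which is why maintainability appears in the definition alongside reliability.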
Reliability Engineering
A little bit of theory
• Reliability engineering is engineering that
emphasizes dependability in the lifecycle
management of a product.
• Reliability engineering deals with the estimation,
prevention and management of high levels of
"lifetime" engineering uncertainty and risks of
failure.
Software Reliability
A little bit of theory
• Software Reliability (SR) depends on good
requirements, design and implementation. Software
reliability engineering relies heavily on a disciplined
software engineering process to anticipate and
design against unintended consequences.
Site reliability engineering
A little bit of theory
Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to operations, with the goal of creating ultra-scalable and highly reliable software systems. SRE can be considered a subset of DevOps with additional skill sets.
Development Operations
A little bit of theory
DevOps is a term for a set of practices that emphasize collaboration and communication between software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims to establish a culture and environment in which building, testing and releasing software can happen rapidly, frequently and more reliably.
SRE vs DevOps
A little bit of philosophy
SRE (SR) vs DevOps
Comparison
Site Reliability Engineering
• Main focus on creating ultra-scalable and highly reliable software systems
• One of the engineering specializations
• Fully embedded in the product lifecycle
Development Operations
• Main focus on the automated deployment process to production and staging environments
• A role
• Mostly works with environments
SRE (SR) vs DevOps
Conclusion
• SRE (SR) is a broader concept than DevOps
• We cannot put a "versus" between SRE (SR) and DevOps, because they achieve similar goals with different approaches
Approach
How to do it well
Product lifecycle
Pre-production
Main purpose:
• Create the specification for development
• Clarify all details with the business
The main artefacts are the requirements and the high-level design (HLD) of the new feature/product.
SRE role:
• Review and clarify the HLD
• Add specific requirements that improve reliability and reduce the impact on players in case of failures
Development
Main purpose:
• Develop the application
• Test the application
The main artefacts are the release tag, the SDD, test suites, and release regulations/automation.
SRE role:
• Review and clarify the SDD
• Monitoring design
• Load and performance tests (tooling, environments)
• Stress tests
• Release preparations (tooling, massive migrations, release time estimation)
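Release time estimation for a massive migration can start from simple arithmetic: the number of batches to migrate times the per-batch duration measured in a staging rehearsal, multiplied by a safety factor. The sketch below is a minimal illustration under those assumptions; the function name, parameters and numbers are hypothetical, not the team's actual tooling.

```python
import math
from datetime import timedelta


def estimate_migration_time(
    total_rows: int,
    batch_size: int,
    seconds_per_batch: float,
    safety_factor: float = 1.5,
) -> timedelta:
    """Estimate how long a batched data migration will take.

    seconds_per_batch should come from a rehearsal on a staging
    environment with production-like data volume; safety_factor
    covers the variance between staging and production.
    """
    if batch_size <= 0 or seconds_per_batch <= 0:
        raise ValueError("batch_size and seconds_per_batch must be positive")
    batches = math.ceil(total_rows / batch_size)
    return timedelta(seconds=batches * seconds_per_batch * safety_factor)


if __name__ == "__main__":
    # Hypothetical: 12 million rows, 10,000 rows per batch, 2.5 s per batch.
    window = estimate_migration_time(12_000_000, 10_000, 2.5)
    print(f"plan a release window of at least {window}")  # 1:15:00
```

The estimate is only as good as the rehearsal, so the measured per-batch time should come from data volumes close to production.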
Release
Main purpose:
• Check that the application is ready to go to production
• Deliver the application to the production environment
The main artefacts are the released application and the release postmortem.
SRE role:
• Review the regulations
• Automate the process with standard tools
Post-Release
Main purpose:
• Monitoring
• Maintenance
• Mitigating risks and reducing the impact on users in case of outages
The main artefacts are bug reports and improvements for the dev team, and data for the product management team to analyze.
SRE role:
• L2+ and L3 support
• Data collection tools
Conclusion
• SRE is embedded in the entire product lifecycle
• The main aim of SRE is to increase reliability
• The scope of responsibilities varies widely and depends on the company's structure
Cases
How we are doing in Competitive Gaming
Cases
• World of Tanks football tournament
• Campaigns on the WoT Global Map
World of Tanks Football Tournament
Features:
• Cross-project product
• Of great importance to players and the company
• New battle type
Architecture
World of Tanks Football Tournament
Risks
World of Tanks Football Tournament
• High load
• A very long route for each battle, with many potential points of failure
• First big load on the Team Management System
• Many separate teams working on the event
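The risk of a long battle route with many points of failure can be quantified: if a battle request must pass through every component on the route and components fail independently, the route's reliability is the product of the component reliabilities, so every extra hop lowers it. The component names and reliability figures below are hypothetical and do not describe the tournament's real architecture.

```python
from math import prod

# Hypothetical route of a tournament battle through independent components,
# each with its own probability of handling the request successfully.
ROUTE = {
    "tournament_api": 0.999,
    "team_management": 0.995,
    "matchmaking": 0.999,
    "battle_processing": 0.998,
    "results_storage": 0.999,
}


def series_reliability(component_reliabilities) -> float:
    """Reliability of a chain where every component must succeed.

    Assumes failures are independent: R = R1 * R2 * ... * Rn.
    """
    return prod(component_reliabilities)


if __name__ == "__main__":
    r = series_reliability(ROUTE.values())
    print(f"route reliability: {r:.4f}")
    # The component with the lowest reliability is a natural first
    # candidate for hardening and focused load testing.
    weakest = min(ROUTE, key=ROUTE.get)
    print(f"weakest component: {weakest}")
```

The chain is always less reliable than its weakest component, which is why an end-to-end load test of the whole route matters more than testing each service in isolation.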
What we have done
World of Tanks Football Tournament
• Ran an end-to-end load and performance test of the system
• Got a prediction of the player count from the publisher
• Based on these numbers, created a recommendation for the schedule
• Added a safe (reserve) day to the schedule
• Created tooling to move groups, stages and battles of the tournament to another date
• Isolated battle processing and the API
• Created an autoscaling configuration for workers
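The scheduling steps above (take the publisher's player prediction, derive a schedule recommendation, add a safe day) can be sketched as a capacity calculation: the expected number of battles divided by the daily battle capacity confirmed in the load test gives the minimum number of tournament days, and a reserve day is added on top. Every name, parameter and number here is a hypothetical placeholder, not the team's actual model.

```python
import math
from dataclasses import dataclass


@dataclass
class ScheduleRecommendation:
    battle_days: int
    reserve_days: int

    @property
    def total_days(self) -> int:
        return self.battle_days + self.reserve_days


def recommend_schedule(
    predicted_players: int,
    players_per_team: int,
    battles_per_team: int,
    tested_battles_per_day: int,
    headroom_percent: int = 70,
    reserve_days: int = 1,
) -> ScheduleRecommendation:
    """Derive the minimum number of tournament days from capacity.

    headroom_percent keeps the planned daily load below the capacity
    measured in the end-to-end load test (70 = plan for 70%).
    """
    teams = math.ceil(predicted_players / players_per_team)
    # Each battle involves two teams.
    battles = math.ceil(teams * battles_per_team / 2)
    planned_per_day = tested_battles_per_day * headroom_percent // 100
    if planned_per_day <= 0:
        raise ValueError("planned daily capacity must be positive")
    battle_days = math.ceil(battles / planned_per_day)
    return ScheduleRecommendation(battle_days, reserve_days)


if __name__ == "__main__":
    # Hypothetical inputs: 60,000 players in teams of 3, 5 battles per team,
    # and a load test that sustained 10,000 battles per day.
    rec = recommend_schedule(60_000, 3, 5, 10_000)
    print(f"{rec.battle_days} battle days + {rec.reserve_days} safe day "
          f"= {rec.total_days} days")
```

Planning below measured capacity and keeping a reserve day both serve the same purpose: if a stage fails, the tooling for moving groups and battles has somewhere to move them.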
Global Map
Features:
• A potentially growing number of battles to process
• No room for failure, because a fault would affect the results of a 3-week event
Architecture
Global Map
Risks
Global Map
• High load
• New gameplay features
• New vector tile engine
• No possibility to move battles to another date
What we have done
Global Map
• Massive load test of the new vector tile engine
• Additional monitoring based on game logic
• Added requirements so that most workers can be scaled
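Making most workers scalable lets capacity follow the battle queue instead of being fixed in advance. A minimal sketch of the scaling decision, assuming a queue of pending battles and a per-worker throughput measured in load tests (both hypothetical; the talk does not describe the actual autoscaler): compute the number of workers needed to drain the backlog within a target time, then clamp it between a floor and a ceiling.

```python
import math


def desired_workers(
    queued_battles: int,
    battles_per_worker_per_min: float,
    drain_target_min: float,
    min_workers: int,
    max_workers: int,
) -> int:
    """Number of workers needed to process the queue within the target time.

    The result is clamped: min_workers keeps a warm pool for sudden
    spikes, and max_workers is assumed to reflect what downstream
    services can handle.
    """
    if battles_per_worker_per_min <= 0 or drain_target_min <= 0:
        raise ValueError("throughput and drain target must be positive")
    capacity_per_worker = battles_per_worker_per_min * drain_target_min
    needed = math.ceil(queued_battles / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # Hypothetical: 1,800 battles queued, 20 battles/min per worker,
    # drain within 5 minutes, pool between 4 and 40 workers.
    print(desired_workers(1_800, 20, 5, min_workers=4, max_workers=40))  # 18
```

The ceiling matters as much as the formula: scaling workers past what the database or game servers can absorb just moves the bottleneck downstream.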
Conclusion
• SRE (SR) is a broader concept than DevOps
• We cannot put a "versus" between SRE (SR) and DevOps, because they achieve similar goals with different approaches
• SRE is embedded in the entire product lifecycle
• The main aim of SRE is to increase reliability
• The scope of responsibilities varies widely and depends on the company's structure
Thank you
Levon Avakyan / Competitive Gaming Reliability Team Lead / l_avakyan@wargaming.net


Editor's Notes

  1. Reliability can theoretically be defined as the probability of success, that is, reliability = 1 – probability of failure; by the frequency of failures; or, in terms of availability, as a probability derived from reliability, testability and maintainability. Reliability plays a key role in the cost-effectiveness of systems.
  2. Reliability engineering is engineering that emphasizes dependability in the management of a product's lifecycle. Reliability engineering deals with the estimation, prevention and management of high levels of "lifetime" engineering uncertainty and risks of failure.
  3. SR depends on correct requirements, architecture and implementation. Software SR depends to a large degree on the software development process, which must anticipate unintended consequences and design the software to withstand them.
  4. SRE is a discipline that incorporates aspects of software engineering and applies them to operations whose goal is to create ultra-scalable and highly reliable software systems. SRE can be considered a subset of DevOps that has additional skill sets.
  5. DevOps is a term used for a set of practices that emphasize collaboration and communication between software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims to create a culture and environment in which building, testing and releasing software can happen quickly, frequently and reliably.