Secrets of Performance Tuning Java on Kubernetes (Bruno Borges)
Java on Kubernetes may seem complicated, but after a bit of YAML and Dockerfiles, you will wonder what all that fuss was. But then the performance of your app in 1 CPU/1 GB of RAM makes you wonder. Learn how JVM ergonomics, CPU throttling, and GCs can help increase performance while reducing costs.
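These are the slides from a talk given at Bestcon 2022 (Better Software Testing Conference 2022), held at COEX on November 30, 2022.
The talk shares the concepts and design techniques of fault tolerance that software and quality engineers should know in order to prevent large-scale outages.
Watch the recording of the talk for the full experience!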
https://youtu.be/OLsv7oG0VPo
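[NDC18] Data Engineering for Durango: Wild Lands: Sharing Our Experience Building a Log System (Hyojun Jeon)
Presented at NDC18. The slides you are viewing are Part 1 (of 2).
- Part 1 link: https://goo.gl/3v4DAa
- Part 2 link: https://goo.gl/wpoZpY
(Because SlideShare limits a deck to 300 slides, the talk was uploaded in two parts. Apologies for the inconvenience.)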
1. The document discusses RESTful APIs and gRPC, comparing their characteristics and use cases.
2. RESTful APIs typically use HTTP and JSON to access resources via URLs while gRPC uses protocol buffers and HTTP/2 for efficient streaming and RPC.
3. gRPC is better suited for microservices and mobile apps due to its ability to handle streaming and performance, while REST is more widely used due to its simplicity and support in most languages.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend heavily on RocksDB. Beyond MyRocks, we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, cover key differences from InnoDB, and share a few interesting lessons learned from production.
The document summarizes the new plugin API in Fluentd v0.14. Key points include:
- The v0.12 plugin API was fragmented and difficult to write tests for. The v0.14 API provides a unified architecture.
- The main plugin classes are Input, Filter, Output, Buffer, and plugins must subclass Fluent::Plugin::Base.
- The Output plugin supports both buffered and non-buffered processing. Buffering can be configured by tags, time, or custom fields.
- "Owned" plugins like Buffer are instantiated by primary plugins and can access owner resources. Storage is a new owned plugin for persistent storage.
- New test drivers emulate plugin
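Understanding Metrics for Apache Kafka Monitoring and How to Optimize (SANG WON PARK)
As Apache Kafka plays a larger and more important role in big data architectures, concerns about its performance are growing as well.
While working on various projects, I studied the metrics needed to monitor Apache Kafka and organized the configuration settings used to optimize it.
[Understanding Metrics for Apache Kafka Monitoring and How to Optimize]
Explains the metrics needed for Apache Kafka performance monitoring and summarizes how to optimize performance from four perspectives (throughput, latency, durability, availability). For each of the three modules that make up Kafka (Producer, Broker, Consumer), performance optimization …
[Understanding Metrics for Apache Kafka Monitoring]
To monitor the state of Apache Kafka, you need to look at the metrics produced by four sources (System (OS), Producer, Broker, Consumer).
This post organizes the producer/broker/consumer indicators, focusing on the JMX metrics provided by the JVM.
It does not cover every indicator; it reflects my understanding of the ones I consider meaningful.
[Apache Kafka Performance Configuration Optimization]
Performance goals are divided into four (Throughput, Latency, Durability, Availability), and for each goal the post summarizes which Kafka configuration settings to adjust and how.
After applying the tuned parameters, you need to run performance tests and monitor the resulting metrics so that the settings are optimized for your current workload.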
REEF is a meta-framework for big data analytics that eases development atop resource managers like YARN and Mesos. It provides a reusable control plane for coordinating data processing tasks and an adaptation layer for different resource managers. REEF decouples applications from cluster resources and handles common control plane functions like fault tolerance and configuration management. The framework is implemented in Java and C# and supports local, YARN, Mesos, and HDInsight execution environments. Future work includes graduating REEF from the Apache Incubator and using it to build new data processing frameworks and systems.
The document describes how to build a data science team and systems. It discusses establishing data collection and management systems, developing metrics and dashboards to analyze business data, creating predictive models using machine learning algorithms, and providing data science services like information retrieval to internal customers. The goal is to move from static, uncollected data to a fully realized big data platform and data science team that supports business analytics and decision making.
This document summarizes lessons learned from developing the Realm Android library. It discusses challenges such as setting up an Android library project, API design, testing, distribution methods, and issues like annotation processing, bytecode weaving, and native code support. Key points covered are how to start a library project, the importance of testing libraries extensively, and distribution options like Bintray.
This document summarizes a presentation about Packetbeat and monitoring distributed systems. It discusses how Packetbeat passively captures network packets, decodes protocols, and matches requests and responses to create JSON objects. It then sends this data to Elasticsearch for analysis. Aggregations like histograms, percentiles, and moving averages are used to analyze latency, identify slow methods, and detect anomalies in metrics over time. Other Beats like Topbeat, Filebeat, and Metricbeat are also briefly introduced.
MIT researchers have developed highly efficient quadruped robots like the Cheetah that can run at speeds up to 6m/s. The Cheetah uses a proprioceptive actuation system with high torque density motors to achieve high force control bandwidth over 120Hz. Its parallelized control system with multicore CPUs and FPGAs allows control frequencies up to 4kHz. Design principles for efficient legged locomotion include energy regeneration, low transmission impedance, and low leg inertia. The researchers are continuing their work with robots like Cheetah 2 and Hermes.
DRC-HUBO is Rainbow Robotics' humanoid robot that competed in the DRC Finals. It uses a modular, lightweight exoskeletal design with effective cooling and power systems. PODO-RT is the real-time framework that controls DRC-HUBO. It uses a distributed architecture with independent processes communicating over shared memory for high-speed control. DRC-HUBO demonstrated a variety of autonomous tasks at the DRC Finals, including driving, opening doors, using tools, and traversing rough terrain.
This document discusses providing immersive sound for virtual reality. It notes that sound is half the experience of immersion. While VR technology allows immersion in digital worlds, truly immersive sound requires binaural 3D audio rendering or recording. Binaural audio uses head-related transfer functions to simulate the sound reaching each ear, allowing localization of sounds in 3D space. However, interactive binaural recording and matching sounds to visual content in real-time pose technical challenges. The document demonstrates an implementation of immersive 3D binaural audio for VR.
Klaytn API Service
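Developing a BApp on the Klaytn platform requires service development companies to operate a Klaytn Node themselves. This runs counter to developers' wish to focus on the core of their service, and it can require fairly specialized personnel and technology. This talk introduces Klaytn API Service, which Ground X is developing to solve these problems.
Describes how to configure High Availability based on Pacemaker on Linux. Other Linux distributions that use Pacemaker can be configured in the same way.
It may not be suitable as an introduction to Pacemaker-based Linux High Availability. If you are a Linux administrator who has never installed and configured Pacemaker-based Linux High Availability, refer to the installation documentation first.
Written with a focus on RHEL 7 and CentOS 7, with content suited to Red Hat family Linux distributions.
In building and operating systems for infrastructure monitoring, dynamic changes in infrastructure are becoming a challenge.
In this session, we share the direction of a future-oriented infrastructure monitoring system, as seen by the teams and operators who run infrastructure, and the components needed to implement it.
Contents
1. The present of NHN monitoring
2. Changes in monitoring
3. Monitoring methodology
4. Monitoring procedures
5. The future of NHN monitoring
Audience
- System engineers who operate infrastructure
- Anyone interested in infrastructure monitoring systems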
The document discusses various machine learning clustering algorithms like K-means clustering, DBSCAN, and EM clustering. It also discusses neural network architectures like LSTM, bi-LSTM, and convolutional neural networks. Finally, it presents results from evaluating different chatbot models on various metrics like validation score.
The document discusses challenges with using reinforcement learning for robotics. While simulations allow fast training of agents, there is often a "reality gap" when transferring learning to real robots. Other approaches like imitation learning and self-supervised learning can be safer alternatives that don't require trial-and-error. To better apply reinforcement learning, robots may need model-based approaches that learn forward models of the world, as well as techniques like active localization that allow robots to gather targeted information through interactive perception. Closing the reality gap will require finding ways to better match simulations to reality or allow robots to learn from real-world experiences.
[243] Deep Learning to help student’s Deep Learning (NAVER D2)
This document describes research on using deep learning to predict student performance in massive open online courses (MOOCs). It introduces GritNet, a model that takes raw student activity data as input and predicts outcomes like course graduation without feature engineering. GritNet outperforms baselines by more than 5% in predicting graduation. The document also describes how GritNet can be adapted in an unsupervised way to new courses using pseudo-labels, improving predictions in the first few weeks. Overall, GritNet is presented as the state-of-the-art for student prediction and can be transferred across courses without labels.
[234] Fast & Accurate Data Annotation Pipeline for AI applications (NAVER D2)
This document provides a summary of new datasets and papers related to computer vision tasks including object detection, image matting, person pose estimation, pedestrian detection, and person instance segmentation. A total of 8 papers and their associated datasets are listed with brief descriptions of the core contributions or techniques developed in each.
[226] NAVER Ads deep click prediction: From Modeling to Serving (NAVER D2)
This document presents a formula for calculating the loss function J(θ) in machine learning models. The formula averages the negative log likelihood of the predicted probabilities being correct over all samples S, and includes a regularization term λ that penalizes predicted embeddings being dissimilar from actual embeddings. It also defines the cosine similarity term used in the regularization.
[214] AI Serving Platform: The Struggle to Handle Hundreds of Millions of Inferences a Day (NAVER D2)
The document discusses running a TensorFlow Serving (TFS) container using Docker. It shows commands to:
1. Pull the TFS Docker image from a repository
2. Define a script to configure and run the TFS container, specifying the model path, name, and port mapping
3. Run the script to start the TFS container exposing port 13377
The document discusses linear algebra concepts including:
- Representing a system of linear equations as a matrix equation Ax = b where A is a coefficient matrix, x is a vector of unknowns, and b is a vector of constants.
- Solving for the vector x that satisfies the matrix equation using linear algebra techniques such as row reduction.
- Examples of matrix equations and their component vectors are shown.
This document describes the steps to convert a TensorFlow model to a TensorRT engine for inference. It includes steps to parse the model, optimize it, generate a runtime engine, serialize and deserialize the engine, as well as perform inference using the engine. It also provides code snippets for a PReLU plugin implementation in C++.
The document discusses machine reading comprehension (MRC) techniques for question answering (QA) systems, comparing search-based and natural language processing (NLP)-based approaches. It covers key milestones in the development of extractive QA models using NLP, from early sentence-level models to current state-of-the-art techniques like cross-attention, self-attention, and transfer learning. It notes the speed and scalability benefits of combining search and reading methods for QA.
17. Pinpoint
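An APM tool for collecting performance data and analyzing problems in large-scale distributed systems
- APM (Application Performance Management)
Distributed transaction tracing
Automatic discovery & visualization of application topology
Horizontal scalability
Code-level visibility
Collects performance data without code changes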
http://github.com/naver/pinpoint
73. The days before Pinpoint
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
…
Caused by: ◂◊╩◌♪♦♂◘◦▸╫╛╟╤❶╦╧[afg00101101aj..
…
Caused by: …
74. The days before Pinpoint
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
…
Caused by: ◂◊╩◌♪♦♂◘◦▸╫╛╟╤❶╦╧[aDgfRhaj..
…
Caused by: …
109. TCP connect is delayed
Socket Option: ConnectTimeout, Socket Backlog
WebServer: Apache, Nginx
Network Switch: LoadBalancer (L4)
Client characteristics: HttpClient's internal retry logic
RPC Timeline Pattern 1
A pattern that indicates a problem with the TCP connection
Client execute
Server
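The ConnectTimeout option on this slide and the "Read timed out" exception from the earlier stack trace bound two different phases of a call. The following is a minimal sketch, assuming plain `java.net.Socket`; the class name `TimeoutSketch`, the timeout values, and the silent local listener are hypothetical and only there to make the example self-contained.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration class; not part of Pinpoint.
final class TimeoutSketch {
    /** Result of one probe: how long connect took and whether the read timed out. */
    static final class Probe {
        final long connectMillis;
        final boolean readTimedOut;

        Probe(long connectMillis, boolean readTimedOut) {
            this.connectMillis = connectMillis;
            this.readTimedOut = readTimedOut;
        }
    }

    static Probe probe(String host, int port, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            long start = System.nanoTime();
            // Connect timeout: bounds only the TCP handshake (Pattern 1 territory).
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            long connectMillis = (System.nanoTime() - start) / 1_000_000;

            // Read timeout (SO_TIMEOUT): bounds each blocking read once connected.
            socket.setSoTimeout(readTimeoutMs);
            boolean readTimedOut = false;
            try {
                socket.getInputStream().read();
            } catch (SocketTimeoutException e) {
                readTimedOut = true; // the "Read timed out" case
            }
            return new Probe(connectMillis, readTimedOut);
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener that completes the handshake via its backlog but never writes.
        try (ServerSocket silent = new ServerSocket(0)) {
            Probe p = probe("127.0.0.1", silent.getLocalPort(), 1000, 200);
            System.out.println("connect ms=" + p.connectMillis
                    + ", readTimedOut=" + p.readTimedOut);
        }
    }
}
```

Measuring the connect phase on its own makes it easier to tell a slow handshake (backlog, L4 switch, retries) apart from a slow response.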
110. RPC Timeline Pattern 2
When the network is slow
Client execute
Server
When the server is located overseas
Check network traffic and server location
Use HTTP KeepAlive and HTTP/2
Use compression such as Gzip
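As a rough illustration of the compression suggestion, here is a minimal sketch using the JDK's `java.util.zip` classes. The class name `GzipSketch` and the repetitive JSON-like payload are hypothetical; real savings depend on the actual response data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Pinpoint.
final class GzipSketch {
    /** Compresses a payload with gzip. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } // closing the stream writes the gzip trailer
        return buffer.toByteArray();
    }

    /** Restores the original payload. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up, highly repetitive payload standing in for an RPC response body.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("{\"id\":").append(i).append(",\"status\":\"OK\"}");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("raw=" + raw.length + " bytes, gzip=" + packed.length + " bytes");
    }
}
```

Fewer bytes on the wire shorten the transfer portion of the timeline, which matters most on slow or long-distance links; the trade-off is extra CPU on both ends.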
111. RPC Timeline Pattern 3
Client execute
Server
TargetServer processing is slow
Can cascade into a full outage of the Client
Socket Timeout
Circuit breaker: Netflix Hystrix
TargetServer is slow
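To show the circuit breaker idea named on this slide, here is a minimal hand-rolled sketch. It is not the Hystrix API: the class `CircuitBreakerSketch`, its threshold and open-window parameters, and the injected clock are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified breaker; it does not use or mirror the Hystrix API.
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to fail fast before a trial call
    private final LongSupplier clock;   // injected so the sketch is easy to test
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // After the open window elapses, allow a trial call.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast: do not touch the slow TargetServer
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) { // e.g. a wrapped socket timeout
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        CircuitBreakerSketch cb = new CircuitBreakerSketch(2, 1000L, System::currentTimeMillis);
        Supplier<String> slowTarget = () -> { throw new RuntimeException("Read timed out"); };
        for (int i = 0; i < 3; i++) {
            System.out.println(cb.call(slowTarget, () -> "fallback") + " / " + cb.state());
        }
    }
}
```

Once the breaker is open, client threads stop waiting on the slow TargetServer, which is how it limits the cascade described above. This sketch lets several trial calls through in the half-open state; a production breaker would usually restrict that.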
112. RPC Timeline Pattern 4
Client execute
Server
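When additional data is read from the Stream after the Response is received
- Large file download
Usually a normal state
If this situation causes problems, a separate server needs to be set up
Large amount of response data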
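In Pattern 4 the time is spent reading the body after the response has arrived, so it can help to time that phase separately. The following is a minimal sketch, assuming a generic `InputStream` as the response body; the class `StreamReadSketch`, the 8 KB chunk size, and the in-memory fake body are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration class; not part of Pinpoint.
final class StreamReadSketch {
    /** Copies the body in fixed-size chunks and returns the number of bytes read. */
    static long drain(InputStream body, OutputStream sink, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = body.read(chunk)) != -1) {
            sink.write(chunk, 0, n); // only one chunk is held in memory at a time
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Made-up in-memory body standing in for a large download.
        byte[] fakeBody = new byte[1_000_000];
        long start = System.nanoTime();
        long bytes = drain(new ByteArrayInputStream(fakeBody),
                OutputStream.nullOutputStream(), 8192);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read " + bytes + " bytes in " + millis + " ms after the response");
    }
}
```

If the body-read time dominates and starts to hurt other requests, that supports the slide's suggestion of moving large downloads to a separate server.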