[164] pinpoint

Pinpoint
APM for large-scale distributed environments
Woonduk Kang (강운덕)
NAVER Service Platform Development Center

Contents
1. Problems in distributed environments
2. Pinpoint's features and technology
- CallStack Trace
- Distributed Transaction Trace
3. Troubleshooting in distributed environments
4. RPC Timeline Pattern
5. New features & future direction

1.
Problems in distributed environments

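Problems
Situation
Tens to hundreds of servers
Many software modules
Services integrated in complex ways
Problem
No clear picture of how the services are connected
Failures are caused by other services
Monitoring individual servers does not reveal the overall situation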

Why distribution makes it harder
Network dependency
Very hard to observe
Existing methods do not identify the problem well
Logging
Single WAS Monitoring
GC Log, Heap Dump, Thread Dump
System Monitoring

Performance problems in a complex system
[Figure: Overseas Proxy, API GATEWAY, Service, DB, CACHE, RPC, CACHE]

Anyway, let's trace the slow Request
Let's log slow requests in Tomcat
Let's log HttpClient calls too
Let's check the Apache Access Log too

Anyway, let's trace the slow Request
Forget it, I give up!

The reality of distributed architecture
One big lump

Pinpoint
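An APM tool for collecting performance data and analyzing problems in large-scale distributed systems
- APM (Application Performance Management)
Distributed transaction tracing
Automatic discovery & visualization of the application topology
Horizontal scalability
Code-level visibility
Collects performance data without modifying code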
http://github.com/naver/pinpoint

What became possible?
Problems that previously went undetected can now be found
Problems are solved easily and quickly
Diagnosis and fix time is greatly reduced

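Architecture
[Figure: Pinpoint architecture]
Profiled machines: Host 1, Host 2, … Host n
Each host: Host Application running in a Host JVM with the Java Agent (specified with the JavaAgent JVM option)
Java Agent to Collector: Send Profile Data, UDP/TCP (Thrift)
Collector to HBase: HBase Write
Web from HBase: HBase Read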

[Figure: application topology with TomcatA, TomcatB, TomcatC, TomcatD, TomcatF, Mysql1, Mysql2, Cubrid, Cache, and remote addresses…]

Distributed Transaction Trace
[Figure: TOMCAT A: Tomcat.receive(), then HttpClient.execute(); TOMCAT B: Tomcat.receive()]

Pinpoint's core feature
CallStack Trace

CallStack Trace
AMethod(); -> BMethod(); -> CMethod();
ClassABC

CallStack Trace
AMethod() {
    BMethod() {
        CMethod() {
        }
    }
}

CallStack Trace
AMethod() {
    BMethod() {
        CMethod() {
        }
    }
}
CallStack
CMethod();
BMethod();
AMethod();

CallStack Trace
JVM Classloader
Original class:
public void AMethod() {
    BMethod();
}
Class after the Pinpoint Agent has modified it:
public void AMethod() {
    AInterceptor.before();
    BMethod();
    AInterceptor.after();
}
The Pinpoint Agent intercepts the code at class-loading time and modifies the bytecode

CallStack Trace
AMethod() {
    AInterceptor.before();
    BMethod() {
        BInterceptor.before();
        CMethod() {
            CInterceptor.before();
            CInterceptor.after();
        }
        BInterceptor.after();
    }
    AInterceptor.after();
}
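The interceptor nesting can be imitated by hand. This is a minimal sketch, assuming a hypothetical MethodInterceptor interface with before() and after(); the names are illustrative and not Pinpoint's actual API. The agent would inject equivalent calls into the bytecode, whereas here they are written out manually.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interceptor contract for this sketch; not Pinpoint's actual API.
interface MethodInterceptor {
    void before();
    void after();
}

// Records the moment each intercepted method starts and ends.
class RecordingInterceptor implements MethodInterceptor {
    private final String name;
    private final List<String> events;

    RecordingInterceptor(String name, List<String> events) {
        this.name = name;
        this.events = events;
    }

    @Override
    public void before() { events.add(name + ".before"); }

    @Override
    public void after() { events.add(name + ".after"); }
}

// Hand-written equivalent of the instrumented AMethod/BMethod/CMethod.
class InterceptedCalls {
    final List<String> events = new ArrayList<>();
    private final MethodInterceptor a = new RecordingInterceptor("A", events);
    private final MethodInterceptor b = new RecordingInterceptor("B", events);
    private final MethodInterceptor c = new RecordingInterceptor("C", events);

    void aMethod() {
        a.before();
        try { bMethod(); } finally { a.after(); }
    }

    void bMethod() {
        b.before();
        try { cMethod(); } finally { b.after(); }
    }

    void cMethod() {
        c.before();
        try { /* original method body */ } finally { c.after(); }
    }

    public static void main(String[] args) {
        InterceptedCalls calls = new InterceptedCalls();
        calls.aMethod();
        // Prints [A.before, B.before, C.before, C.after, B.after, A.after]
        System.out.println(calls.events);
    }
}
```

The try/finally is a choice made for this sketch so that after() still runs when the wrapped method throws.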

CallStack Trace
AMethod() {
    BMethod() {
        CMethod() {
        }
    }
}
Stack operations, slide by slide (Frame Pointer = current Stack Frame Depth; ROOT is depth -1):
1. Trace start: New Stack & Bind ThreadLocal. Stack: empty. Frame Pointer: ROOT (-1).
2. AMethod starts: PUSH StackFrame AMethod (Sequence:0). Stack: AMethod. Depth: 0.
3. BMethod starts: PUSH StackFrame BMethod (Sequence:1). Stack: AMethod, BMethod. Depth: 1.
4. CMethod starts: PUSH StackFrame CMethod (Sequence:2). Stack: AMethod, BMethod, CMethod. Depth: 2.
5. CMethod ends: POP StackFrame CMethod. Stack: AMethod, BMethod. WriteQueue: C.
6. BMethod ends: POP StackFrame BMethod. Stack: AMethod. WriteQueue: C B.
7. AMethod ends: POP StackFrame AMethod. Stack: Empty Stack. Frame Pointer: ROOT (-1). WriteQueue: C B A.
8. Queue Flush: the WriteQueue (C B A) is sent as a Network Write.
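The push/pop sequence can be sketched in a few lines of Java. This is a minimal sketch under assumptions: the class and method names (TracedFrame, CallStackTracer, traced) are hypothetical, and flush() simply returns the queued frames where the agent would write them to the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One traced call. Field names follow the slides; the class itself is illustrative.
class TracedFrame {
    final String method;
    final int sequence; // order in which the call started
    final int depth;    // nesting level; the first traced method is 0

    TracedFrame(String method, int sequence, int depth) {
        this.method = method;
        this.sequence = sequence;
        this.depth = depth;
    }
}

class CallStackTracer {
    // One tracer per thread, mirroring "New Stack & Bind ThreadLocal".
    private static final ThreadLocal<CallStackTracer> CURRENT =
            ThreadLocal.withInitial(CallStackTracer::new);

    private final Deque<TracedFrame> stack = new ArrayDeque<>();
    private final List<TracedFrame> writeQueue = new ArrayList<>();
    private int nextSequence = 0;

    static CallStackTracer current() { return CURRENT.get(); }

    // Called when a traced method starts: PUSH a new frame.
    void push(String method) {
        stack.push(new TracedFrame(method, nextSequence++, stack.size()));
    }

    // Called when a traced method ends: POP the frame into the write queue.
    void pop() {
        writeQueue.add(stack.pop());
    }

    // Hands over the queued frames (stand-in for the network write) and resets.
    List<TracedFrame> flush() {
        List<TracedFrame> out = new ArrayList<>(writeQueue);
        writeQueue.clear();
        nextSequence = 0;
        return out;
    }
}

class CallStackDemo {
    static void aMethod() { traced("AMethod", CallStackDemo::bMethod); }
    static void bMethod() { traced("BMethod", CallStackDemo::cMethod); }
    static void cMethod() { traced("CMethod", () -> { }); }

    // Wraps a body with push/pop, as injected interceptors would.
    static void traced(String name, Runnable body) {
        CallStackTracer tracer = CallStackTracer.current();
        tracer.push(name);
        try { body.run(); } finally { tracer.pop(); }
    }

    public static void main(String[] args) {
        aMethod();
        for (TracedFrame f : CallStackTracer.current().flush()) {
            System.out.println(f.method + " sequence=" + f.sequence + " depth=" + f.depth);
        }
    }
}
```

Frames leave the stack in completion order, so the queue holds C, B, A even though the calls started as A, B, C. The recorded sequence and depth are what allow the original order and nesting to be restored later.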

CallStack Trace
[Figure: Web reads the frames C B A from HBase]

CallStack Trace
[Figure: frames C, B, A from the WAS, each with a Sequence and a Depth]
Frame  Sequence  Depth
A      0         0
B      1         1
C      2         2
[Figure, last slide: the same frames with Depth 0, 1, 1]
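A viewer can rebuild the call tree from Sequence and Depth alone. This is a minimal sketch under assumptions: FrameRecord and CallTreeRenderer are hypothetical names, and reading Depth 0, 1, 1 as "B and C are both called directly by A" is an interpretation of the slides, not something they state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative record of one frame as it might be read back from storage.
class FrameRecord {
    final String name;
    final int sequence;
    final int depth;

    FrameRecord(String name, int sequence, int depth) {
        this.name = name;
        this.sequence = sequence;
        this.depth = depth;
    }
}

class CallTreeRenderer {
    // Sorts frames by sequence (call start order) and indents each by its depth.
    static List<String> render(List<FrameRecord> frames) {
        List<FrameRecord> sorted = new ArrayList<>(frames);
        sorted.sort(Comparator.comparingInt((FrameRecord f) -> f.sequence));
        List<String> lines = new ArrayList<>();
        for (FrameRecord f : sorted) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < f.depth; i++) {
                line.append("  ");
            }
            lines.add(line.append(f.name).toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Frames arrive in completion order: C, B, A.
        List<FrameRecord> nested = new ArrayList<>();
        nested.add(new FrameRecord("C", 2, 2));
        nested.add(new FrameRecord("B", 1, 1));
        nested.add(new FrameRecord("A", 0, 0));
        render(nested).forEach(System.out::println);
    }
}
```

With depths 0, 1, 2 the output is a three-level nesting; with depths 0, 1, 1 the same code prints B and C at the same indentation under A.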

How to find the relationship between RPCs
Include a trace Tag in the Request
Http : HttpHeader

TraceId
- TransactionID
- SpanID
- pSpanID
[Figure: Node 1 calls Node 2 (RPC 1); Node 2 calls Node 3 and Node 4 (RPC 2, RPC 3)]
Node 1: TxId = Node1^Time^1, SpanId = 1, pSpanId = -1
Node 2: TxId = Node1^Time^1, SpanId = 2, pSpanId = 1
Node 3: TxId = Node1^Time^1, SpanId = 3, pSpanId = 2
Node 4: TxId = Node1^Time^1, SpanId = 4, pSpanId = 2

TransactionID : a GUID that identifies the whole message
The same ID is assigned at every node
[Figure: the four-node diagram; Node 1 to Node 4 all carry TxId = Node1^Time^1]

SpanID, pSpanID : IDs for ordering the parent-child relationship
[Figure: the four-node diagram; Node 1 (SpanId 1, pSpanId -1), Node 2 (SpanId 2, pSpanId 1), Node 3 (SpanId 3, pSpanId 2), Node 4 (SpanId 4, pSpanId 2)]
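The three IDs can be modeled as a small value type. This is a minimal sketch under assumptions: the TraceId class here is illustrative, span IDs are passed in explicitly to reproduce the numbers on the slides, and the start time 1000 is a made-up value standing in for "Time".

```java
// Illustrative value type; field names follow the slides.
final class TraceId {
    static final long ROOT_PARENT = -1L;

    final String transactionId;
    final long spanId;
    final long parentSpanId;

    private TraceId(String transactionId, long spanId, long parentSpanId) {
        this.transactionId = transactionId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // First node of a transaction: builds "node^startTime^sequence" and has no parent.
    static TraceId newRoot(String node, long startTime, long sequence, long spanId) {
        return new TraceId(node + "^" + startTime + "^" + sequence, spanId, ROOT_PARENT);
    }

    // TraceId for an outgoing RPC: same transaction, new span, this span as parent.
    TraceId nextTraceId(long newSpanId) {
        return new TraceId(transactionId, newSpanId, spanId);
    }

    boolean isRoot() {
        return parentSpanId == ROOT_PARENT;
    }

    @Override
    public String toString() {
        return "TxId:" + transactionId + " SpanId=" + spanId + " pSpanId=" + parentSpanId;
    }
}

class TraceIdDemo {
    public static void main(String[] args) {
        TraceId node1 = TraceId.newRoot("Node1", 1000L, 1L, 1L);
        TraceId node2 = node1.nextTraceId(2L); // RPC 1
        TraceId node3 = node2.nextTraceId(3L); // RPC 2
        TraceId node4 = node2.nextTraceId(4L); // RPC 3
        System.out.println(node1);
        System.out.println(node2);
        System.out.println(node3);
        System.out.println(node4);
    }
}
```

Every span keeps the same TransactionID, so all records of one request can be found together, and each pSpanId points at the caller's SpanId, so the records can be ordered into a tree.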

TomcatA
@Controller
public class TestController {
    // httpClient: an HttpClient instance (declaration not shown on the slide)
    @RequestMapping("/test")
    @ResponseBody
    public String test() throws IOException {
        HttpGet get = new HttpGet("http://TomcatB/hello");
        HttpResponse response = httpClient.execute(get);
        return EntityUtils.toString(response.getEntity());
    }
}
TomcatB
@Controller
public class HelloController {
    @RequestMapping("/hello")
    @ResponseBody
    public String hello() {
        return "world!";
    }
}

[TomcatA / TomcatB code as above]
Create the TraceId
TRANSACTION_ID : TomcatA^StartTime^1
SPAN_ID : 10
PARENT_SPAN_ID : -1

[TomcatA / TomcatB code as above]
Record the Spring Controller method information

[TomcatA / TomcatB code as above]
Intercept the HttpClient call and store the next TraceId
SPAN_ID : 20 (newly issued)
PARENT_SPAN_ID : 10 (the parent's SpanId, 10)

[TomcatA / TomcatB code as above]
[Figure: the Tag is attached to the Request]

[TomcatA / TomcatB code as above]
[Figure: the Tag is attached to the Request]
TomcatB recognizes the TraceId in the Header and acts as a Child
SPAN_ID : 20 (newly issued)
PARENT_SPAN_ID : 10 (the parent's SpanId, 10)
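Passing the tag through HTTP headers can be sketched as an inject step on the caller and an extract step on the callee. This is a minimal sketch under assumptions: the header names and the classes TraceHeaders and PropagatedTrace are hypothetical, and a plain Map stands in for the request headers.

```java
import java.util.HashMap;
import java.util.Map;

// Trace values carried by one request; illustrative type.
class PropagatedTrace {
    final String transactionId;
    final long spanId;
    final long parentSpanId;

    PropagatedTrace(String transactionId, long spanId, long parentSpanId) {
        this.transactionId = transactionId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }
}

class TraceHeaders {
    // Hypothetical header names chosen for this sketch.
    static final String TX_ID = "X-Trace-TransactionId";
    static final String SPAN_ID = "X-Trace-SpanId";
    static final String PARENT_SPAN_ID = "X-Trace-ParentSpanId";

    // Caller side: write the next TraceId into the outgoing request headers.
    static void inject(Map<String, String> headers, PropagatedTrace next) {
        headers.put(TX_ID, next.transactionId);
        headers.put(SPAN_ID, Long.toString(next.spanId));
        headers.put(PARENT_SPAN_ID, Long.toString(next.parentSpanId));
    }

    // Callee side: returns the incoming trace, or null when the request is untagged
    // (in that case the callee would start a new root transaction).
    static PropagatedTrace extract(Map<String, String> headers) {
        String txId = headers.get(TX_ID);
        String spanId = headers.get(SPAN_ID);
        String parentSpanId = headers.get(PARENT_SPAN_ID);
        if (txId == null || spanId == null || parentSpanId == null) {
            return null;
        }
        return new PropagatedTrace(txId, Long.parseLong(spanId), Long.parseLong(parentSpanId));
    }

    public static void main(String[] args) {
        // TomcatA (span 10) calls TomcatB with the next TraceId (span 20, parent 10).
        Map<String, String> requestHeaders = new HashMap<>();
        inject(requestHeaders, new PropagatedTrace("TomcatA^1000^1", 20L, 10L));

        // TomcatB reads the headers and continues the same transaction as a child.
        PropagatedTrace onTomcatB = extract(requestHeaders);
        System.out.println(onTomcatB.transactionId + " span=" + onTomcatB.spanId
                + " parent=" + onTomcatB.parentSpanId);
    }
}
```

Returning null for an untagged request keeps the "am I a child or a root?" decision in one place on the receiving side.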

[TomcatA / TomcatB code as above]
[Figure: TomcatB sends TraceData to the Collector, which writes it to HBase]
HBase RowKey : TomcatA^StartTime^1
SpanId 20, pSpanId 10 : hello() call information

[TomcatA / TomcatB code as above]
[Figure: TomcatA sends TraceData to the Collector, which writes it to HBase]
HBase RowKey
SpanId 20, pSpanId 10
SpanId 10, pSpanId -1 : test() call information

[Figure: WEB reads the trace from HBase]
HBase RowKey
SpanId 20, pSpanId 10
SpanId 10, pSpanId -1 : test() call information

3.
Troubleshooting in Distributed Environments

Before Pinpoint
If a connected system fails…

Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
…
Caused by: ◂◊╩◌♪♦♂◘◦▸╫╛╟╤❶╦╧[afg00101101aj..
…
Caused by: …

Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
…
Caused by: ◂◊╩◌♪♦♂◘◦▸╫╛╟╤❶╦╧[aDgfRhaj..
…
Caused by: …

A hard problem
[Figure: Overseas Proxy, API GATEWAY, Service, DB, CACHE, RPC, CACHE]

Visualizing the whole architecture
[Figure: APIGATEWAY1, APIGATEWAY2, Australia Proxy, US Proxy, Mobile-app, Server-WEB]

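[Figure: Japan Proxy, Australia Proxy, US Proxy, Europe Proxy, South America Proxy, APIGW-1, APIGW-2, Service, DB, RPC, RPC…, Cache]

[Figure: the same server map: Japan Proxy, US Proxy, Europe Proxy, South America Proxy, Australia Proxy, APIGW-1, APIGW-2, Service, DB, RPC, RPC…, Cache]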

Visualizing the flow of an individual Request
US Proxy

[Figure: Overseas Proxy, APIGateway, Service, RPC-A, RPC-B, RPC-C, MySql, RPC-D]

[Figure: Overseas Proxy 6 sec, APIGateway 6 sec]

[Figure: APIGateway 6 sec, Service response time 6 sec, …]

[Figure: MySql, RPC-A, RPC-B, RPC-C, RPC-D]

[Figure: XXX, YYY, ABC, http://A.naver, http://B.naver]

[Figure: APIGateway, Overseas Proxy, Service]

[Figure: Overseas Proxy, APIGateway, Service, RPC-A, RPC-B, RPC-C, RPC-D, MySql]

RPC Timeline Pattern
Time distribution patterns in the RPC Timeline and CallStack

RPC Timeline Pattern 1
Pattern where the TCP connection has a problem
[Figure: Client execute / Server timeline]
TCP connect is delayed
Socket Option : ConnectTimeout, Socket Backlog
WebServer : Apache, Nginx
Network Switch : LoadBalancer (L4)
Client characteristics : HttpClient's internal retry logic

When the Network is slow
[Figure: Client execute / Server timeline]
The server is located overseas
Check the Network traffic and the server location
Use HTTP KeepAlive, HTTP2
Use compression such as Gzip

TargetServer is slow
[Figure: Client execute / Server timeline]
Processing on the TargetServer is slow
Can spread into a full outage of the Client
Socket Timeout
Circuit breaker : Netflix Hystrix
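A socket timeout bounds how long the client waits for a slow target. This is a minimal sketch using plain java.net sockets, assuming a local server that accepts the TCP connection but never answers; ReadTimeoutDemo and its timeout values are made up for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    // Connects to a local port and tries to read one byte.
    // Returns true when the read gives up with "Read timed out".
    static boolean readTimesOut(int port, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Connect timeout: upper bound on establishing the TCP connection.
            socket.connect(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port),
                    connectTimeoutMs);
            // Socket (read) timeout: upper bound on each blocking read.
            socket.setSoTimeout(readTimeoutMs);
            try {
                socket.getInputStream().read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The server socket listens but never accepts or responds, which
        // stands in for a target server that is too slow to answer.
        try (ServerSocket slowServer =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            boolean timedOut = readTimesOut(slowServer.getLocalPort(), 1000, 200);
            System.out.println("read timed out: " + timedOut);
        }
    }
}
```

Without setSoTimeout the read would block indefinitely, which is how one slow target can tie up every client thread. A circuit breaker goes one step further and stops calling the target for a while after repeated failures.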

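Large response data
[Figure: Client execute / Server timeline]
More data is read from the Stream after the Response is received
- Large file download
Usually a normal state
If this situation causes problems, a separate server should be set up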

New in 1.5
Plugin System
Users can collect information from the APIs they need
Google Gson Plugin
- com.navercorp.pinpoint.plugin.gson.GsonPlugin

New in 1.5
Enhanced Real Time features
WAS ActiveThread Monitoring

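Future direction
Prediction, suggestions, pattern analysis
If JVM memory shows an OOM pattern -> warn
If instances of the same WAS show different response time patterns -> warn

Future direction
Prediction, suggestions, pattern analysis
If a problematic Lib is in use -> suggest a version upgrade
If the JVM Version or Options are not advisable -> suggest recommended settings

Future direction
Profile the non-Java sections too
Collect performance data for the WebServer section
- Apache, Nginx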
