Prometheus on EKS Guide Document
(https://docs.aws.amazon.com/ko_kr/eks/latest/userguide/prometheus.html)
📌QA test region: ap-northeast-1 (Tokyo)
https://github.com/sysnet4admin
Installing Helm v3.9.1
1. Install openssl
[cloudshell-user@ip-10-0-146-72 ~]$ sudo yum install openssl -y
Loaded plugins: ovl, priorities
Resolving Dependencies
--> Running transaction check
---> Package openssl.x86_64 1:1.0.2k-24.amzn2.0.3 will be installed
--> Finished Dependency Resolution
<snipped>
Downloading packages:
openssl-1.0.2k-24.amzn2.0.3.x86_64.rpm
<snipped>
Installed:
openssl.x86_64 1:1.0.2k-24.amzn2.0.3
Complete!
2. Install the helm binary
[cloudshell-user@ip-10-0-146-72 ~]$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
[cloudshell-user@ip-10-0-146-72 ~]$ chmod 700 get_helm.sh
[cloudshell-user@ip-10-0-146-72 ~]$ DESIRED_VERSION=v3.9.1 ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.9.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
3. Copy helm into a directory on the PATH
[cloudshell-user@ip-10-0-46-136 ~]$ cp /usr/local/bin/helm $HOME/bin/helm && export PATH=$PATH:$HOME/bin
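Since only $HOME persists across AWS CloudShell sessions, the PATH export above is lost when the shell restarts. A minimal sketch to make it persistent, assuming the default bash shell:
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc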
4. Verify the installed helm
[cloudshell-user@ip-10-0-146-72 ~]$ helm version
version.BuildInfo{Version:"v3.9.1", GitCommit:"a7c043acb5ff905c261cfdc923a35776ba5e66e4", GitTreeState:"clean", GoVersion:"go1.17.5"}
❗If openssl is not installed
[cloudshell-user@ip-10-0-146-72 ~]$ DESIRED_VERSION=v3.9.1 ./get_helm.sh
In order to verify checksum, openssl must first be installed.
Please install openssl or set VERIFY_CHECKSUM=false in your environment.
Failed to install helm
For support, go to https://github.com/helm/helm.
[cloudshell-user@ip-10-0-146-72 ~]$ sudo yum install openssl
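As the error message itself suggests, you can also skip checksum verification instead of installing openssl. A sketch for throwaway test environments only, since it disables integrity checking of the download:
$ VERIFY_CHECKSUM=false DESIRED_VERSION=v3.9.1 ./get_helm.sh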
Prerequisites for Deploying Prometheus via Helm
1. Add the Helm repo for installing Prometheus
[cloudshell-user@ip-10-0-146-72 ~]$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
2. Update the repo to fetch the latest contents
[cloudshell-user@ip-10-0-146-72 ~]$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
3. Check the preconfigured StorageClass
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 35m
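To confirm the provisioner behind gp2 and whether it really is the default, a quick check (the is-default-class annotation shown is the standard Kubernetes convention):
$ kubectl describe storageclass gp2
$ kubectl get sc gp2 -o jsonpath='{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}'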
Deploying Prometheus
(https://awskrug.github.io/eks-workshop/monitoring/deploy-prometheus/)
1. Deploy Prometheus to EKS via Helm
[cloudshell-user@ip-10-0-146-72 ~]$ helm install prometheus prometheus-community/prometheus \
--set server.service.type="LoadBalancer" \
--namespace=monitoring \
--create-namespace
NAME: prometheus
LAST DEPLOYED: Fri Jul 15 05:36:50 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.monitoring.svc.cluster.local

Get the Prometheus server URL by running these commands in the same shell:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get svc --namespace monitoring -w prometheus-server'
export SERVICE_IP=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:80

The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-alertmanager.monitoring.svc.cluster.local

Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9093
########################################################################
##  WARNING: Pod Security Policy has been moved to a global property. ##
##  use .Values.podSecurityPolicy.enabled with pod-based              ##
##  annotations                                                       ##
##  (e.g. .Values.nodeExporter.podSecurityPolicy.annotations)         ##
########################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-pushgateway.monitoring.svc.cluster.local

Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
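Note that on EKS the load balancer is published as a DNS hostname rather than an IP, so the .ip jsonpath in the NOTES above returns an empty string. A variant that reads the hostname instead (SERVICE_HOST is an illustrative variable name):
export SERVICE_HOST=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo http://$SERVICE_HOST:80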
❗If you want to use a StorageClass other than gp2 (e.g. EFS or gp3), refer to the following; the example below shows gp2, so substitute the StorageClass name you want:
helm install prometheus prometheus-community/prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2" \
--set server.service.type="LoadBalancer" \
--namespace=monitoring \
--create-namespace
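If gp3 does not exist in the cluster yet, it has to be created first. A minimal sketch, assuming the EBS CSI driver add-on is installed (the usual way to provision gp3 volumes on EKS):
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
EOF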
2. Check the deployed pods and services
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-alertmanager-5c57cc6945-cqt2b 2/2 Running 0 5m40s
pod/prometheus-kube-state-metrics-77ddf69b4-68jg4 1/1 Running 0 5m40s
pod/prometheus-node-exporter-skndj 1/1 Running 0 5m40s
pod/prometheus-node-exporter-xw5fc 1/1 Running 0 5m40s
pod/prometheus-pushgateway-ff89cc976-bzxpv 1/1 Running 0 5m40s
pod/prometheus-server-6c99667b9b-6d958 2/2 Running 0 5m40s
NAME                                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)        AGE
service/prometheus-alertmanager         ClusterIP      10.100.161.159   <none>                                                                          80/TCP         5m40s
service/prometheus-kube-state-metrics   ClusterIP      10.100.103.158   <none>                                                                          8080/TCP       5m40s
service/prometheus-node-exporter        ClusterIP      10.100.53.36     <none>                                                                          9100/TCP       5m40s
service/prometheus-pushgateway          ClusterIP      10.100.184.91    <none>                                                                          9091/TCP       5m40s
service/prometheus-server               LoadBalancer   10.100.93.3      adc0a2cdfbf974ee6994d148f9efbce4-1289964317.ap-northeast-1.elb.amazonaws.com   80:31635/TCP   5m40s
3. Check the deployed Prometheus
4. Check the queried metric data
5. List and delete the deployed Prometheus
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
prometheus monitoring 1 2022-07-15 05:36:50.109858421 +0000 UTC deployed prometheus-15.10.4 2.36.2
[cloudshell-user@ip-10-0-146-72 ~]$ helm uninstall prometheus -n monitoring
release "prometheus" uninstalled
6. Verify that the Prometheus resources were deleted
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
No resources found in monitoring namespace.
Deploying the Prometheus Stack
1. Deploy the Prometheus stack to EKS via Helm
(https://kong.awsworkshop.io/eks-enterprise-setup/observability/prometheus.html)
[cloudshell-user@ip-10-0-146-72 ~]$ helm install kube-prometheus-stack \
prometheus-community/kube-prometheus-stack \
--set prometheus.service.type=LoadBalancer \
--set grafana.service.type=LoadBalancer \
--namespace=monitoring \
--create-namespace
NAME: kube-prometheus-stack
LAST DEPLOYED: Fri Jul 15 05:01:05 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for
instructions on how to create & configure Alertmanager and Prometheus
instances using the Operator.
2. Check the deployed pods and services
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 3m56s
pod/kube-prometheus-stack-grafana-7dffb5648b-tmmjg 3/3 Running 0 4m6s
pod/kube-prometheus-stack-kube-state-metrics-668cff654f-cs6qb 1/1 Running 0 4m6s
pod/kube-prometheus-stack-operator-55d8668b46-c7q8g 1/1 Running 0 4m6s
pod/kube-prometheus-stack-prometheus-node-exporter-5mqr4 1/1 Running 0 4m6s
pod/kube-prometheus-stack-prometheus-node-exporter-zgw98 1/1 Running 0 4m6s
pod/prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 3m56s
NAME                                              TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP      None             <none>                                                                         9093/TCP,9094/TCP,9094/UDP   3m56s
service/kube-prometheus-stack-alertmanager        ClusterIP      10.100.31.138    <none>                                                                         9093/TCP                     4m6s
service/kube-prometheus-stack-grafana             LoadBalancer   10.100.8.4       a0351049fd68c4164a4198923a18ddb9-867621865.ap-northeast-1.elb.amazonaws.com   80:30337/TCP                 4m6s
service/kube-prometheus-stack-kube-state-metrics  ClusterIP      10.100.115.224   <none>                                                                         8080/TCP                     4m6s
service/kube-prometheus-stack-operator            ClusterIP      10.100.154.42    <none>                                                                         443/TCP                      4m6s
service/kube-prometheus-stack-prometheus          LoadBalancer   10.100.52.126    a315d03620fb94d89849c9ea36e12b3c-707046690.ap-northeast-1.elb.amazonaws.com   9090:31885/TCP               4m6s
service/kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.100.109.235   <none>                                                                     9100/TCP                     4m6s
service/prometheus-operated                       ClusterIP      None             <none>                                                                         9090/TCP                     3m56s
❗A major caveat with the current Prometheus stack
The plain Prometheus deployment creates PVs and PVCs through the default StorageClass (gp2), as shown below.
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pv -n monitoring
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS   REASON   AGE
pvc-39a11fbb-467b-4ee7-b6c0-20eb1536282b   2Gi        RWO            Delete           Bound    monitoring/prometheus-alertmanager   gp2                     4m7s
pvc-c231b71d-7d04-42dc-b276-61769c6f9ee0   8Gi        RWO            Delete           Bound    monitoring/prometheus-server         gp2                     4m7s
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pvc -n monitoring
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-alertmanager   Bound    pvc-39a11fbb-467b-4ee7-b6c0-20eb1536282b   2Gi        RWO            gp2            4m22s
prometheus-server         Bound    pvc-c231b71d-7d04-42dc-b276-61769c6f9ee0   8Gi        RWO            gp2            4m22s
However, if you do not specify a StorageClass for the Prometheus stack, it is deployed with emptyDir volumes for temporary use only, rather than with PVs and PVCs, as shown below.
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pv,pvc
No resources found
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get po -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -o yaml | grep volumes -A30
  volumes:
  - name: config
    secret:
      defaultMode: 420
      secretName: prometheus-kube-prometheus-stack-prometheus
  - name: tls-assets
    projected:
      defaultMode: 420
      sources:
      - secret:
          name: prometheus-kube-prometheus-stack-prometheus-tls-assets-0
  - emptyDir: {}
    name: config-out
  - configMap:
      defaultMode: 420
      name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
    name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
  <snipped>
Therefore, in a production setting you should configure the stack so that a StorageClass is actually used, which means deploying with additional settings in values.yaml (or forking the chart and modifying it).
Refer to the following links:
Prometheus: https://github.com/prometheus-community/helm-charts/issues/186
Grafana: https://github.com/prometheus-community/helm-charts/issues/436
Helm values: https://helm.sh/docs/intro/using_helm/#customizing-the-chart-before-installing
If you really want to do this, see Appendix 1, or the minimal sketch below.
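A minimal inline sketch of such a values override, applied to the existing release with helm upgrade (the file name storage-values.yaml is illustrative; assumes gp2 exists in the cluster):
cat <<'EOF' > storage-values.yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
EOF
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring --reuse-values -f storage-values.yaml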
3. Check the deployed Prometheus
❗If you want to change the scrapeInterval after deployment
$ kubectl get prometheus -n monitoring -o yaml | nl | grep scrap
56 scrapeInterval: 30s
$ kubectl edit prometheus -n monitoring
prometheus.monitoring.coreos.com/kube-prometheus-stack-prometheus edited
$ kubectl get prometheus -n monitoring -o yaml | nl | grep scrap
56 scrapeInterval: 2m
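The same change can be made non-interactively with kubectl patch; the resource name kube-prometheus-stack-prometheus is taken from the edit output above:
$ kubectl patch prometheus kube-prometheus-stack-prometheus -n monitoring --type merge -p '{"spec":{"scrapeInterval":"2m"}}'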
4. Check the deployed Grafana and log in
ID: admin
Password: prom-operator
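If the admin password was changed from the chart default, it can be read back from the secret the Grafana chart creates (the secret name follows the <release>-grafana convention):
$ kubectl get secret kube-prometheus-stack-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 --decode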
5. Verify that the preconfigured data source is Prometheus
6. Enter 13770 in the import menu to load the pre-built dashboard
7. Select Prometheus as the Data Source and click Import
8. Enjoy the imported dashboard 13770
9. (If needed) List and delete the deployed Prometheus stack
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kube-prometheus-stack monitoring 1 2022-07-15 05:01:05.881146977 +0000 UTC deployed kube-prometheus-stack-37.2.0 0.57.0
[cloudshell-user@ip-10-0-146-72 ~]$ helm uninstall -n monitoring kube-prometheus-stack
release "kube-prometheus-stack" uninstalled
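Note that helm uninstall does not remove the CRDs the chart installed. A cleanup sketch, assuming no other release in the cluster still depends on them:
$ kubectl delete crd alertmanagerconfigs.monitoring.coreos.com alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com probes.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com thanosrulers.monitoring.coreos.com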
Appendix 1
1. Generate a values file with helm inspect
$ helm inspect values prometheus-community/kube-prometheus-stack --version 38.0.2 > kube-prometheus-stack-38.0.2.values
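To jump straight to the sections edited in the next step, you can grep the generated file for the relevant keys (a convenience sketch; the keys appear under the alertmanager, grafana, and prometheus sections):
$ grep -n 'storageSpec\|persistence:\|adminPassword' kube-prometheus-stack-38.0.2.values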
2. Add and modify the required content in the generated values file
The line numbers may differ slightly depending on the order in which you make the edits.
For reference, line numbers can be displayed in vi with :set nu.
Modify:
542     ## Storage is the definition of how storage will be used by the Alertmanager instances.
543     ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
544     ##
545     storage:
546       volumeClaimTemplate:
547         spec:
548           storageClassName: gp2
549           accessModes: ["ReadWriteOnce"]
550           resources:
551             requests:
552               storage: 50Gi
553     # selector: {}
Add:
697   ## Using default values from https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
698   ##
699   grafana:
700     enabled: true
701     namespaceOverride: ""
702
703     # override configuration by hoon
704     persistence:
705       enabled: true
706       type: pvc
707       storageClassName: gp2
708       accessModes:
709       - ReadWriteOnce
710       size: 100Gi
711       finalizers:
712       - kubernetes.io/pvc-protection
Modify:
726     ## Timezone for the default dashboards
727     ## Other options are: browser or a specific timezone, i.e. Europe/Luxembourg
728     ##
729     defaultDashboardsTimezone: utc
730
731     adminPassword: admin
732
Modify:
2580     ## Prometheus StorageSpec for persistent data
2581     ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
2582     ##
2583     storageSpec:
2584       ## Using PersistentVolumeClaim
2585       ##
2586       volumeClaimTemplate:
2587         spec:
2588           storageClassName: gp2
2589           accessModes: ["ReadWriteOnce"]
2590           resources:
2591             requests:
2592               storage: 50Gi
2593     # selector: {}
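Before installing, the edited values can be sanity-checked by rendering the chart locally and confirming that the volumeClaimTemplate made it into the manifests (the release name kps is illustrative):
$ helm template kps prometheus-community/kube-prometheus-stack --version 38.0.2 --values kube-prometheus-stack-38.0.2.values | grep -B2 -A8 volumeClaimTemplate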
3. Run helm install
[cloudshell-user@ip-10-0-6-163 ~]$ helm install \
prometheus-community/kube-prometheus-stack \
--set prometheus.service.type=LoadBalancer \
--set grafana.service.type=LoadBalancer \
--create-namespace \
--namespace monitoring \
--generate-name \
--values kube-prometheus-stack-38.0.2.values
NAME: kube-prometheus-stack-1658960026
LAST DEPLOYED: Wed Jul 27 22:13:48 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack-1658960026"
Visit https://github.com/prometheus-operator/kube-prometheus for
instructions on how to create & configure Alertmanager and Prometheus
instances using the Operator.
4. Verify the Prometheus stack created from the modified values
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get po,svc,pv,pvc -n monitoring
NAME                                                               READY   STATUS    RESTARTS   AGE
pod/alertmanager-kube-prometheus-stack-1658-alertmanager-0           2/2     Running   0          93s
pod/kube-prometheus-stack-1658-operator-67699f6d8-429rd              1/1     Running   0          94s
pod/kube-prometheus-stack-1658961024-grafana-7d98b7d99f-65qjj        3/3     Running   0          94s
pod/kube-prometheus-stack-1658961024-kube-state-metrics-65f588z8msj  1/1     Running   0          94s
pod/kube-prometheus-stack-1658961024-prometheus-node-exporter-5zlcd  1/1     Running   0          95s
pod/kube-prometheus-stack-1658961024-prometheus-node-exporter-wt6kf  1/1     Running   0          94s
pod/prometheus-kube-prometheus-stack-1658-prometheus-0               2/2     Running   0          92s

NAME                                                         TYPE           CLUSTER-IP       EXTERNAL-IP                                                                   PORT(S)                      AGE
service/alertmanager-operated                                ClusterIP      None             <none>                                                                        9093/TCP,9094/TCP,9094/UDP   93s
service/kube-prometheus-stack-1658-alertmanager              ClusterIP      10.100.254.128   <none>                                                                        9093/TCP                     95s
service/kube-prometheus-stack-1658-operator                  ClusterIP      10.100.253.198   <none>                                                                        443/TCP                      95s
service/kube-prometheus-stack-1658-prometheus                LoadBalancer   10.100.209.143   afc7a705d6f094bf0bc142586bb70789-901394507.ap-northeast-1.elb.amazonaws.com   9090:31388/TCP               95s
service/kube-prometheus-stack-1658961024-grafana             LoadBalancer   10.100.102.193   ad8ec0af13eb84ae780236b179981e29-967158055.ap-northeast-1.elb.amazonaws.com   80:31050/TCP                 95s
service/kube-prometheus-stack-1658961024-kube-state-metrics  ClusterIP      10.100.160.216   <none>                                                                        8080/TCP                     95s
service/kube-prometheus-stack-1658961024-prometheus-node-exporter   ClusterIP   10.100.43.187   <none>                                                                     9100/TCP                     95s
service/prometheus-operated                                  ClusterIP      None             <none>                                                                        9090/TCP                     92s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                                                                          STORAGECLASS   REASON   AGE
persistentvolume/pvc-59aa28bd-b9cf-4dc9-93b0-44674f8578bb   100Gi      RWO            Delete           Bound    monitoring/kube-prometheus-stack-1658961024-grafana                                                                            gp2                     89s
persistentvolume/pvc-dac51bfd-ae87-47cf-891b-7f98a4f142a0   50Gi       RWO            Delete           Bound    monitoring/alertmanager-kube-prometheus-stack-1658-alertmanager-db-alertmanager-kube-prometheus-stack-1658-alertmanager-0     gp2                     56m
persistentvolume/pvc-e1961736-9cbc-4ee8-9d6e-dcbd932afc6f   50Gi       RWO            Delete           Bound    monitoring/prometheus-kube-prometheus-stack-1658-prometheus-db-prometheus-kube-prometheus-stack-1658-prometheus-0             gp2                     56m

NAME                                                                                                                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/alertmanager-kube-prometheus-stack-1658-alertmanager-db-alertmanager-kube-prometheus-stack-1658-alertmanager-0    Bound    pvc-dac51bfd-ae87-47cf-891b-7f98a4f142a0   50Gi       RWO            gp2            56m
persistentvolumeclaim/kube-prometheus-stack-1658961024-grafana                                                                          Bound    pvc-59aa28bd-b9cf-4dc9-93b0-44674f8578bb   100Gi      RWO            gp2            95s
persistentvolumeclaim/prometheus-kube-prometheus-stack-1658-prometheus-db-prometheus-kube-prometheus-stack-1658-prometheus-0            Bound    pvc-e1961736-9cbc-4ee8-9d6e-dcbd932afc6f   50Gi       RWO            gp2            56m
References:
https://1week.tistory.com/43
https://passwd.tistory.com/entry/Helm-kube-prometheus-stack-Grafana-Persistence-%ED%99%9C%EC%84%B1%ED%99%94
https://github.com/prometheus-community/helm-charts/issues/113