Prometheus on EKS Guide Document
(https://docs.aws.amazon.com/ko_kr/eks/latest/userguide/prometheus.html)
📌QA test region: ap-northeast-1 (Tokyo)
https://github.com/sysnet4admin
Installing Helm v3.9.1
1. Install openssl
[cloudshell-user@ip-10-0-146-72 ~]$ sudo yum install openssl -y
Loaded plugins: ovl, priorities
Resolving Dependencies
--> Running transaction check
---> Package openssl.x86_64 1:1.0.2k-24.amzn2.0.3 will be installed
--> Finished Dependency Resolution
<snipped>
Downloading packages:
openssl-1.0.2k-24.amzn2.0.3.x86_64.rpm
<snipped>
Installed:
openssl.x86_64 1:1.0.2k-24.amzn2.0.3
Complete!
2. Install the helm binary
[cloudshell-user@ip-10-0-146-72 ~]$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
[cloudshell-user@ip-10-0-146-72 ~]$ chmod 700 get_helm.sh
[cloudshell-user@ip-10-0-146-72 ~]$ DESIRED_VERSION=v3.9.1 ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.9.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
3. Copy helm into a directory on the PATH
[cloudshell-user@ip-10-0-46-136 ~]$ cp /usr/local/bin/helm $HOME/bin/helm && export PATH=$PATH:$HOME/bin
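Since only $HOME persists across AWS CloudShell sessions, the PATH export above is lost when the shell restarts. A minimal sketch to make it persistent, assuming the default bash shell:
$ echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc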
4. Verify the installed helm
[cloudshell-user@ip-10-0-146-72 ~]$ helm version
version.BuildInfo{Version:"v3.9.1", GitCommit:"a7c043acb5ff905c261cfdc923a35776ba5e66e4", GitTreeState:"clean", GoVersion:"go1.17.5"}
❗If openssl is not installed
[cloudshell-user@ip-10-0-146-72 ~]$ DESIRED_VERSION=v3.9.1 ./get_helm.sh
In order to verify checksum, openssl must first be installed.
Please install openssl or set VERIFY_CHECKSUM=false in your environment.
Failed to install helm
For support, go to https://github.com/helm/helm.
[cloudshell-user@ip-10-0-146-72 ~]$ sudo yum install openssl
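As the error message itself suggests, you can also skip checksum verification instead of installing openssl. A sketch for throwaway test environments only, since it disables integrity checking of the download:
$ VERIFY_CHECKSUM=false DESIRED_VERSION=v3.9.1 ./get_helm.sh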
Prerequisites for Deploying Prometheus via Helm
1. Add the Helm repo for installing Prometheus
[cloudshell-user@ip-10-0-146-72 ~]$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
2. Update the repo to fetch the latest contents
[cloudshell-user@ip-10-0-146-72 ~]$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
3. Check the preconfigured StorageClass
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 35m
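To confirm the provisioner behind gp2 and whether it really is the default, a quick check (the is-default-class annotation shown is the standard Kubernetes convention):
$ kubectl describe storageclass gp2
$ kubectl get sc gp2 -o jsonpath='{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}'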
Deploying Prometheus
(https://awskrug.github.io/eks-workshop/monitoring/deploy-prometheus/)
1. Deploy Prometheus to EKS via Helm
[cloudshell-user@ip-10-0-146-72 ~]$ helm install prometheus prometheus-community/prometheus \
--set server.service.type="LoadBalancer" \
--namespace=monitoring \
--create-namespace
NAME: prometheus
LAST DEPLOYED: Fri Jul 15 05:36:50 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.monitoring.svc.cluster.local

Get the Prometheus server URL by running these commands in the same shell:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get svc --namespace monitoring -w prometheus-server'
export SERVICE_IP=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:80

The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-alertmanager.monitoring.svc.cluster.local

Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9093
########################################################################
##  WARNING: Pod Security Policy has been moved to a global property. ##
##  use .Values.podSecurityPolicy.enabled with pod-based              ##
##  annotations                                                       ##
##  (e.g. .Values.nodeExporter.podSecurityPolicy.annotations)         ##
########################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-pushgateway.monitoring.svc.cluster.local

Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
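Note that on EKS the load balancer is published as a DNS hostname rather than an IP, so the .ip jsonpath in the NOTES above returns an empty string. A variant that reads the hostname instead (SERVICE_HOST is an illustrative variable name):
export SERVICE_HOST=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo http://$SERVICE_HOST:80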
❗If you want to use a StorageClass other than gp2 (e.g. EFS or gp3), refer to the following; the example below shows gp2, so substitute the StorageClass name you want:
helm install prometheus prometheus-community/prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2" \
--set server.service.type="LoadBalancer" \
--namespace=monitoring \
--create-namespace
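If gp3 does not exist in the cluster yet, it has to be created first. A minimal sketch, assuming the EBS CSI driver add-on is installed (the usual way to provision gp3 volumes on EKS):
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
EOF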
2. Check the deployed pods and services
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-alertmanager-5c57cc6945-cqt2b 2/2 Running 0 5m40s
pod/prometheus-kube-state-metrics-77ddf69b4-68jg4 1/1 Running 0 5m40s
pod/prometheus-node-exporter-skndj 1/1 Running 0 5m40s
pod/prometheus-node-exporter-xw5fc 1/1 Running 0 5m40s
pod/prometheus-pushgateway-ff89cc976-bzxpv 1/1 Running 0 5m40s
pod/prometheus-server-6c99667b9b-6d958 2/2 Running 0 5m40s
NAME                                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)        AGE
service/prometheus-alertmanager         ClusterIP      10.100.161.159   <none>                                                                          80/TCP         5m40s
service/prometheus-kube-state-metrics   ClusterIP      10.100.103.158   <none>                                                                          8080/TCP       5m40s
service/prometheus-node-exporter        ClusterIP      10.100.53.36     <none>                                                                          9100/TCP       5m40s
service/prometheus-pushgateway          ClusterIP      10.100.184.91    <none>                                                                          9091/TCP       5m40s
service/prometheus-server               LoadBalancer   10.100.93.3      adc0a2cdfbf974ee6994d148f9efbce4-1289964317.ap-northeast-1.elb.amazonaws.com   80:31635/TCP   5m40s
3. Check the deployed Prometheus
4. Check the queried metric data
5. List and delete the deployed Prometheus
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
prometheus monitoring 1 2022-07-15 05:36:50.109858421 +0000 UTC deployed prometheus-15.10.4 2.36.2
[cloudshell-user@ip-10-0-146-72 ~]$ helm uninstall prometheus -n monitoring
release "prometheus" uninstalled
6. Verify that the Prometheus resources were deleted
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
No resources found in monitoring namespace.
Deploying the Prometheus Stack
1. Deploy the Prometheus stack to EKS via Helm
(https://kong.awsworkshop.io/eks-enterprise-setup/observability/prometheus.html)
[cloudshell-user@ip-10-0-146-72 ~]$ helm install kube-prometheus-stack \
prometheus-community/kube-prometheus-stack \
--set prometheus.service.type=LoadBalancer \
--set grafana.service.type=LoadBalancer \
--namespace=monitoring \
--create-namespace
NAME: kube-prometheus-stack
LAST DEPLOYED: Fri Jul 15 05:01:05 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for
instructions on how to create & configure Alertmanager and Prometheus
instances using the Operator.
2. Check the deployed pods and services
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 3m56s
pod/kube-prometheus-stack-grafana-7dffb5648b-tmmjg 3/3 Running 0 4m6s
pod/kube-prometheus-stack-kube-state-metrics-668cff654f-cs6qb 1/1 Running 0 4m6s
pod/kube-prometheus-stack-operator-55d8668b46-c7q8g 1/1 Running 0 4m6s
pod/kube-prometheus-stack-prometheus-node-exporter-5mqr4 1/1 Running 0 4m6s
pod/kube-prometheus-stack-prometheus-node-exporter-zgw98 1/1 Running 0 4m6s
pod/prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 3m56s
NAME                                              TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP      None             <none>                                                                         9093/TCP,9094/TCP,9094/UDP   3m56s
service/kube-prometheus-stack-alertmanager        ClusterIP      10.100.31.138    <none>                                                                         9093/TCP                     4m6s
service/kube-prometheus-stack-grafana             LoadBalancer   10.100.8.4       a0351049fd68c4164a4198923a18ddb9-867621865.ap-northeast-1.elb.amazonaws.com   80:30337/TCP                 4m6s
service/kube-prometheus-stack-kube-state-metrics  ClusterIP      10.100.115.224   <none>                                                                         8080/TCP                     4m6s
service/kube-prometheus-stack-operator            ClusterIP      10.100.154.42    <none>                                                                         443/TCP                      4m6s
service/kube-prometheus-stack-prometheus          LoadBalancer   10.100.52.126    a315d03620fb94d89849c9ea36e12b3c-707046690.ap-northeast-1.elb.amazonaws.com   9090:31885/TCP               4m6s
service/kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.100.109.235   <none>                                                                     9100/TCP                     4m6s
service/prometheus-operated                       ClusterIP      None             <none>                                                                         9090/TCP                     3m56s
❗A major caveat with the current Prometheus stack
The plain Prometheus deployment creates PVs and PVCs through the default StorageClass (gp2), as shown below.
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pv -n monitoring
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS   REASON   AGE
pvc-39a11fbb-467b-4ee7-b6c0-20eb1536282b   2Gi        RWO            Delete           Bound    monitoring/prometheus-alertmanager   gp2                     4m7s
pvc-c231b71d-7d04-42dc-b276-61769c6f9ee0   8Gi        RWO            Delete           Bound    monitoring/prometheus-server         gp2                     4m7s
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pvc -n monitoring
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-alertmanager   Bound    pvc-39a11fbb-467b-4ee7-b6c0-20eb1536282b   2Gi        RWO            gp2            4m22s
prometheus-server         Bound    pvc-c231b71d-7d04-42dc-b276-61769c6f9ee0   8Gi        RWO            gp2            4m22s
However, if you do not specify a StorageClass for the Prometheus stack, it is deployed with emptyDir volumes for temporary use only, rather than with PVs and PVCs, as shown below.
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pv,pvc
No resources found
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get po -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -o yaml | grep volumes -A30
  volumes:
  - name: config
    secret:
      defaultMode: 420
      secretName: prometheus-kube-prometheus-stack-prometheus
  - name: tls-assets
    projected:
      defaultMode: 420
      sources:
      - secret:
          name: prometheus-kube-prometheus-stack-prometheus-tls-assets-0
  - emptyDir: {}
    name: config-out
  - configMap:
      defaultMode: 420
      name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
    name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
  <snipped>
Therefore, in a production setting you should configure the stack so that a StorageClass is actually used, which means deploying with additional settings in values.yaml (or forking the chart and modifying it).
Refer to the following links:
Prometheus: https://github.com/prometheus-community/helm-charts/issues/186
Grafana: https://github.com/prometheus-community/helm-charts/issues/436
Helm values: https://helm.sh/docs/intro/using_helm/#customizing-the-chart-before-installing
If you really want to do this, see Appendix 1, or the minimal sketch below.
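A minimal inline sketch of such a values override, applied to the existing release with helm upgrade (the file name storage-values.yaml is illustrative; assumes gp2 exists in the cluster):
cat <<'EOF' > storage-values.yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
EOF
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring --reuse-values -f storage-values.yaml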
3. Check the deployed Prometheus
❗If you want to change the scrapeInterval after deployment
$ kubectl get prometheus -n monitoring -o yaml | nl | grep scrap
56 scrapeInterval: 30s
$ kubectl edit prometheus -n monitoring
prometheus.monitoring.coreos.com/kube-prometheus-stack-prometheus edited
$ kubectl get prometheus -n monitoring -o yaml | nl | grep scrap
56 scrapeInterval: 2m
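The same change can be made non-interactively with kubectl patch; the resource name kube-prometheus-stack-prometheus is taken from the edit output above:
$ kubectl patch prometheus kube-prometheus-stack-prometheus -n monitoring --type merge -p '{"spec":{"scrapeInterval":"2m"}}'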
4. Check the deployed Grafana and log in
ID: admin
Password: prom-operator
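If the admin password was changed from the chart default, it can be read back from the secret the Grafana chart creates (the secret name follows the <release>-grafana convention):
$ kubectl get secret kube-prometheus-stack-grafana -n monitoring -o jsonpath='{.data.admin-password}' | base64 --decode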
5. Verify that the preconfigured data source is Prometheus
6. Enter 13770 in the import menu to load the pre-built dashboard
7. Select Prometheus as the Data Source and click Import
8. Enjoy the imported dashboard 13770
9. (If needed) List and delete the deployed Prometheus stack
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kube-prometheus-stack monitoring 1 2022-07-15 05:01:05.881146977 +0000 UTC deployed kube-prometheus-stack-37.2.0 0.57.0
[cloudshell-user@ip-10-0-146-72 ~]$ helm uninstall -n monitoring kube-prometheus-stack
release "kube-prometheus-stack" uninstalled
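Note that helm uninstall does not remove the CRDs the chart installed. A cleanup sketch, assuming no other release in the cluster still depends on them:
$ kubectl delete crd alertmanagerconfigs.monitoring.coreos.com alertmanagers.monitoring.coreos.com podmonitors.monitoring.coreos.com probes.monitoring.coreos.com prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com thanosrulers.monitoring.coreos.com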
Appendix 1
1. Generate a values file with helm inspect
$ helm inspect values prometheus-community/kube-prometheus-stack --version 38.0.2 > kube-prometheus-stack-38.0.2.values
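To jump straight to the sections edited in the next step, you can grep the generated file for the relevant keys (a convenience sketch; the keys appear under the alertmanager, grafana, and prometheus sections):
$ grep -n 'storageSpec\|persistence:\|adminPassword' kube-prometheus-stack-38.0.2.values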
2. Add and modify the required content in the generated values file
The line numbers may differ slightly depending on the order in which you make the edits.
For reference, line numbers can be displayed in vi with :set nu.
Modify:
542     ## Storage is the definition of how storage will be used by the Alertmanager instances.
543     ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
544     ##
545     storage:
546       volumeClaimTemplate:
547         spec:
548           storageClassName: gp2
549           accessModes: ["ReadWriteOnce"]
550           resources:
551             requests:
552               storage: 50Gi
553     # selector: {}
Add:
697   ## Using default values from https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
698   ##
699   grafana:
700     enabled: true
701     namespaceOverride: ""
702
703     # override configuration by hoon
704     persistence:
705       enabled: true
706       type: pvc
707       storageClassName: gp2
708       accessModes:
709       - ReadWriteOnce
710       size: 100Gi
711       finalizers:
712       - kubernetes.io/pvc-protection
Modify:
726     ## Timezone for the default dashboards
727     ## Other options are: browser or a specific timezone, i.e. Europe/Luxembourg
728     ##
729     defaultDashboardsTimezone: utc
730
731     adminPassword: admin
732
Modify:
2580     ## Prometheus StorageSpec for persistent data
2581     ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
2582     ##
2583     storageSpec:
2584       ## Using PersistentVolumeClaim
2585       ##
2586       volumeClaimTemplate:
2587         spec:
2588           storageClassName: gp2
2589           accessModes: ["ReadWriteOnce"]
2590           resources:
2591             requests:
2592               storage: 50Gi
2593     # selector: {}
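Before installing, the edited values can be sanity-checked by rendering the chart locally and confirming that the volumeClaimTemplate made it into the manifests (the release name kps is illustrative):
$ helm template kps prometheus-community/kube-prometheus-stack --version 38.0.2 --values kube-prometheus-stack-38.0.2.values | grep -B2 -A8 volumeClaimTemplate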
3. Run helm install
[cloudshell-user@ip-10-0-6-163 ~]$ helm install \
prometheus-community/kube-prometheus-stack \
--set prometheus.service.type=LoadBalancer \
--set grafana.service.type=LoadBalancer \
--create-namespace \
--namespace monitoring \
--generate-name \
--values kube-prometheus-stack-38.0.2.values
NAME: kube-prometheus-stack-1658960026
LAST DEPLOYED: Wed Jul 27 22:13:48 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack-1658960026"
Visit https://github.com/prometheus-operator/kube-prometheus for
instructions on how to create & configure Alertmanager and Prometheus
instances using the Operator.
4. Verify the Prometheus stack created from the modified values
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get po,svc,pv,pvc -n monitoring
NAME                                                               READY   STATUS    RESTARTS   AGE
pod/alertmanager-kube-prometheus-stack-1658-alertmanager-0           2/2     Running   0          93s
pod/kube-prometheus-stack-1658-operator-67699f6d8-429rd              1/1     Running   0          94s
pod/kube-prometheus-stack-1658961024-grafana-7d98b7d99f-65qjj        3/3     Running   0          94s
pod/kube-prometheus-stack-1658961024-kube-state-metrics-65f588z8msj  1/1     Running   0          94s
pod/kube-prometheus-stack-1658961024-prometheus-node-exporter-5zlcd  1/1     Running   0          95s
pod/kube-prometheus-stack-1658961024-prometheus-node-exporter-wt6kf  1/1     Running   0          94s
pod/prometheus-kube-prometheus-stack-1658-prometheus-0               2/2     Running   0          92s

NAME                                                         TYPE           CLUSTER-IP       EXTERNAL-IP                                                                   PORT(S)                      AGE
service/alertmanager-operated                                ClusterIP      None             <none>                                                                        9093/TCP,9094/TCP,9094/UDP   93s
service/kube-prometheus-stack-1658-alertmanager              ClusterIP      10.100.254.128   <none>                                                                        9093/TCP                     95s
service/kube-prometheus-stack-1658-operator                  ClusterIP      10.100.253.198   <none>                                                                        443/TCP                      95s
service/kube-prometheus-stack-1658-prometheus                LoadBalancer   10.100.209.143   afc7a705d6f094bf0bc142586bb70789-901394507.ap-northeast-1.elb.amazonaws.com   9090:31388/TCP               95s
service/kube-prometheus-stack-1658961024-grafana             LoadBalancer   10.100.102.193   ad8ec0af13eb84ae780236b179981e29-967158055.ap-northeast-1.elb.amazonaws.com   80:31050/TCP                 95s
service/kube-prometheus-stack-1658961024-kube-state-metrics  ClusterIP      10.100.160.216   <none>                                                                        8080/TCP                     95s
service/kube-prometheus-stack-1658961024-prometheus-node-exporter   ClusterIP   10.100.43.187   <none>                                                                     9100/TCP                     95s
service/prometheus-operated                                  ClusterIP      None             <none>                                                                        9090/TCP                     92s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                                                                          STORAGECLASS   REASON   AGE
persistentvolume/pvc-59aa28bd-b9cf-4dc9-93b0-44674f8578bb   100Gi      RWO            Delete           Bound    monitoring/kube-prometheus-stack-1658961024-grafana                                                                            gp2                     89s
persistentvolume/pvc-dac51bfd-ae87-47cf-891b-7f98a4f142a0   50Gi       RWO            Delete           Bound    monitoring/alertmanager-kube-prometheus-stack-1658-alertmanager-db-alertmanager-kube-prometheus-stack-1658-alertmanager-0     gp2                     56m
persistentvolume/pvc-e1961736-9cbc-4ee8-9d6e-dcbd932afc6f   50Gi       RWO            Delete           Bound    monitoring/prometheus-kube-prometheus-stack-1658-prometheus-db-prometheus-kube-prometheus-stack-1658-prometheus-0             gp2                     56m

NAME                                                                                                                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/alertmanager-kube-prometheus-stack-1658-alertmanager-db-alertmanager-kube-prometheus-stack-1658-alertmanager-0    Bound    pvc-dac51bfd-ae87-47cf-891b-7f98a4f142a0   50Gi       RWO            gp2            56m
persistentvolumeclaim/kube-prometheus-stack-1658961024-grafana                                                                          Bound    pvc-59aa28bd-b9cf-4dc9-93b0-44674f8578bb   100Gi      RWO            gp2            95s
persistentvolumeclaim/prometheus-kube-prometheus-stack-1658-prometheus-db-prometheus-kube-prometheus-stack-1658-prometheus-0            Bound    pvc-e1961736-9cbc-4ee8-9d6e-dcbd932afc6f   50Gi       RWO            gp2            56m
References:
https://1week.tistory.com/43
https://passwd.tistory.com/entry/Helm-kube-prometheus-stack-Grafana-Persistence-%ED%99%9C%EC%84%B1%ED%99%94
https://github.com/prometheus-community/helm-charts/issues/113