Helm was used to deploy Prometheus and the kube-prometheus-stack on an EKS cluster for monitoring. This covered deploying Prometheus, Grafana, and Alertmanager along with their associated pods and services. Key steps included adding the prometheus-community chart repository, configuring storage classes, and accessing the deployed applications. Pitfalls of the default storage configuration were also discussed.
Prometheus on EKS
1. Prometheus on EKS guide document
(https://docs.aws.amazon.com/ko_kr/eks/latest/userguide/prometheus.html)
📌QA test region: ap-northeast-1 (Tokyo)
https://github.com/sysnet4admin
2. Installing Helm v3.9.1
1. Install openssl
[cloudshell-user@ip-10-0-146-72 ~]$ sudo yum install openssl -y
Loaded plugins: ovl, priorities
Resolving Dependencies
--> Running transaction check
---> Package openssl.x86_64 1:1.0.2k-24.amzn2.0.3 will be installed
--> Finished Dependency Resolution
<snipped>
Downloading packages:
openssl-1.0.2k-24.amzn2.0.3.x86_64.rpm
<snipped>
Installed:
openssl.x86_64 1:1.0.2k-24.amzn2.0.3
Complete!
2. Install the helm binary
[cloudshell-user@ip-10-0-146-72 ~]$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
[cloudshell-user@ip-10-0-146-72 ~]$ chmod 700 get_helm.sh
[cloudshell-user@ip-10-0-146-72 ~]$ DESIRED_VERSION=v3.9.1 ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.9.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
3. Copy helm into a directory on the PATH
[cloudshell-user@ip-10-0-46-136 ~]$ cp /usr/local/bin/helm $HOME/bin/helm && export PATH=$PATH:$HOME/bin
4. Verify the installed helm
[cloudshell-user@ip-10-0-146-72 ~]$ helm version
version.BuildInfo{Version:"v3.9.1",
GitCommit:"a7c043acb5ff905c261cfdc923a35776ba5e66e4",
GitTreeState:"clean", GoVersion:"go1.17.5"}
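In an automated setup script, the version check above can be made fail-fast. A minimal sketch; the `got` string below is a hard-coded sample of `helm version --short` output for illustration, not a live call:

```shell
# Pin the Helm version in a provisioning script and fail fast on a mismatch.
# In a live shell you would use: got="$(helm version --short)"
want="v3.9.1"
got="v3.9.1+ga7c043a"
case "$got" in
  "$want"*) echo "helm $want OK" ;;                               # version prefix matches
  *)        echo "unexpected helm version: $got" >&2; exit 1 ;;   # abort the script
esac
```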
3. ❗If openssl is not installed
[cloudshell-user@ip-10-0-146-72 ~]$ DESIRED_VERSION=v3.9.1 ./get_helm.sh
In order to verify checksum, openssl must first be installed.
Please install openssl or set VERIFY_CHECKSUM=false in your environment.
Failed to install helm
For support, go to https://github.com/helm/helm.
[cloudshell-user@ip-10-0-146-72 ~]$ sudo yum install openssl
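As the error message itself suggests, the checksum verification can also be skipped instead of installing openssl. A sketch of that (less safe) alternative:

```shell
# Alternative from the error message above: skip checksum verification.
# Not recommended outside throwaway test environments.
VERIFY_CHECKSUM=false DESIRED_VERSION=v3.9.1 ./get_helm.sh
```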
4. Prerequisites for deploying Prometheus via Helm
1. Add the Helm repo for installing Prometheus
[cloudshell-user@ip-10-0-146-72 ~]$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
2. Update the repo to fetch the latest charts
[cloudshell-user@ip-10-0-146-72 ~]$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
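Before installing, you can confirm what the freshly updated repo serves. A small sketch for inspecting available chart versions so you can pin one explicitly:

```shell
# List recent versions of the prometheus chart from the repo added above
helm search repo prometheus-community/prometheus --versions | head -5
```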
3. Check the preconfigured StorageClass
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 35m
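If gp2 does not suit you, a gp3 class can be created alongside the default one. A sketch of the manifest, assuming the AWS EBS CSI driver (`ebs.csi.aws.com`) is installed in the cluster:

```shell
# Create a gp3 StorageClass next to the default gp2 one
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
```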
5. Deploying Prometheus
(https://awskrug.github.io/eks-workshop/monitoring/deploy-prometheus/)
1. Deploy Prometheus to EKS via Helm
[cloudshell-user@ip-10-0-146-72 ~]$ helm install prometheus prometheus-community/prometheus \
  --set server.service.type="LoadBalancer" \
  --namespace=monitoring \
  --create-namespace
NAME: prometheus
LAST DEPLOYED: Fri Jul 15 05:36:50 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.monitoring.svc.cluster.local
Get the Prometheus server URL by running these commands in the same shell:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch its status by running 'kubectl get svc --namespace monitoring -w prometheus-server'
export SERVICE_IP=$(kubectl get svc --namespace monitoring prometheus-server -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:80
The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-alertmanager.monitoring.svc.cluster.local
Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9093
########################################################################
## WARNING: Pod Security Policy has been moved to a global property.  ##
## use .Values.podSecurityPolicy.enabled with pod-based               ##
## annotations                                                        ##
## (e.g. .Values.nodeExporter.podSecurityPolicy.annotations)          ##
########################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-pushgateway.monitoring.svc.cluster.local
Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
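If you prefer not to expose Prometheus through a LoadBalancer at all, the server Service can also be reached locally. A sketch using port-forwarding against the Service created above:

```shell
# Forward local port 9090 to the prometheus-server Service (port 80 in-cluster)
kubectl -n monitoring port-forward svc/prometheus-server 9090:80 &
sleep 2
# /-/healthy is Prometheus's liveness endpoint; expect a short "Healthy" message
curl -s http://localhost:9090/-/healthy
```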
❗If you want to use EFS or gp3 instead of gp2 as the StorageClass, refer to the following (replace "gp2" with the class you want):
helm install prometheus prometheus-community/prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2" \
  --set server.persistentVolume.storageClass="gp2" \
  --set server.service.type="LoadBalancer" \
  --namespace=monitoring \
  --create-namespace
2. Check the deployed pods and services
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-alertmanager-5c57cc6945-cqt2b 2/2 Running 0 5m40s
pod/prometheus-kube-state-metrics-77ddf69b4-68jg4 1/1 Running 0 5m40s
pod/prometheus-node-exporter-skndj 1/1 Running 0 5m40s
pod/prometheus-node-exporter-xw5fc 1/1 Running 0 5m40s
pod/prometheus-pushgateway-ff89cc976-bzxpv 1/1 Running 0 5m40s
pod/prometheus-server-6c99667b9b-6d958 2/2 Running 0 5m40s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-alertmanager ClusterIP 10.100.161.159 <none> 80/TCP 5m40s
service/prometheus-kube-state-metrics ClusterIP 10.100.103.158 <none>
<snipped>
4. Check the queried metric data
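Checking metric data does not have to happen in the web UI; the HTTP API serves the same queries. A sketch, reusing the `$SERVICE_IP` variable exported in the chart NOTES above and assuming `jq` is available:

```shell
# Query the instant value of the 'up' metric via the Prometheus HTTP API
curl -s "http://$SERVICE_IP:80/api/v1/query" --data-urlencode 'query=up' \
  | jq -r '.data.result[] | "\(.metric.job): \(.value[1])"'
```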
5. List and delete the deployed Prometheus release
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
prometheus monitoring 1 2022-07-15 05:36:50.109858421 +0000 UTC deployed prometheus-15.10.4 2.36.2
[cloudshell-user@ip-10-0-146-72 ~]$ helm uninstall prometheus -n monitoring
release "prometheus" uninstalled
6. Confirm the Prometheus resources were deleted
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
No resources found in monitoring namespace.
9. Deploying the Prometheus stack
1. Deploy the Prometheus stack (kube-prometheus-stack) to EKS via Helm
(https://kong.awsworkshop.io/eks-enterprise-setup/observability/prometheus.html)
[cloudshell-user@ip-10-0-146-72 ~]$ helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --set prometheus.service.type=LoadBalancer \
  --set grafana.service.type=LoadBalancer \
  --namespace=monitoring \
  --create-namespace
NAME: kube-prometheus-stack
LAST DEPLOYED: Fri Jul 15 05:01:05 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
2. Check the deployed pods and services
[cloudshell-user@ip-10-0-146-72 ~]$ kubectl get po,svc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 3m56s
pod/kube-prometheus-stack-grafana-7dffb5648b-tmmjg 3/3 Running 0 4m6s
pod/kube-prometheus-stack-kube-state-metrics-668cff654f-cs6qb 1/1 Running 0 4m6s
pod/kube-prometheus-stack-operator-55d8668b46-c7q8g 1/1 Running 0 4m6s
pod/kube-prometheus-stack-prometheus-node-exporter-5mqr4 1/1 Running 0 4m6s
pod/kube-prometheus-stack-prometheus-node-exporter-zgw98 1/1 Running 0 4m6s
pod/prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 3m56s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 3m56s
service/kube-prometheus-stack-alertmanager ClusterIP 10.100.31.138 <none> 9093/TCP 4m6s
service/kube-prometheus-stack-grafana LoadBalancer 10.100.8.4 a0351049fd68c4164a4198923a18ddb9-867621865.ap-northeast-1.elb.amazonaws.com 80:30337/TCP 4m6s
service/kube-prometheus-stack-kube-state-metrics ClusterIP 10.100.115.224 <none> 8080/TCP 4m6s
service/kube-prometheus-stack-operator ClusterIP 10.100.154.42 <none> 443/TCP 4m6s
service/kube-prometheus-stack-prometheus LoadBalancer 10.100.52.126 a315d03620fb94d89849c9ea36e12b3c-707046690.ap-northeast-1.elb.amazonaws.com 9090:31885/TCP 4m6s
service/kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.100.109.235 <none> 9100/TCP 4m6s
service/prometheus-operated ClusterIP None <none> 9090/TCP 3m56s
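On AWS the LoadBalancer address lands in the `hostname` field of the Service status (not `ip`), so extracting the Grafana endpoint from the table above looks like this sketch:

```shell
# Grab the ELB hostname of the Grafana Service and print a browsable URL
GRAFANA=$(kubectl -n monitoring get svc kube-prometheus-stack-grafana \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "http://$GRAFANA"
```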
❗What is the big problem with the current Prometheus stack deployment?
The standalone Prometheus deployment creates a PV and PVC through the default StorageClass (gp2), as shown below:
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pv -n monitoring
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-39a11fbb-467b-4ee7-b6c0-20eb1536282b 2Gi RWO Delete Bound monitoring/prometheus-alertmanager gp2 4m7s
pvc-c231b71d-7d04-42dc-b276-61769c6f9ee0 8Gi RWO Delete Bound monitoring/prometheus-server gp2 4m7s
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-alertmanager Bound pvc-39a11fbb-467b-4ee7-b6c0-20eb1536282b 2Gi RWO gp2 4m22s
prometheus-server Bound pvc-c231b71d-7d04-42dc-b276-61769c6f9ee0 8Gi RWO gp2 4m22s
However, if you do not specify a StorageClass for the Prometheus stack, it is deployed with emptyDir volumes for temporary use only, rather than a PV/PVC:
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get pv,pvc
No resources found
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get po -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -o yaml | grep volumes -A30
volumes:
- name: config
  secret:
    defaultMode: 420
    secretName: prometheus-kube-prometheus-stack-prometheus
- name: tls-assets
  projected:
    defaultMode: 420
    sources:
    - secret:
        name: prometheus-kube-prometheus-stack-prometheus-tls-assets-0
- emptyDir: {}
  name: config-out
- configMap:
    defaultMode: 420
    name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
  name: prometheus-kube-prometheus-stack-prometheus-rulefiles-0
<snipped>
Therefore, from a production standpoint, the stack must be configured to use a StorageClass, and this requires additional settings deployed through values.yaml (or forking the chart and patching it). See the following links:
Prometheus: https://github.com/prometheus-community/helm-charts/issues/186
Grafana: https://github.com/prometheus-community/helm-charts/issues/436
Helm values:
https://helm.sh/docs/intro/using_helm/#customizing-the-chart-before-installing
If you really want to do this, see Appendix 1.
3. Check the deployed Prometheus
4. Check the deployed Grafana and log in
ID: admin
Password: prom-operator
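The default credentials above are also stored in a Secret by the chart, so they can be read back instead of memorized. A sketch, assuming the release name `kube-prometheus-stack` used earlier:

```shell
# The Grafana admin password lives in the <release>-grafana Secret
kubectl -n monitoring get secret kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```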
5. Confirm that the preconfigured data source is Prometheus
6. To load a prebuilt dashboard, enter 13770 in the import menu
7. Select Prometheus as the Data Source and click Import
8. Enjoy the imported 13770 dashboard
9. (If needed) list and delete the deployed Prometheus stack release
[cloudshell-user@ip-10-0-146-72 ~]$ helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kube-prometheus-stack monitoring 1 2022-07-15 05:01:05.881146977 +0000 UTC deployed kube-prometheus-stack-37.2.0 0.57.0
[cloudshell-user@ip-10-0-146-72 ~]$ helm uninstall -n monitoring kube-prometheus-stack
release "kube-prometheus-stack" uninstalled
16. Appendix 1
1. Generate a values file with helm inspect
$ helm inspect values prometheus-community/kube-prometheus-stack --version 38.0.2 > kube-prometheus-stack-38.0.2.values
2. Add and modify the required settings in the generated values file
Line numbers may differ slightly depending on the order of your edits.
Note: line numbers can be shown in vi with :set nu.
Modify:
542 ## Storage is the definition of how storage will be used by the Alertmanager instances.
543 ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
544 ##
545 storage:
546   volumeClaimTemplate:
547     spec:
548       storageClassName: gp2
549       accessModes: ["ReadWriteOnce"]
550       resources:
551         requests:
552           storage: 50Gi
553 # selector: {}
Add:
697 ## Using default values from https://github.com/grafana/helm-charts/blob/main/charts/grafana/values.yaml
698 ##
699 grafana:
700   enabled: true
701   namespaceOverride: ""
702
703   # override configuration by hoon
704   persistence:
705     enabled: true
706     type: pvc
707     storageClassName: gp2
708     accessModes:
709     - ReadWriteOnce
710     size: 100Gi
711     finalizers:
712     - kubernetes.io/pvc-protection
Modify:
726 ## Timezone for the default dashboards
727 ## Other options are: browser or a specific timezone, i.e. Europe/Luxembourg
728 ##
729 defaultDashboardsTimezone: utc
730
731 adminPassword: admin
732
Modify:
2580 ## Prometheus StorageSpec for persistent data
2581 ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
2582 ##
2583 storageSpec:
2584 ## Using PersistentVolumeClaim
2585 ##
2586   volumeClaimTemplate:
2587     spec:
2588       storageClassName: gp2
2589       accessModes: ["ReadWriteOnce"]
2590       resources:
2591         requests:
2592           storage: 50Gi
2593 # selector: {}
3. Run helm install
[cloudshell-user@ip-10-0-6-163 ~]$ helm install \
  prometheus-community/kube-prometheus-stack \
  --set prometheus.service.type=LoadBalancer \
  --set grafana.service.type=LoadBalancer \
  --create-namespace \
  --namespace monitoring \
  --generate-name \
  --values kube-prometheus-stack-38.0.2.values
NAME: kube-prometheus-stack-1658960026
LAST DEPLOYED: Wed Jul 27 22:13:48 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l
"release=kube-prometheus-stack-1658960026"
Visit https://github.com/prometheus-operator/kube-prometheus for
instructions on how to create & configure Alertmanager and Prometheus
instances using the Operator.
4. Confirm the Prometheus stack created from the modified values file
[cloudshell-user@ip-10-0-6-163 ~]$ kubectl get po,svc,pv,pvc -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-kube-prometheus-stack-1658-alertmanager-0 2/2 Running 0 93s
pod/kube-prometheus-stack-1658-operator-67699f6d8-429rd 1/1 Running 0 94s
pod/kube-prometheus-stack-1658961024-grafana-7d98b7d99f-65qjj 3/3 Running 0 94s
pod/kube-prometheus-stack-1658961024-kube-state-metrics-65f588z8msj 1/1 Running 0 94s
pod/kube-prometheus-stack-1658961024-prometheus-node-exporter-5zlcd 1/1 Running 0 95s
pod/kube-prometheus-stack-1658961024-prometheus-node-exporter-wt6kf 1/1 Running 0 94s
pod/prometheus-kube-prometheus-stack-1658-prometheus-0 2/2 Running 0 92s
NAME TYPE CLUSTER-IP EXTERNAL-IP
<snipped>