Using Kube-Prometheus
2024-10-28
Deploying Prometheus with Kube-Prometheus
Kube-Prometheus is a tool for quickly deploying Prometheus monitoring on Kubernetes. Like the corresponding Helm chart, it is built on top of Prometheus-Operator.
Clone the Git repository
$ git clone https://github.com/prometheus-operator/kube-prometheus.git
$ cd kube-prometheus
What gets deployed
According to the official README, deployment only takes a few kubectl apply commands:
kubectl apply --server-side -f manifests/setup
kubectl wait \
    --for condition=Established \
    --all CustomResourceDefinition \
    --namespace=monitoring
kubectl apply -f manifests/
Take a look at what the manifests directory contains:
[root@kubernetes kube-prometheus]# tree manifests/
manifests/
├── alertmanager-alertmanager.yaml
....
├── prometheus-serviceMonitor.yaml
├── prometheus-service.yaml
└── setup
    ...
    ├── 0thanosrulerCustomResourceDefinition.yaml
    └── namespace.yaml
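The manifests fall into the following groups:
- alertmanager ## declarative manifest handled by the operator; Alertmanager is not created directly from an image
- prometheus ## declarative manifest handled by the operator; Prometheus is not created directly from an image
- grafana ## deployed directly as a Deployment
- blackbox ## black-box monitoring: probes TCP, HTTP, and similar endpoints
- kube-state-metrics ## exposes the state of Kubernetes objects (Deployments, Pods, nodes, and so on) as metrics
- nodeExporter ## node exporter; monitors the nodes themselves
- prometheusAdapter ## adapter that lets the HPA scale workloads based on Prometheus metrics
- prometheusOperator ## the Prometheus Operator; it provides the Prometheus CRD and the Alertmanager CRD, which are used to create Prometheus and Alertmanager respectively
- kubernetesControlPlane-*.yaml ## monitoring for the platform components, such as apiserver, kubelet, and kube-controller-manager
- setup ## resources that must exist first: the CRDs and the monitoring namespace
So you can apply only the parts you need.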
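As a rough sketch of a partial deployment, the helper below lists only the manifests that belong to chosen components. It assumes the files follow the `<component>-<kind>.yaml` naming seen in the tree output above. `select_manifests` is a hypothetical helper name, not part of kube-prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the manifests that belong to the given components.
# Assumes files are named <component>-<kind>.yaml inside the given directory.
select_manifests() {
  local dir="$1"
  shift
  local component f
  for component in "$@"; do
    # The trailing "-" keeps "prometheus" from also matching prometheusAdapter-*.
    for f in "$dir/$component"-*.yaml; do
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  done
}

# Usage (run after manifests/setup has been applied):
#   select_manifests manifests nodeExporter prometheus | xargs -n1 kubectl apply -f
```

Printing the file list first makes it easy to review what will be applied before piping it to kubectl.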
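PrometheusRules and ServiceMonitors
# A ServiceMonitor gives Prometheus dynamic target discovery: the namespace and selector under .spec are used to find the matching Service (svc).
# A Service is tied to its Endpoints (ep) by name: the Service name and the Endpoints name are identical.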
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.8.2
  name: node-exporter
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: https
    relabelings:
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus
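To see which objects a ServiceMonitor would pick up, you can query the cluster with the same labels. The sketch below is one possible way to do it. `check_servicemonitor_target` is a hypothetical helper, and the Service name `node-exporter` used in the usage line is an assumption based on the ServiceMonitor name above.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands
# without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: show the Service selected by a label selector and the
# Endpoints object that shares the Service's name.
check_servicemonitor_target() {
  local ns="$1" selector="$2" svc_name="$3"
  # Services whose labels satisfy the ServiceMonitor's spec.selector.matchLabels.
  $KUBECTL get service -n "$ns" -l "$selector"
  # The Endpoints object has the same name as its Service.
  $KUBECTL get endpoints "$svc_name" -n "$ns"
}

# Usage (the Service name is an assumption):
#   check_servicemonitor_target monitoring \
#     'app.kubernetes.io/name=node-exporter,app.kubernetes.io/part-of=kube-prometheus' \
#     node-exporter
```

If the first command returns nothing, the selector does not match any Service and Prometheus will have no target to scrape for this ServiceMonitor.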
# A PrometheusRule can define both alerting rules and recording rules.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.8.2
    prometheus: k8s
    role: alert-rules
  name: node-exporter-rules
  namespace: monitoring
spec:
  groups:
  - name: node-exporter
    rules:
    - alert: NodeFilesystemSpaceFillingUp
      annotations:
        description: Filesystem on {{ $labels.device }}, mounted on {{ $labels.mountpoint
          }}, at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available
          space left and is filling up.
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
        summary: Filesystem is predicted to run out of space within the next 24 hours.
      expr: |
        (
          node_filesystem_avail_bytes{job="node-exporter",fstype!="",mountpoint!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!="",mountpoint!=""} * 100 < 15
        and
          predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!="",mountpoint!=""}[6h], 24*60*60) < 0
        and
          node_filesystem_readonly{job="node-exporter",fstype!="",mountpoint!=""} == 0
        )
      for: 1h
      labels:
        severity: warning
  - name: node-exporter.rules
    rules:
    - expr: |
        count without (cpu, mode) (
          node_cpu_seconds_total{job="node-exporter",mode="idle"}
        )
      record: instance:node_num_cpu:sum
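The alert expression above combines two numeric conditions: less than 15% of the space is available, and a linear extrapolation of the last 6 hours reaches zero within 24 hours. The arithmetic can be sketched as follows. This is a simplification: `predict_linear` fits a least-squares line over all samples in the window, while this sketch uses only two samples, and it ignores the read-only check and the `for: 1h` hold. `would_fill_up` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluate the two numeric conditions of the alert.
# Arguments: total size, available space 6h ago, available space now
# (any consistent unit).
would_fill_up() {
  awk -v size="$1" -v old="$2" -v now="$3" 'BEGIN {
    pct = now / size * 100                 # percentage currently available
    slope = (now - old) / (6 * 3600)       # change per second over the 6h window
    predicted = now + slope * 24 * 3600    # extrapolated value 24h from now
    if (pct < 15 && predicted < 0) print "true"; else print "false"
  }'
}

would_fill_up 100 14 10   # 10% left, losing 4 units per 6h
would_fill_up 100 14 12   # 12% left, losing 2 units per 6h
```

In the first call the extrapolated value is 10 - 16 = -6, so both conditions hold. In the second it is 12 - 8 = 4, so the prediction condition fails even though less than 15% is available.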