Pixiu Cloud Native (貔貅云原生)

Using Kube-Prometheus

2024-10-28
Deploying Prometheus with Kube-Prometheus

Kube-Prometheus is a tool for quickly deploying Prometheus monitoring on Kubernetes. Like the Helm-based installation, it is built on the Prometheus-Operator project.

Clone the Git repository

$ git clone https://github.com/prometheus-operator/kube-prometheus.git

$ cd kube-prometheus

What gets deployed

According to the official instructions, deploying only takes a few apply commands:

kubectl apply --server-side -f manifests/setup
kubectl wait \
        --for condition=Established \
        --all CustomResourceDefinition \
        --namespace=monitoring
kubectl apply -f manifests/
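
After the apply commands above, it can be useful to confirm that the workloads actually came up. The following is a minimal sketch, assuming the manifests deploy into the `monitoring` namespace as in the commands above. The `KUBECTL` variable and the `check_monitoring` function are illustrative names, not part of kube-prometheus.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands without touching a cluster. This variable is an assumption
# made for this sketch.
KUBECTL="${KUBECTL:-kubectl}"

# check_monitoring NAMESPACE: list the pods, then block until every pod
# in the namespace reports Ready (or the timeout expires).
check_monitoring() {
    ns="${1:-monitoring}"
    $KUBECTL get pods --namespace "$ns"
    $KUBECTL wait --for=condition=Ready pods --all --namespace "$ns" --timeout=300s
}
```

Calling `check_monitoring monitoring` after the apply step prints the pod list and returns once all pods are Ready.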

Take a look at the contents of manifests:

[root@kubernetes kube-prometheus]# tree manifests/
manifests/
├── alertmanager-alertmanager.yaml
....
├── prometheus-serviceMonitor.yaml
├── prometheus-service.yaml
└── setup
    ...
    ├── 0thanosrulerCustomResourceDefinition.yaml
    └── namespace.yaml

The manifests mainly cover the following components:

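- alertmanager ## Declarative resources handled by the operator; Alertmanager is not created directly from an image
- prometheus ## Declarative resources handled by the operator; Prometheus is not created directly from an image
- grafana ## Deployed directly as a Deployment
- blackbox ## Black-box monitoring: probes TCP, HTTP, and similar endpoints
- kube-state-metrics ## Exposes the state of Kubernetes objects (Deployments, Pods, Nodes, etc.) as metrics
- nodeExporter ## node exporter; monitors the nodes themselves
- prometheusAdapter ## Adapter that lets HPA autoscaling use Prometheus metrics
- prometheus-Operator ## The Prometheus Operator provides the Prometheus CRD and the Alertmanager CRD, used to create Prometheus and Alertmanager respectively
- kubernetesControlPlane-*.yaml ## Monitors platform components such as apiserver, kubelet, and kube-controller-manager
- setup ## Prerequisites applied first: the CRDs and the namespace, as shown in the tree output above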

So you can apply only the parts you need.
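
One way to apply a single component is to loop over the files that share its name prefix. This is a sketch only: it assumes the manifest files follow the `<component>-*.yaml` naming seen in the tree output (for example `alertmanager-alertmanager.yaml`), and `apply_component` and `KUBECTL` are hypothetical helper names.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry preview.
KUBECTL="${KUBECTL:-kubectl}"

# apply_component DIR PREFIX: apply every manifest in DIR whose file name
# starts with "PREFIX-", one file at a time.
apply_component() {
    dir="$1"
    prefix="$2"
    for f in "$dir"/"$prefix"-*.yaml; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        $KUBECTL apply -f "$f"
    done
}
```

For example, `apply_component manifests nodeExporter` would apply only the node exporter files. Because the alertmanager and prometheus manifests are operator resources, `manifests/setup` and the operator itself presumably need to be applied before them.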

PrometheusRules and ServiceMonitors

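# A ServiceMonitor lets Prometheus discover targets dynamically: the namespace selector and label selector under .spec are used to find the matching Services.
# A Service is tied to its Endpoints by name: the Service and its Endpoints object have exactly the same name.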
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.8.2
  name: node-exporter
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: https
    relabelings:
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus
---
# A PrometheusRule defines alerting rules and recording rules.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.8.2
    prometheus: k8s
    role: alert-rules
  name: node-exporter-rules
  namespace: monitoring
spec:
  groups:
  - name: node-exporter
    rules:
    - alert: NodeFilesystemSpaceFillingUp
      annotations:
        description: Filesystem on {{ $labels.device }}, mounted on {{ $labels.mountpoint
          }}, at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available
          space left and is filling up.
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodefilesystemspacefillingup
        summary: Filesystem is predicted to run out of space within the next 24 hours.
      expr: |
        (
          node_filesystem_avail_bytes{job="node-exporter",fstype!="",mountpoint!=""} / node_filesystem_size_bytes{job="node-exporter",fstype!="",mountpoint!=""} * 100 < 15
        and
          predict_linear(node_filesystem_avail_bytes{job="node-exporter",fstype!="",mountpoint!=""}[6h], 24*60*60) < 0
        and
          node_filesystem_readonly{job="node-exporter",fstype!="",mountpoint!=""} == 0
        )
      for: 1h
      labels:
        severity: warning
  - name: node-exporter.rules
    rules:
    - expr: |
        count without (cpu, mode) (
          node_cpu_seconds_total{job="node-exporter",mode="idle"}
        )
      record: instance:node_num_cpu:sum
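
To see which Services the ServiceMonitor above would pick up, the same label selector can be passed to kubectl, and the Endpoints object can then be looked up by the Service's name. This is a sketch under assumptions: the function names and the `KUBECTL` variable are illustrative, and it assumes the Service is also named `node-exporter`, which the manifests shown here do not confirm.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry preview.
KUBECTL="${KUBECTL:-kubectl}"

# matching_services NAMESPACE SELECTOR: Services whose labels satisfy the
# same label selector as the ServiceMonitor's spec.selector.matchLabels.
matching_services() {
    $KUBECTL get services --namespace "$1" --selector "$2" --show-labels
}

# service_endpoints NAMESPACE NAME: the Endpoints object that shares the
# Service's name, i.e. the addresses that end up as scrape targets.
service_endpoints() {
    $KUBECTL get endpoints "$2" --namespace "$1"
}
```

For instance, `matching_services monitoring app.kubernetes.io/name=node-exporter` followed by `service_endpoints monitoring node-exporter` shows the Service and the addresses behind it.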