Deploying BlackBox Exporter

Administrator
Published 2024-12-16

kube-prometheus ships with blackbox-exporter. By default it only probes targets inside the cluster, so external resources need extra configuration.

ICMP probing

First, deploy blackbox-exporter:

kubectl apply -f blackboxExporter-clusterRoleBinding.yaml
kubectl apply -f blackboxExporter-configuration.yaml
kubectl apply -f blackboxExporter-serviceAccount.yaml
kubectl apply -f blackboxExporter-service.yaml
kubectl apply -f blackboxExporter-clusterRole.yaml
kubectl apply -f blackboxExporter-deployment.yaml
kubectl apply -f blackboxExporter-serviceMonitor.yaml

This walkthrough uses ICMP as the example.

Start by looking at the contents of blackboxExporter-configuration.yaml:

apiVersion: v1
data:
  config.yml: |-
    "modules":
      "http_2xx":
        "http":
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "http_post_2xx":
        "http":
          "method": "POST"
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "irc_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "send": "NICK prober"
          - "send": "USER prober prober prober :prober"
          - "expect": "PING :([^ ]+)"
            "send": "PONG ${1}"
          - "expect": "^:[^ ]+ 001"
      "pop3s_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "expect": "^+OK"
          "tls": true
          "tls_config":
            "insecure_skip_verify": false
      "ssh_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "expect": "^SSH-2.0-"
      "tcp_connect":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: blackbox-exporter-configuration
  namespace: monitoring

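The ConfigMap holds the blackbox modules, but no ICMP module is defined, so one has to be added. The snippet below shows only the new module. Replacing config.yml with it as written would drop the existing modules, so keep them if they are still needed (the HTTP example later uses http_2xx).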

apiVersion: v1
data:
  config.yml: |-
    "modules":
      "icmp":
        "prober": "icmp"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: blackbox-exporter-configuration
  namespace: monitoring

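Configuring scrape targets

blackbox is a special case: its targets have to be configured statically, by creating a Secret and mounting it into the Prometheus configuration. Note that metrics_path must be /probe and that the port has to match what the Deployment configures. TLS is involved here: the regular metrics endpoint is served over HTTPS, while the probe endpoint is plain HTTP. Save the job definition below as external-node.yaml.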

- job_name: "blackbox_icmp"
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets:
      - 10.0.1.2
      labels:
        group: 'mysql'
    - targets:
      - 10.0.1.3
      - 10.0.1.4
      - 10.0.1.5
      labels:
        group: 'mongodb'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter.monitoring.svc:19115
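The relabel_configs above move each listed target into the target query parameter and point the scrape at the exporter instead. As a rough sketch of the resulting request, the hypothetical helper probe_url below (not part of Prometheus or blackbox-exporter) prints the URL that would be fetched for one target. It assumes plain IP or hostname targets and does not URL-encode the value, and the parameter order is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the probe URL produced by the relabeling above.
# $1 = exporter address (the relabeled __address__)
# $2 = module name (params.module)
# $3 = original target (copied into __param_target)
# Assumption: the target needs no URL-encoding (plain IP or hostname).
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# One of the mongodb targets from the job above.
probe_url blackbox-exporter.monitoring.svc:19115 icmp 10.0.1.3

# To try the request by hand from inside the cluster (not run here):
#   curl -s "$(probe_url blackbox-exporter.monitoring.svc:19115 icmp 10.0.1.3)"
```

The instance label keeps the original target, so the resulting series are still identified by 10.0.1.3 rather than by the exporter's address.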

Create the Secret

kubectl create secret generic external-node-configs --from-file=external-node.yaml -n monitoring

Mount it into Prometheus by adding additionalScrapeConfigs to the Prometheus resource:

[root@VM]# kubectl edit prometheus -n monitoring k8s
...
  namespace: monitoring
  resourceVersion: "45924480"
  uid: ab9b158c-8027-42a8-8f0f-acc2341d99de
spec:
  additionalScrapeConfigs:      # added
    key: external-node.yaml     # added
    name: external-node-configs # added
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  evaluationInterval: 30s
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.55.1

Restart Prometheus

kubectl rollout restart statefulset prometheus-k8s -n monitoring

HTTP probing

# HTTP probes can check connectivity, certificate expiry for a domain, and more
# Add this job to external-node.yaml
- job_name: "blackbox_http"
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets: 
      - https://docs.cloud.pixiuio.com/
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter.monitoring.svc:19115

ICMP probe failure

Requesting curl "10.0.1.186:19115/probe?target=10.0.1.130&module=icmp" returns metrics, but probe_success is 0, which means the probe failed.

...
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
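When checking many targets by hand, it can help to reduce this output to a pass/fail result. The sketch below defines a hypothetical helper, probe_ok, that reads the exporter's text output on stdin and succeeds only if a probe_success line with value 1 is present. It assumes the plain text format shown above.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if stdin contains "probe_success 1", else exit 1.
probe_ok() {
  awk '
    $1 == "probe_success" { found = 1; ok = ($2 == 1) }
    END { if (found && ok) exit 0; exit 1 }
  '
}

# Demonstration with the failing output shown above.
printf 'probe_ip_protocol 4\nprobe_success 0\n' | probe_ok || echo "probe failed"

# Against a live exporter (not run here):
#   curl -s "10.0.1.186:19115/probe?target=10.0.1.130&module=icmp" | probe_ok
```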

The logs from kubectl logs -f --tail 100 -n monitoring blackbox-exporter-59fb87f74-xzt26 show no errors, so debug logging has to be enabled by adding --log.level=debug to the container args:

...
containers:
- args:
  - --config.file=/etc/blackbox_exporter/config.yml
  - --web.listen-address=:19115
  - --log.level=debug
  image: quay.io/prometheus/blackbox-exporter:v0.23.0
  name: blackbox-exporter
  ports:
  - containerPort: 19115
    name: http

The debug log shows that the failure is a permissions problem:

ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Beginning probe" probe=icmp timeout_seconds=5
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Resolving target address" target=10.0.1.130 ip_protocol=ip6
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Resolved target address" target=10.0.1.130 ip=10.0.1.130
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Creating socket"
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: permission denied"
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Error listening to socket" err="listen ip4:icmp 0.0.0.0: socket: operation not permitted"
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Probe failed" duration_seconds=0.000239295
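Debug output gets noisy quickly, so it can be useful to pull out just the err="..." fields. The sketch below uses a hypothetical helper, probe_errors, which assumes the logfmt-style lines shown above and relies on grep -o, which GNU and BusyBox grep both provide.

```shell
#!/bin/sh
# Hypothetical helper: print only the err="..." fields from logfmt lines on stdin.
probe_errors() {
  grep -o 'err="[^"]*"'
}

# Demonstration with two shortened lines in the format of the log above.
printf '%s\n' \
  'level=debug msg="Creating socket"' \
  'level=debug msg="Error listening to socket" err="listen ip4:icmp 0.0.0.0: socket: operation not permitted"' \
  | probe_errors

# Against the running pod (not run here):
#   kubectl logs -n monitoring blackbox-exporter-59fb87f74-xzt26 | probe_errors
```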

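Investigation shows that blackbox depends on Linux capabilities: the ICMP prober needs CAP_NET_RAW (see the official documentation for details). The Deployment's securityContext, shown below, drops all capabilities and runs as a non-root user: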

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  runAsGroup: 65534
  runAsNonRoot: true
  runAsUser: 65534

Solution 1

Run the container as root and remove the capabilities drop:

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsGroup: 65534
  runAsNonRoot: false
  runAsUser: 0

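Solution 2

Rebuild the image so that the binary itself carries the cap_net_raw file capability. File capabilities only take effect if the container is allowed to hold them, so the securityContext shown earlier (drop ALL, allowPrivilegeEscalation: false) probably has to be relaxed to permit NET_RAW as well.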

# Download the release tarball
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz

# Extract it and change into the directory
tar -xf blackbox_exporter-0.25.0.linux-amd64.tar.gz && cd blackbox_exporter-0.25.0.linux-amd64

# Grant the capability to the binary
setcap cap_net_raw+ep blackbox_exporter

# Write the Dockerfile
[root@kubernetes]# cat dockerfile 
FROM quay.io/prometheus/blackbox-exporter:v0.23.0
COPY ./blackbox_exporter /bin/blackbox_exporter

# Build the image
docker build . -t blackbox:v1

Alternatively, use a multi-stage build that sets the capability as part of the build:

[root@kubernetes blackbox_exporter-0.23.0.linux-amd64]# cat dockerfile 
FROM quay.io/prometheus/blackbox-exporter:v0.23.0 AS first

FROM alpine:latest AS second

COPY --from=first /bin/blackbox_exporter /bin/blackbox_exporter

RUN echo "https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.21/main" > /etc/apk/repositories \
    && echo "https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.21/community" >> /etc/apk/repositories \
    && apk update \
    && apk add --no-cache libcap

RUN /usr/sbin/setcap cap_net_raw+ep /bin/blackbox_exporter

FROM quay.io/prometheus/blackbox-exporter:v0.23.0

RUN rm -rf /bin/blackbox_exporter

COPY --from=second /bin/blackbox_exporter /bin

FAQ

To check whether a binary has capabilities set, use the getcap command:

[root@kubernetes]# getcap blackbox_exporter 
blackbox_exporter = cap_net_raw+ep
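In an image build script this check can be turned into a guard. The sketch below defines a hypothetical helper, has_net_raw, that takes a line of getcap output and succeeds if it mentions cap_net_raw. The exact formatting of getcap output can vary between libcap versions, so the helper only looks for the capability name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a line of getcap output mentions cap_net_raw.
has_net_raw() {
  case "$1" in
    *cap_net_raw*) return 0 ;;
    *) return 1 ;;
  esac
}

# Demonstration with the output shown above.
if has_net_raw 'blackbox_exporter = cap_net_raw+ep'; then
  echo "cap_net_raw is set"
fi

# Against a real binary (not run here):
#   has_net_raw "$(getcap ./blackbox_exporter)" || echo "capability missing"
```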
