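Deploying Blackbox Exporter
Kube-Prometheus ships with blackbox-exporter. By default it only probes targets inside the cluster; external resources have to be configured separately.
ICMP probing
First, deploy blackbox-exporter: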
kubectl apply -f blackboxExporter-clusterRoleBinding.yaml
kubectl apply -f blackboxExporter-configuration.yaml
kubectl apply -f blackboxExporter-serviceAccount.yaml
kubectl apply -f blackboxExporter-service.yaml
kubectl apply -f blackboxExporter-clusterRole.yaml
kubectl apply -f blackboxExporter-deployment.yaml
kubectl apply -f blackboxExporter-serviceMonitor.yaml
ICMP is used as the example here.
Look at the contents of blackboxExporter-configuration.yaml:
apiVersion: v1
data:
  config.yml: |-
    "modules":
      "http_2xx":
        "http":
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "http_post_2xx":
        "http":
          "method": "POST"
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "irc_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "send": "NICK prober"
          - "send": "USER prober prober prober :prober"
          - "expect": "PING :([^ ]+)"
            "send": "PONG ${1}"
          - "expect": "^:[^ ]+ 001"
      "pop3s_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "expect": "^+OK"
          "tls": true
          "tls_config":
            "insecure_skip_verify": false
      "ssh_banner":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
          "query_response":
          - "expect": "^SSH-2.0-"
      "tcp_connect":
        "prober": "tcp"
        "tcp":
          "preferred_ip_protocol": "ip4"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: blackbox-exporter-configuration
  namespace: monitoring
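The ConfigMap holds the blackbox modules, but no ICMP module is defined, so one has to be added. The snippet below shows only the new module; when editing the file, add it under modules next to the existing entries rather than replacing them, since the HTTP job later relies on http_2xx.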
apiVersion: v1
data:
  config.yml: |-
    "modules":
      "icmp":
        "prober": "icmp"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: blackbox-exporter-configuration
  namespace: monitoring
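Configuring scrape targets
Blackbox is a special case: its targets are configured statically, by creating a Secret that is loaded into the Prometheus configuration. Note that metrics_path must be /probe. Which port to use follows from the Deployment configuration and has to do with TLS: the regular metrics endpoint is served over HTTPS, while the probe endpoint is plain HTTP (port 19115 here).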
- job_name: "blackbox_icmp"
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets:
        - 10.0.1.2
      labels:
        group: 'mysql'
    - targets:
        - 10.0.1.3
        - 10.0.1.4
        - 10.0.1.5
      labels:
        group: 'mongodb'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter.monitoring.svc:19115
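The relabel_configs above copy each target into the target query parameter, keep it as the instance label, and point the scrape itself at the exporter. As a rough illustration, the sketch below prints the URL that such a scrape resolves to, which is handy for testing a target by hand. probe_url is a hypothetical helper, not part of any tool; the exporter address is taken from the config above.

```shell
#!/bin/sh
# Hypothetical helper: builds the URL that the relabelled scrape resolves to.
# Assumes the exporter address used in the scrape config above. Plain targets
# (IPs, hostnames) are passed through without URL-encoding.
EXPORTER="blackbox-exporter.monitoring.svc:19115"

probe_url() {
    module="$1"   # value of params.module, e.g. icmp
    target="$2"   # original __address__, copied into __param_target
    printf 'http://%s/probe?module=%s&target=%s\n' "$EXPORTER" "$module" "$target"
}

probe_url icmp 10.0.1.2
# prints http://blackbox-exporter.monitoring.svc:19115/probe?module=icmp&target=10.0.1.2
```

From a pod inside the cluster, passing that URL to curl should return the same metrics Prometheus would scrape for the target.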
Create the Secret (with the scrape config above saved as external-node.yaml)
kubectl create secret generic external-node-configs --from-file=external-node.yaml -n monitoring
Attach the Secret to Prometheus by adding additionalScrapeConfigs to the Prometheus resource
[root@VM]# kubectl edit prometheus -n monitoring k8s
...
  namespace: monitoring
  resourceVersion: "45924480"
  uid: ab9b158c-8027-42a8-8f0f-acc2341d99de
spec:
  additionalScrapeConfigs:       # added
    key: external-node.yaml      # added
    name: external-node-configs  # added
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  evaluationInterval: 30s
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.55.1
Restart Prometheus
kubectl rollout restart statefulset prometheus-k8s -n monitoring
HTTP probing
# HTTP probes can check connectivity, certificate expiry for a domain, and more
# Add this job to external-node.yaml
- job_name: "blackbox_http"
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://docs.cloud.pixiuio.com/
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter.monitoring.svc:19115
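For HTTPS targets, the HTTP prober reports the earliest certificate expiry as a Unix timestamp in the probe_ssl_earliest_cert_expiry metric. A minimal sketch of turning that value into days remaining follows; days_until_expiry is a hypothetical helper and the sample timestamps are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: whole days between "now" and a certificate expiry.
# awk is used because the metric value may be in scientific notation
# (for example 1.7356896e+09), which shell arithmetic cannot parse.
days_until_expiry() {
    expiry="$1"   # value of probe_ssl_earliest_cert_expiry (Unix seconds)
    now="$2"      # current time in Unix seconds, e.g. from: date +%s
    awk -v e="$expiry" -v n="$now" 'BEGIN { printf "%d\n", (e - n) / 86400 }'
}

# Illustrative values only: the expiry is 1296000 seconds after "now".
days_until_expiry 1.7356896e+09 1734393600
# prints 15
```

In practice the second argument would be "$(date +%s)", and the first would come from the probe output for the target.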
ICMP probe errors
Request the probe endpoint directly:
curl "10.0.1.186:19115/probe?target=10.0.1.130&module=icmp"
The response contains the metrics, and probe_success is 0, meaning the probe failed.
...
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
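When checking many targets by hand, it can help to pull just the probe_success sample out of the response. The sketch below does that with awk; probe_success_value is a hypothetical helper, and the sample text is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads probe output on stdin and prints the value of
# the probe_success sample. HELP/TYPE comment lines start with "#", so their
# first field never equals "probe_success" and they are skipped.
probe_success_value() {
    awk '$1 == "probe_success" { print $2 }'
}

sample='# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0'

printf '%s\n' "$sample" | probe_success_value
# prints 0
```

Against a live exporter, the output of the curl command shown earlier can be piped into probe_success_value. Appending &debug=true to the probe URL should also return the probe's own log lines in the response, which may save a round trip to the pod logs.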
The pod logs show no error:
kubectl logs -f --tail 100 -n monitoring blackbox-exporter-59fb87f74-xzt26
so debug logging has to be enabled in the Deployment:
...
      containers:
      - args:
        - --config.file=/etc/blackbox_exporter/config.yml
        - --web.listen-address=:19115
        - --log.level=debug
        image: quay.io/prometheus/blackbox-exporter:v0.23.0
        name: blackbox-exporter
        ports:
        - containerPort: 19115
          name: http
The debug logs point to insufficient permissions:
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Beginning probe" probe=icmp timeout_seconds=5
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Resolving target address" target=10.0.1.130 ip_protocol=ip6
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Resolved target address" target=10.0.1.130 ip=10.0.1.130
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Creating socket"
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: permission denied"
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Error listening to socket" err="listen ip4:icmp 0.0.0.0: socket: operation not permitted"
ts=2024-12-17T08:29:54.303Z caller=handler.go:184 module=icmp target=10.0.1.130 level=debug msg="Probe failed" duration_seconds=0.000239295
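Investigation shows that blackbox depends on Linux capabilities: the ICMP prober needs CAP_NET_RAW (see the official documentation for details). The Deployment, however, drops all capabilities and does not run as root: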
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  runAsGroup: 65534
  runAsNonRoot: true
  runAsUser: 65534
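One way to confirm what the running process is actually allowed to do is to read its effective capability mask (the CapEff line in /proc/<pid>/status) and test the CAP_NET_RAW bit, which is capability number 13 on Linux. The sketch below decodes such a mask; has_net_raw is a hypothetical helper and the masks are illustrative values.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the hex capability mask has CAP_NET_RAW.
# CAP_NET_RAW is capability number 13, i.e. bit 13 of the mask (0x2000).
has_net_raw() {
    mask=$(( 0x$1 ))   # $1: hex mask without 0x prefix, e.g. a CapEff value
    [ $(( (mask >> 13) & 1 )) -eq 1 ]
}

# An all-zero mask is what a non-root process with every capability dropped has.
if has_net_raw 0000000000000000; then
    echo "CAP_NET_RAW present"
else
    echo "CAP_NET_RAW missing"
fi
# prints CAP_NET_RAW missing
```

To get the real mask, something like kubectl exec -n monitoring deploy/blackbox-exporter -c blackbox-exporter -- grep CapEff /proc/1/status may work, assuming the image ships grep and the Deployment is named blackbox-exporter as the pod name above suggests.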
Solution 1
Run the container as root
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsGroup: 65534
  runAsNonRoot: false
  runAsUser: 0
Solution 2
Rebuild the image with the capability set on the binary
# Download the release tarball
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
# Extract it and enter the directory
tar -xf blackbox_exporter-0.25.0.linux-amd64.tar.gz && cd blackbox_exporter-0.25.0.linux-amd64
# Grant the capability to the binary
setcap cap_net_raw+ep blackbox_exporter
# Write the Dockerfile
[root@kubernetes]# cat dockerfile
FROM quay.io/prometheus/blackbox-exporter:v0.23.0
COPY ./blackbox_exporter /bin/blackbox_exporter
# Build the image
docker build . -t blackbox:v1
Alternatively,
use a multi-stage build that grants the capability during the build
[root@kubernetes blackbox_exporter-0.23.0.linux-amd64]# cat dockerfile
FROM quay.io/prometheus/blackbox-exporter:v0.23.0 AS first
FROM alpine:latest AS second
COPY --from=first /bin/blackbox_exporter /bin/blackbox_exporter
RUN echo "https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.21/main" > /etc/apk/repositories \
&& echo "https://mirrors.tuna.tsinghua.edu.cn/alpine/v3.21/community" >> /etc/apk/repositories \
&& apk update \
&& apk add --no-cache libcap
RUN /usr/sbin/setcap cap_net_raw+ep /bin/blackbox_exporter
FROM quay.io/prometheus/blackbox-exporter:v0.23.0
RUN rm -rf /bin/blackbox_exporter
COPY --from=second /bin/blackbox_exporter /bin
FAQ
To check whether a binary carries a capability, use the getcap command
[root@kubernetes]# getcap blackbox_exporter
blackbox_exporter = cap_net_raw+ep