Automated Monitoring and Operations (5): Configuring Email Alerts with Alertmanager
1. Install Alertmanager
1.1 Install from the binary release
Download:
https://github.com/prometheus/alertmanager/releases
Extract:
tar -xzvf alertmanager-0.26.0.linux-amd64.tar.gz
Run (from the extracted `alertmanager-0.26.0.linux-amd64` directory):
./alertmanager --config.file="alertmanager.yml"
1.2 Install with Docker
docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager
2. Configure Alertmanager
2.1 Configure the Alertmanager notification method
- Edit the configuration file:
alertmanager.yml
The `smtp_smarthost` field must include the port.
``` yml
global:
  resolve_timeout: 5m
  # SMTP server, host:port
  smtp_smarthost: 'smtp.qq.com:25'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  # account password or authorization code
  smtp_auth_password: 'yourpasswd/authenticCode'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '[email protected]'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```
2.2 Configure Prometheus
- Edit the configuration file:
prometheus.yml
alerting:
Used to store the alerting rules.
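Returning to the `inhibit_rules` entry in `alertmanager.yml` from section 2.1: the rule mutes a `warning` alert while a `critical` alert with the same `alertname`, `dev`, and `instance` values is firing. The following is a minimal sketch of that matching logic, not Alertmanager's actual implementation. The function name `is_inhibited`, the `severity` values on the sample alerts, and the instance names are hypothetical.

```python
def is_inhibited(target, firing, rule):
    """Return True if `target` is muted by some alert in `firing` under `rule`."""
    def matches(alert, matchers):
        return all(alert.get(k, "") == v for k, v in matchers.items())

    # Only alerts matching target_match can be muted by this rule.
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # A label missing on both sides reads as "" and therefore counts as equal.
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False


# Same shape as the inhibit rule in alertmanager.yml.
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

# Hypothetical alerts; instance names are made up for illustration.
critical = {"alertname": "hostCPUUsageTooHigh", "instance": "node1:9100", "severity": "critical"}
warning = {"alertname": "hostCPUUsageTooHigh", "instance": "node1:9100", "severity": "warning"}
other = {"alertname": "hostCPUUsageTooHigh", "instance": "node2:9100", "severity": "warning"}

print(is_inhibited(warning, [critical, warning, other], RULE))  # True
print(is_inhibited(other, [critical, warning, other], RULE))    # False
```

Because none of the sample alerts carries a `dev` label, that label compares equal on both sides, so only `alertname` and `instance` decide the outcome here.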
### 2.3 Alert rule example
``` vim /usr/local/prometheus/rules/first.rules.yml ```
``` yml
groups:
- name: cpuAlertGroup
  rules:
  - alert: hostCPUUsageTooHigh
    # Non-idle share of CPU time per instance over the last minute, in percent
    expr: (1 - sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu_seconds_total[1m])) by (instance)) * 100 > 50
    for: 30s
    labels:
      biz_type: cpu_usage
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 50% (current: {{ $value }})"
```
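The `expr` in the rule computes, per instance, the share of CPU time that was not idle during the last minute, and the alert fires once that value has stayed above 50 for the `for: 30s` window. The arithmetic can be sketched as follows. The function name `cpu_usage_percent` and the sample numbers (a 4-core host accruing 240 CPU-seconds per minute) are assumptions made for illustration.

```python
def cpu_usage_percent(idle_increase, total_increase):
    """Mirror (1 - idle / total) * 100 from the rule's expr for one instance."""
    if total_increase <= 0:
        raise ValueError("total_increase must be positive")
    return (1 - idle_increase / total_increase) * 100


# Hypothetical host: 96 of 240 CPU-seconds in the last minute were idle.
usage = cpu_usage_percent(idle_increase=96.0, total_increase=240.0)
print(round(usage, 1), usage > 50)  # 60.0 True
```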
3. Test
- Test script
#!/usr/bin/python
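As a separate illustration, and not the script above, one way to exercise the email receiver is to post a hand-made alert to Alertmanager's v2 HTTP API (`POST /api/v2/alerts`). This sketch assumes Alertmanager listens on `127.0.0.1:9093` as in the Docker command from section 1.2. The helper names `build_test_alert` and `send_alerts`, the instance name, and the annotation text are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed address of a locally running Alertmanager.
ALERTMANAGER_URL = "http://127.0.0.1:9093/api/v2/alerts"


def build_test_alert(instance="node1:9100"):
    """Build a one-element alert list in the shape the v2 API accepts."""
    now = datetime.now(timezone.utc).isoformat()
    return [{
        "labels": {
            "alertname": "hostCPUUsageTooHigh",
            "instance": instance,
            "biz_type": "cpu_usage",
        },
        "annotations": {
            "summary": f"Instance {instance} CPU usage high (test alert)",
        },
        "startsAt": now,
    }]


def send_alerts(alerts, url=ALERTMANAGER_URL):
    """POST the alerts as JSON and return the HTTP status code."""
    body = json.dumps(alerts).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Print the payload only; nothing is sent until send_alerts() is called.
print(json.dumps(build_test_alert(), indent=2))
```

With Alertmanager running, calling `send_alerts(build_test_alert())` should return 200, and the email should arrive after the `group_wait` interval of 30s configured in `alertmanager.yml`.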