https://hub.docker.com/r/prom/prometheus/
https://github.com/starsliao/Prometheus
https://gitee.com/feiyu563/PrometheusAlert
# run
docker run -d --net=host --name prometheus --restart=always \
  -v /data/site/docker/env/monitor/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro \
  -v /data/site/docker/env/monitor/prometheus/rules.yml:/etc/prometheus/rules.yml:ro \
  -v /data/site/docker/env/monitor/prometheus/files:/etc/prometheus/files:ro \
  -v /etc/localtime:/etc/localtime:ro \
  prom/prometheus:latest
docker run -d --net=host --name node --restart=always -v /etc/localtime:/etc/localtime:ro prom/node-exporter:latest
# alternative: the same exporter from the private-registry image (run one or the other; both use the container name "node")
docker run -d --net=host --name node --restart=always -v /etc/localtime:/etc/localtime:ro hub.htmltoo.com:5000/http:node
docker run -d --net=host --name blackbox --restart=always -v /etc/localtime:/etc/localtime:ro prom/blackbox-exporter:latest
docker commit -m="update" -a="htmltoo.com" node hub.htmltoo.com:5000/http:node
docker push hub.htmltoo.com:5000/http:node
App: monitor -> Add service: prometheus
Image: prom/prometheus:latest
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
/data/docker/monitor/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
/data/docker/monitor/prometheus/rules.yml:/etc/prometheus/rules.yml:ro
/data/docker/monitor/prometheus/files:/etc/prometheus/files:ro
Ports: 9090-9090
chmod -R 777 /data/docker/monitor/prometheus/
App: monitor -> Add service: mysqld-exporter # https://hub.docker.com/r/prom/mysqld-exporter
Image: prom/mysqld-exporter:latest
Environment variables:
DATA_SOURCE_NAME = root:wdqdmm@r@(file.htmltoo.com:3306)/
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
Ports: 9104-9104
App: monitor -> Add service: redis-exporter # https://hub.docker.com/r/oliver006/redis_exporter
Image: oliver006/redis_exporter:latest
Environment variables:
REDIS_ADDR = redis://file.htmltoo.com:6379
REDIS_PASSWORD = wdqdmm@r
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
Ports: 9121-9121
App: monitor -> Add service: consul-exporter # https://hub.docker.com/r/prom/consul-exporter
Image: prom/consul-exporter:latest
Variables:
consul.server = consul:8500
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
(Ports: 9107-9107)
App: monitor -> Add service: consul2-exporter # https://hub.docker.com/r/prom/consul-exporter
Image: prom/consul-exporter:latest
Variables:
consul.server = consul2:8500
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
Ports: 9107-9107
App: monitor -> Add service: consul3-exporter # https://hub.docker.com/r/prom/consul-exporter
Image: prom/consul-exporter:latest
Variables:
consul.server = consul3:8500
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
Ports: 9107-9107
App: monitor -> Add service: kafka-exporter # https://hub.docker.com/r/danielqsj/kafka-exporter
Image: danielqsj/kafka-exporter:latest
Command:
--kafka.server=kafka:9092 [--kafka.server=another-server ...]
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
Ports: 9308-9308
App: monitor -> Add service: greenplum_exporter # https://hub.docker.com/r/inrgihc/greenplum_exporter
https://github.com/tangyibo/greenplum_exporter
Image: inrgihc/greenplum_exporter:latest
Variables:
GPDB_DATA_SOURCE_URL=postgres://gpadmin:password@10.17.20.11:5432/postgres?sslmode=disable
Volumes:
/etc/localtime:/etc/localtime:ro
/data/file:/data/file
Ports: 9297-9297
node_exporter - installed on the host itself, listens on port 9100 https://hub.docker.com/r/prom/node-exporter
# https://github.com/prometheus/node_exporter/releases
cd /data/site/go/htmltoo.ssh/tools/soft/src/common/monitor
wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
# tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
cd /opt
wget https://abc.htmltoo.com/:7777/src/common/monitor/node_exporter-1.0.1.linux-amd64.tar.gz
tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
cd node_exporter-1.0.1.linux-amd64
mv node_exporter ../ && cd ../
rm -rf node_exporter-1.0.1.linux-amd64 node_exporter-1.0.1.linux-amd64.tar.gz
chmod +x /opt/node_exporter
nohup /opt/node_exporter & # >test.log 2>&1 &
--- "nohup" keeps the program running after the session ends; "test.log" would be the output log file; "2>&1" redirects stderr to stdout
--- ">" redirects output to the log file; the trailing "&" runs the program in the background
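As an alternative to nohup, node_exporter can run under systemd for automatic restarts and clean logging. A minimal unit sketch, assuming the /opt/node_exporter path from the steps above (the unit path and User are my assumptions):

```ini
# /etc/systemd/system/node_exporter.service  (file path and User are assumptions)
[Unit]
Description=Prometheus node_exporter
After=network.target

[Service]
ExecStart=/opt/node_exporter
Restart=always
User=nobody

[Install]
WantedBy=multi-user.target
```

Enable it with: systemctl daemon-reload && systemctl enable --now node_exporter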
# nginx-vts-exporter
https://hub.docker.com/r/sophos/nginx-vts-exporter
https://github.com/hnlq715/nginx-vts-exporter/
docker run -ti --rm --env NGINX_STATUS="http://localhost/status/format/json" sophos/nginx-vts-exporter
https://grafana.com/dashboards/2949
https://hub.docker.com/r/sophos/nginx-prometheus-metrics
docker run -d --rm -it -p 80:80 -p 1314:1314 -p 9527:9527 sophos/nginx-prometheus-metrics
Visit http://localhost:1314 to generate some test metrics,
then open http://localhost:9527/metrics in your browser (Safari/Chrome).
- Converts nginx-module-vts status data into metrics Prometheus can scrape
wget https://github.com/hnlq715/nginx-vts-exporter/archive/refs/tags/v0.10.7.tar.gz
tar -xf v0.10.7.tar.gz && cd nginx-vts-exporter-0.10.7
./nginx-vts-exporter -nginx.scrape_timeout 10 -nginx.scrape_uri http://127.0.0.1:9912/status/format/json
- View the metrics
http://g.htmltoo.com:9913/metrics
- Configure Prometheus
- job_name: 'nginx-vts-exporter'
  metrics_path: '/metrics'
  static_configs:
    - targets: ['g.htmltoo.com:9913']
- Configure the Grafana dashboard
In Grafana, import dashboard 2949 from grafana.com and load it
# haproxy_exporter
https://hub.docker.com/r/prom/haproxy-exporter
https://github.com/prometheus/haproxy_exporter
# node_exporter
https://hub.docker.com/r/prom/node-exporter
https://www.github.com/prometheus/node_exporter
# influxdb-exporter
https://hub.docker.com/r/prom/influxdb-exporter
# clickhouse_exporter
https://github.com/ClickHouse/clickhouse_exporter
https://hub.docker.com/r/flant/clickhouse-exporter
https://www.github.com/flant/clickhouse_exporter
# blackbox-exporter: probes port, HTTP, and ICMP status
https://hub.docker.com/r/prom/blackbox-exporter
https://github.com/prometheus/blackbox_exporter/releases
--- Port status check
params:
module: [tcp_connect]
--- Domain (HTTP) check example
params:
module: [http_2xx]
--- Ping (ICMP) check
params:
module: [icmp]
--- POST check
params:
module: [http_post_2xx_query]
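Each module referenced above must be defined in the blackbox exporter's blackbox.yml. A minimal sketch (the timeouts are assumptions; http_post_2xx_query is a custom module name, not a built-in):

```yaml
modules:
  tcp_connect:          # port status check
    prober: tcp
    timeout: 5s
  http_2xx:             # HTTP/domain check
    prober: http
    timeout: 5s
  icmp:                 # ping check
    prober: icmp
    timeout: 5s
  http_post_2xx_query:  # POST check (custom module; body/headers omitted)
    prober: http
    timeout: 5s
    http:
      method: POST
```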
# memcached-exporter
https://hub.docker.com/r/prom/memcached-exporter
=========================================================
Auto-discovery makes it easy to add or remove monitored resources dynamically. Zabbix, for example, can auto-discover hosts and resources to monitor; Prometheus, a monitoring system on par with Zabbix, naturally has its own discovery mechanisms.
file_sd_configs can be used to add and remove targets dynamically.
Configuration
Edit Prometheus's configuration file
and add the following under scrape_configs:
- job_name: 'test_server'
  file_sd_configs:
    - files:
        - /app/hananmin/prometheus/file_sd/test_server.json
      refresh_interval: 10s
files lists the paths of the target files; their content is YAML or JSON, and globs such as *.json are allowed. Prometheus rescans these files periodically and loads any changes; refresh_interval sets the scan interval.
Create the watched file test_server.json:
[
{
"targets": ["10.161.4.63:9091","10.161.4.61:9100"]
}
]
Reload Prometheus's configuration.
With a short refresh interval, the newly added targets should show up almost immediately.
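Target files like this are easy to generate from a script or a CMDB export. A minimal Python sketch (the helper name and the atomic-write approach are mine, not from the source):

```python
import json
import os
import tempfile

def write_file_sd(path, targets, labels=None):
    """Write a Prometheus file_sd target file: a JSON list of target groups."""
    group = {"targets": list(targets)}
    if labels:
        group["labels"] = dict(labels)
    # write to a temp file and rename, so Prometheus never picks up
    # a half-written file during its periodic rescan
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump([group], f, indent=2)
    os.replace(tmp, path)

path = os.path.join(tempfile.gettempdir(), "test_server.json")
write_file_sd(path, ["10.161.4.63:9091", "10.161.4.61:9100"], {"env": "test"})
print(open(path).read())
```

Pointing the job's files entry at the generated path is all that is needed; no Prometheus reload is required for subsequent target changes.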
=========================================================
Prometheus ships with a built-in web UI, available at http://monitor_host:9090.
On the Status -> Targets page you can see the two configured targets; their State is UNKNOWN.
1. Next, install and run the exporters; install Docker on the monitored servers.
a. Install and run node_exporter
Install the Go environment on the monitored host:
yum install go -y    # or: apt-get install golang; check the version with: go version
https://github.com/prometheus/node_exporter/releases
cd /data/file/soft/src/monitor
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
Install and run node_exporter:
tar xvf node_exporter-0.18.1.linux-amd64.tar.gz
mkdir /usr/local/monitor
mv node_exporter-0.18.1.linux-amd64/node_exporter /usr/local/monitor
chmod +x /usr/local/monitor/node_exporter
nohup /usr/local/monitor/node_exporter &
# Install under Docker:
docker run -d \
  -p 9100:9100 \
  --name node-exporter \
  -v "/proc:/host/proc" \
  -v "/sys:/host/sys" \
  -v "/:/rootfs" \
  --net="host" \
  quay.io/prometheus/node-exporter \
  -collector.procfs /host/proc \
  -collector.sysfs /host/sys \
  -collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
b. Install and run mysqld_exporter
https://github.com/prometheus/mysqld_exporter/releases
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz
Install and run mysqld_exporter:
tar xvf mysqld_exporter-0.11.0.linux-amd64.tar.gz -C /usr/local/
nohup /usr/local/mysqld_exporter-0.11.0.linux-amd64/mysqld_exporter &
mysqld_exporter connects to MySQL, so it needs MySQL privileges; create a user for it first and grant the required permissions.
mysql> CREATE USER 'mysql_monitor'@'localhost' IDENTIFIED BY 'mysql_monitor' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'localhost';
mysql> FLUSH PRIVILEGES;
# Install under Docker: pass the username, password, host IP and port.
docker run -d \
  -p 9104:9104 \
  -e DATA_SOURCE_NAME="mysql_monitor:mysql_monitor@(10.10.0.186:3306)/" \
  prom/mysqld-exporter:latest
Variable: DATA_SOURCE_NAME = ihunter:m@(209.cndo.org:3306)/
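DATA_SOURCE_NAME uses the Go MySQL driver's DSN form, user:password@(host:port)/dbname. A small Python sketch of mine to sanity-check a DSN before deploying (the regex is an illustration, not the exporter's actual parser):

```python
import re

# mysqld_exporter reads DATA_SOURCE_NAME in the Go MySQL driver's DSN form:
#   user:password@(host:port)/dbname
# This regex-based checker is an illustration, not the exporter's parser.
DSN_RE = re.compile(
    r"^(?P<user>[^:]+):(?P<password>.*)@\((?P<host>[^:]+):(?P<port>\d+)\)/(?P<db>.*)$"
)

def parse_dsn(dsn):
    m = DSN_RE.match(dsn)
    if not m:
        raise ValueError("not a user:password@(host:port)/db DSN: %r" % dsn)
    return m.groupdict()

parts = parse_dsn("mysql_monitor:mysql_monitor@(10.10.0.186:3306)/")
print(parts["host"], parts["port"])  # 10.10.0.186 3306
```

The greedy password group also handles passwords containing "@", as in the DSN examples elsewhere in these notes.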
Back on the Status -> Targets page, both targets now show State UP.
Next, add Grafana as the dashboard for Prometheus.
2. Monitoring MySQL
Grafana has no official ready-made MySQL dashboard template yet, so use Percona's open-source templates.
https://github.com/percona/grafana-dashboards/releases
Download and import the MySQL_Overview template.
3. Monitoring Redis
https://github.com/oliver006/redis_exporter/releases/
wget https://github.com/oliver006/redis_exporter/releases/download/v0.20.2/redis_exporter-v0.20.2.linux-amd64.tar.gz
tar -xvf redis_exporter-v0.20.2.linux-amd64.tar.gz
# Install under Docker: https://github.com/oliver006/redis_exporter
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter:latest
Variable: REDIS_ADDR = 209.cndo.org:6379
Download the prometheus-redis_rev1.json Grafana dashboard template for Redis:
wget https://grafana.com/api/dashboards/763/revisions/1/download
Import the JSON template in Grafana.
# Start redis_exporter:
## without a password
./redis_exporter -redis.addr redis://192.168.1.120:6379 &
## with a password
./redis_exporter -redis.addr 192.168.1.120:6379 -redis.password 123456 &
Add the Redis job to prometheus.yml, then restart Prometheus:
- job_name: redis
  static_configs:
    - targets: ['192.168.1.120:9121']
      labels:
        instance: redis120
4. Configuration:
vi /data/site/docker/env/monitor/prometheus/prometheus.yml
global:
  scrape_interval: 60s
  evaluation_interval: 60s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']
rule_files:                 # alert rule files
  - "rules.yml"
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus
  - job_name: 'server'
    file_sd_configs:
      - files:
          - /etc/prometheus/files/server.json
        refresh_interval: 10s
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files:
          - /etc/prometheus/files/check.json
        refresh_interval: 10s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
  - job_name: 'db'
    file_sd_configs:
      - files:
          - /etc/prometheus/files/db.json
        refresh_interval: 10s
  # - job_name: 'consul'
  #   file_sd_configs:
  #     - files:
  #         - /etc/prometheus/files/consul.json
  #       refresh_interval: 10s
  # - job_name: 'cadvisor'
  #   file_sd_configs:
  #     - files:
  #         - /etc/prometheus/files/cadvisor.json
  #       refresh_interval: 10s
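The three relabel_configs rules in the 'blackbox' job above rewrite each file_sd target before scraping. A Python sketch of mine mimicking that rewrite (an illustration of the mechanism, not Prometheus's implementation):

```python
def relabel(target, module="tcp_connect", exporter="127.0.0.1:9115"):
    """Mimic the three relabel_configs rules of the 'blackbox' job."""
    labels = {"__address__": target}
    # rule 1: copy __address__ into __param_target (becomes ?target= on the probe URL)
    labels["__param_target"] = labels["__address__"]
    # rule 2: copy __param_target into instance (alerts then show the probed host)
    labels["instance"] = labels["__param_target"]
    # rule 3: point __address__ at the blackbox exporter itself
    labels["__address__"] = exporter
    url = "http://%s/probe?module=%s&target=%s" % (
        labels["__address__"], module, labels["__param_target"])
    return labels["instance"], url

inst, url = relabel("39.101.166.123:3306")
print(inst)  # 39.101.166.123:3306
print(url)   # http://127.0.0.1:9115/probe?module=tcp_connect&target=39.101.166.123:3306
```

The net effect: Prometheus scrapes the blackbox exporter, which probes the real target, and the instance label still identifies the probed host.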
vi /data/site/docker/env/monitor/prometheus/rules.yml
groups:
- name: services
  rules:
  - alert: port is down
    expr: probe_success{job=~"check_port_status"} == 0
    for: 1m
    labels:
      severity: 3
    annotations:
      summary: "current value: {{ $value }}"
      console: 'host {{ $labels.hostname }}: port is down!'
- name: up/down              # rule group
  rules:
  - alert: instance down     # alert name
    expr: up == 0            # PromQL expression that triggers the rule
    for: 30s                 # how long the condition must hold before the alert is sent to Alertmanager (1m = 1 minute, 1s = 1 second)
    labels:                  # labels carry the alert's severity and host
      name: instance
      severity: Critical
    annotations:             # annotations
      summary: " [{{ $labels.job }}] has stopped running! "                                # alert summary, taken from the job name
      description: " [{{ $labels.job }}] detected down for over 30 seconds, please investigate! "  # alert detail
      value: "{{ $value }}"  # current value when the alert fires
- name: Host
  rules:
  - alert: memory usage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 30s
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host memory usage above 80%."
      value: "{{ $value }}"
  - alert: CPU usage
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[3m]))) by (instance,appname) > 0.65
    for: 30s
    labels:
      name: CPU
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host CPU usage above 65%."
      value: "{{ $value }}"
  - alert: host load
    expr: node_load5 > 4
    for: 30s
    labels:
      name: Load
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host 5-minute load average above 4."
      value: "{{ $value }}"
  - alert: filesystem usage
    expr: 1-(node_filesystem_free_bytes / node_filesystem_size_bytes) > 0.8
    for: 30s
    labels:
      name: Disk
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host partition [ {{ $labels.mountpoint }} ] above 80% used."
      value: "{{ $value }}%"
  - alert: disk io
    expr: irate(node_disk_writes_completed_total{job=~"Host"}[1m]) > 10
    for: 30s
    labels:
      name: Diskio
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host disk [{{ $labels.device }}] 1-minute average write IO load is high."
      value: "{{ $value }}iops"
  - alert: network receive
    expr: irate(node_network_receive_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[3m]) / 1048576 > 3
    for: 30s
    labels:
      name: Network_receive
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host NIC [{{ $labels.device }}] 3-minute average receive rate above 3 MB/s."
      value: "{{ $value }}MB/s"
  - alert: network transmit
    expr: irate(node_network_transmit_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[3m]) / 1048576 > 3
    for: 30s
    labels:
      name: Network_transmit
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host NIC [{{ $labels.device }}] 3-minute average transmit rate above 3 MB/s."
      value: "{{ $value }}MB/s"
- name: Container
  rules:
  - alert: Container CPU usage
    expr: (sum by(name,instance) (rate(container_cpu_usage_seconds_total{image!=""}[3m]))*100) > 60
    for: 30s
    labels:
      name: CPU
      severity: Warning
    annotations:
      summary: "{{ $labels.name }} "
      description: "Container CPU usage above 60%."
      value: "{{ $value }}%"
  - alert: Container memory usage
    # expr: (container_memory_usage_bytes - container_memory_cache) / container_spec_memory_limit_bytes * 100 > 10
    expr: container_memory_usage_bytes{name=~".+"} / 1048576 > 1024
    for: 30s
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: "{{ $labels.name }} "
      description: "Container memory usage above 1 GB."
      value: "{{ $value }}G"
vi /data/site/docker/env/monitor/prometheus/files/server.json
[
{
"targets": ["file.htmltoo.com:9100","https://abc.htmltoo.com/:9999:9100"]
}
]
vi /data/site/docker/env/monitor/prometheus/files/db.json
[
{
"targets": ["mysqld-exporter:9104","redis-exporter:9121"]
}
]
vi /data/site/docker/env/monitor/prometheus/files/check.json
[
{
"targets": ["39.101.166.123:3333","39.101.166.123:3306"]
}
]
5. Install and run Grafana
docker run -d \
  -p 3000:3000 \
  -e "GF_SECURITY_ADMIN_PASSWORD=admin" \
  -v ~/grafana_db:/var/lib/grafana \
  grafana/grafana
Grafana's web UI is available at http://monitor_host:3000 (default account/password: admin/admin).
https://www.jianshu.com/p/085edb535070
https://www.jianshu.com/p/dfd6ba5206dc
https://www.jianshu.com/p/ac4098d1264a