https://github.com/prometheus

https://prometheus.io/docs/

https://hub.docker.com/r/prom/prometheus/

https://github.com/starsliao/Prometheus

https://gitee.com/feiyu563/PrometheusAlert

# run

docker run -d  --net=host  --name prometheus  --restart=always  -v /data/site/docker/env/monitor/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro  -v /data/site/docker/env/monitor/prometheus/rules.yml:/etc/prometheus/rules.yml:ro  -v /data/site/docker/env/monitor/prometheus/files:/etc/prometheus/files:ro -v /etc/localtime:/etc/localtime:ro  prom/prometheus:latest


docker run -d  --net=host  --name  node  --restart=always  -v /etc/localtime:/etc/localtime:ro  prom/node-exporter:latest

docker run -d  --net=host  --name  node  --restart=always  -v /etc/localtime:/etc/localtime:ro  hub.htmltoo.com:5000/http:node 


docker run -d  --net=host  --name  blackbox  --restart=always  -v /etc/localtime:/etc/localtime:ro  prom/blackbox-exporter:latest


docker commit -m="update" -a="htmltoo.com" node  hub.htmltoo.com:5000/http:node  

docker push hub.htmltoo.com:5000/http:node 


Application: monitor -> Add service: prometheus

Image: prom/prometheus:latest

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

/data/docker/monitor/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro

/data/docker/monitor/prometheus/rules.yml:/etc/prometheus/rules.yml:ro

/data/docker/monitor/prometheus/files:/etc/prometheus/files:ro

Port: 9090-9090


chmod -R 777  /data/docker/monitor/prometheus/


Application: monitor -> Add service: mysqld-exporter    https://hub.docker.com/r/prom/mysqld-exporter

Image: prom/mysqld-exporter:latest

Environment variables:

DATA_SOURCE_NAME = root:wdqdmm@r@(file.htmltoo.com:3306)/

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

Port: 9104-9104


Application: monitor -> Add service: redis-exporter    https://hub.docker.com/r/oliver006/redis_exporter

Image: oliver006/redis_exporter:latest

Environment variables:

REDIS_ADDR = redis://file.htmltoo.com:6379

REDIS_PASSWORD = wdqdmm@r

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

Port: 9121-9121


Application: monitor -> Add service: consul-exporter    # https://hub.docker.com/r/prom/consul-exporter

Image: prom/consul-exporter:latest

Variables:

consul.server = consul:8500

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

(Port: 9107-9107)


Application: monitor -> Add service: consul2-exporter    # https://hub.docker.com/r/prom/consul-exporter

Image: prom/consul-exporter:latest

Variables:

consul.server = consul2:8500

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

Port: 9107-9107


Application: monitor -> Add service: consul3-exporter    # https://hub.docker.com/r/prom/consul-exporter

Image: prom/consul-exporter:latest

Variables:

consul.server = consul3:8500

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

Port: 9107-9107


Application: monitor -> Add service: kafka-exporter    https://hub.docker.com/r/danielqsj/kafka-exporter

Image: danielqsj/kafka-exporter:latest

Command:

--kafka.server=kafka:9092   [--kafka.server=another-server ...]

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

Port: 9308-9308


Application: monitor -> Add service: greenplum_exporter  # https://hub.docker.com/r/inrgihc/greenplum_exporter

https://github.com/tangyibo/greenplum_exporter

Image: inrgihc/greenplum_exporter:latest

Variables:

GPDB_DATA_SOURCE_URL=postgres://gpadmin:password@10.17.20.11:5432/postgres?sslmode=disable

Volumes:

/etc/localtime:/etc/localtime:ro

/data/file:/data/file

Port: 9297-9297


node_exporter - installed on the host, uses port 9100     https://hub.docker.com/r/prom/node-exporter

https://github.com/prometheus/node_exporter/releases

cd  /data/site/go/htmltoo.ssh/tools/soft/src/common/monitor

wget  https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz

# tar xvf node_exporter-1.0.1.linux-amd64.tar.gz

cd  /opt

wget  https://abc.htmltoo.com:7777/src/common/monitor/node_exporter-1.0.1.linux-amd64.tar.gz

tar xvf node_exporter-1.0.1.linux-amd64.tar.gz

cd  node_exporter-1.0.1.linux-amd64

mv  node_exporter  ../   &&  cd ../

rm -rf  node_exporter-1.0.1.linux-amd64  node_exporter-1.0.1.linux-amd64.tar.gz

chmod +x /opt/node_exporter

nohup /opt/node_exporter &   #   >test.log 2>&1 & 

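---“nohup” keeps the program running after the terminal hangs up, “test.log” is the output log file, and “2>&1” redirects standard error to standard output

----“>” redirects the printed output to the log file, and the final “&” runs the program in the background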
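
The same redirections can be expressed from a script. The following is only a rough sketch, assuming Python 3 on a Linux host; `start_detached` is a name made up for this example, and starting a new session is an approximation of what `nohup` does, not an exact equivalent.

```python
import subprocess


def start_detached(command, log_path):
    """Roughly `nohup command > log_path 2>&1 &`: returns at once, the child keeps running."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,                # ">"   : standard output goes to the log file
            stderr=subprocess.STDOUT,  # "2>&1": standard error follows standard output
            start_new_session=True,    # own session, so a terminal hangup does not reach it
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

For the exporter above, the call would look like `start_detached(["/opt/node_exporter"], "test.log")`.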


# nginx-vts-exporter

https://hub.docker.com/r/sophos/nginx-vts-exporter

https://github.com/hnlq715/nginx-vts-exporter/

docker run  -ti --rm --env NGINX_STATUS="http://localhost/status/format/json" sophos/nginx-vts-exporter

https://grafana.com/dashboards/2949

https://hub.docker.com/r/sophos/nginx-prometheus-metrics

docker run -d --rm -it -p 80:80 -p 1314:1314 -p 9527:9527 sophos/nginx-prometheus-metrics
Visit http://localhost:1314 to generate some test metrics.
Visit http://localhost:9527/metrics in your browser (Safari/Chrome).

- Convert the nginx-module-vts monitoring data into metrics that Prometheus can scrape

wget https://github.com/hnlq715/nginx-vts-exporter/archive/refs/tags/v0.10.7.tar.gz

tar -xf v0.10.7.tar.gz  &&  cd nginx-vts-exporter-0.10.7

./nginx-vts-exporter -nginx.scrape_timeout 10 -nginx.scrape_uri http://127.0.0.1:9912/status/format/json

- View the metrics

http://g.htmltoo.com:9913/metrics

- Configure Prometheus

  - job_name: 'nginx-vts-exporter'
    metrics_path: '/metrics'
    static_configs:
    - targets: ['g.htmltoo.com:9913']

- Configure the Grafana dashboard

In Grafana, import dashboard 2949 from grafana.com and click Load


# haproxy_exporter

https://hub.docker.com/r/prom/haproxy-exporter

https://github.com/prometheus/haproxy_exporter


# node_exporter

https://hub.docker.com/r/prom/node-exporter

https://www.github.com/prometheus/node_exporter


# influxdb-exporter

https://hub.docker.com/r/prom/influxdb-exporter


# clickhouse_exporter

https://github.com/ClickHouse/clickhouse_exporter

https://hub.docker.com/r/flant/clickhouse-exporter

https://www.github.com/flant/clickhouse_exporter


# blackbox-exporter: monitor port status

https://hub.docker.com/r/prom/blackbox-exporter

https://github.com/prometheus/blackbox_exporter/releases

--- Monitor port status

params:

    module: [tcp_connect]

--- Domain (HTTP) monitoring example

params:

    module: [http_2xx]

--- Ping check

params:

    module: [icmp]

--- POST test

params:

    module: [http_post_2xx_query]
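
Each `module` value above ends up as a query parameter on the blackbox exporter's `/probe` endpoint, next to the `target` being probed. As a rough illustration, the sketch below builds such a URL by hand. The helper name `probe_url` is made up here, 127.0.0.1:9115 is the exporter address used in the prometheus.yml later in these notes, and the probed address is taken from check.json.

```python
from urllib.parse import urlencode


def probe_url(exporter, module, target):
    """Build the URL Prometheus would scrape for one blackbox probe."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


# TCP connect check against a MySQL port
print(probe_url("127.0.0.1:9115", "tcp_connect", "39.101.166.123:3306"))
# -> http://127.0.0.1:9115/probe?module=tcp_connect&target=39.101.166.123%3A3306
```

Opening such a URL in a browser or with curl is a quick way to debug a module before wiring it into a scrape job.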


# memcached-exporter

https://hub.docker.com/r/prom/memcached-exporter


=========================================================

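An auto-discovery mechanism makes it easy to add or remove monitored resources dynamically. Zabbix, for example, can automatically discover hosts and resources to monitor. Prometheus, a monitoring system on par with Zabbix, naturally has its own discovery mechanisms.

file_sd_configs can be used to add and remove targets dynamically.

Configuration

Edit the Prometheus configuration file.

Add the following under scrape_configs: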

  - job_name: 'test_server'

    file_sd_configs:

      - files:

        - /app/hananmin/prometheus/file_sd/test_server.json

        refresh_interval: 10s 

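files lists the paths of the target files. Each file must be in YAML or JSON format, and a wildcard such as *.json may be used. Prometheus scans these files periodically and loads any new configuration; refresh_interval sets the scan interval.

Create the scanned file test_server.json: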

[

  {

    "targets":  ["10.161.4.63:9091","10.161.4.61:9100"]

  }

]

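Reload the Prometheus configuration so that the new job takes effect.

With a short refresh interval, newly added targets should be discovered almost immediately.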
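
The target file can also be generated by a script instead of being edited by hand. The following is a minimal sketch, assuming Python 3 is available on the Prometheus host. The helper name `write_targets` and the output path are hypothetical. It writes to a temporary file and renames it into place, so Prometheus never reads a half-written file.

```python
import json
import os
import tempfile


def write_targets(path, targets, labels=None):
    """Write one file_sd target group to `path` atomically (hypothetical helper)."""
    group = {"targets": sorted(targets)}
    if labels:
        group["labels"] = labels
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    # The ".tmp" suffix keeps it from matching a "*.json" wildcard.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump([group], handle, indent=2)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; Prometheus may run as another user
    os.replace(tmp_path, path)
    return group


out = os.path.join(tempfile.gettempdir(), "test_server.json")
print(write_targets(out, ["10.161.4.63:9091", "10.161.4.61:9100"]))
```

Pointing `path` at the file listed under `files` is enough; no Prometheus reload is needed for changes to the target file itself.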

=========================================================


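Prometheus has a built-in web UI, available at http://monitor_host:9090.


On the Status -> Targets page you can see the two configured targets; their State is UNKNOWN.


1. Next, install and run the exporters. Install Docker on the monitored servers.

a. Install and run node_exporter

Install the Go environment on the monitored host:

yum install go -y          # CentOS/RHEL
apt-get install golang     # Debian/Ubuntu
# Check the Go version: go version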

https://github.com/prometheus/node_exporter/releases

cd /data/file/soft/src/monitor
wget  https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

Install and run node_exporter:

tar xvf node_exporter-0.18.1.linux-amd64.tar.gz
mkdir /usr/local/monitor
mv node_exporter-0.18.1.linux-amd64/node_exporter /usr/local/monitor
chmod +x /usr/local/monitor/node_exporter
nohup /usr/local/monitor/node_exporter &
# Install with Docker:
# (--net=host already exposes port 9100 on the host, so no -p mapping is needed;
#  the image tag matches the 0.18.1 release downloaded above)
docker run -d \
  --name node-exporter \
  -v "/proc:/host/proc" \
  -v "/sys:/host/sys" \
  -v "/:/rootfs" \
  --net="host" \
  quay.io/prometheus/node-exporter:v0.18.1 \
    --path.procfs=/host/proc \
    --path.sysfs=/host/sys \
    --collector.filesystem.ignored-mount-points="^/(sys|proc|dev|host|etc)($|/)"

b. Install and run mysqld_exporter

https://github.com/prometheus/mysqld_exporter/releases

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz

Install and run mysqld_exporter:

tar xvf mysqld_exporter-0.11.0.linux-amd64.tar.gz -C /usr/local/
nohup /usr/local/mysqld_exporter-0.11.0.linux-amd64/mysqld_exporter &

mysqld_exporter needs to connect to MySQL, so it needs MySQL privileges. First create a user for it and grant the required privileges.

mysql> CREATE USER 'mysql_monitor'@'localhost' IDENTIFIED BY 'mysql_monitor' WITH MAX_USER_CONNECTIONS 3;
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'localhost';
mysql> FLUSH PRIVILEGES;
# Install with Docker: pass in the username, password, host IP and port.
docker run -d \
  -p 9104:9104 \
  -e DATA_SOURCE_NAME="mysql_monitor:mysql_monitor@(10.10.0.186:3306)/" prom/mysqld-exporter:latest
Variables:
DATA_SOURCE_NAME = ihunter:m@(209.cndo.org:3306)/
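
The DATA_SOURCE_NAME values in these notes all follow the pattern `user:password@(host:port)/`. A small sketch of assembling that string, with a made-up helper name (`mysqld_exporter_dsn`) and the monitoring user created above as example input:

```python
def mysqld_exporter_dsn(user, password, host, port=3306):
    """Return a DATA_SOURCE_NAME value in the user:password@(host:port)/ form used above."""
    return f"{user}:{password}@({host}:{port})/"


print(mysqld_exporter_dsn("mysql_monitor", "mysql_monitor", "10.10.0.186"))
# -> mysql_monitor:mysql_monitor@(10.10.0.186:3306)/
```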

Back on the Status -> Targets page, the two targets now show the state UP.
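
The same check can be scripted against the Prometheus HTTP API, which exposes the target list at `/api/v1/targets`. The sketch below only parses an already-decoded response. The sample payload is trimmed to the fields the function reads and is made up for illustration, and `targets_not_up` is a hypothetical helper name.

```python
def targets_not_up(payload):
    """Return (job, instance, health) for every active target whose health is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t["health"])
        for t in active
        if t["health"] != "up"
    ]


# Illustrative response, trimmed to the fields used above.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "server", "instance": "10.161.4.61:9100"}, "health": "up"},
            {"labels": {"job": "db", "instance": "mysqld-exporter:9104"}, "health": "down"},
        ]
    },
}
print(targets_not_up(sample))
# -> [('db', 'mysqld-exporter:9104', 'down')]
```

In practice the payload would come from something like `json.load(urllib.request.urlopen("http://monitor_host:9090/api/v1/targets"))`.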

Next, add Grafana as the dashboard for Prometheus.


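2. Monitoring MySQL

Grafana does not yet provide an official ready-made MySQL dashboard template, so we use Percona's open-source ones.

https://github.com/percona/grafana-dashboards/releases

Download the release and import the MySQL_Overview template.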


3. Monitoring Redis

https://github.com/oliver006/redis_exporter/releases/

wget https://github.com/oliver006/redis_exporter/releases/download/v0.20.2/redis_exporter-v0.20.2.linux-amd64.tar.gz
tar -xvf redis_exporter-v0.20.2.linux-amd64.tar.gz
# Install with Docker:  https://github.com/oliver006/redis_exporter
docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter:latest
Variables:
REDIS_ADDR = 209.cndo.org:6379

Download the Grafana Redis dashboard template prometheus-redis_rev1.json:

wget -O prometheus-redis_rev1.json https://grafana.com/api/dashboards/763/revisions/1/download

Import the JSON template into Grafana.

# Start redis_exporter:
## Without a password
./redis_exporter -redis.addr redis://192.168.1.120:6379 &
## With a password
redis_exporter  -redis.addr 192.168.1.120:6379  -redis.password 123456

Add the Redis job to prometheus.yml, then restart Prometheus:

  - job_name: redis
    static_configs:
      - targets: ['192.168.1.120:9121']
        labels:
          instance: redis120


4. Configuration:

vi  /data/site/docker/env/monitor/prometheus/prometheus.yml

global:
  scrape_interval:     60s
  evaluation_interval: 60s
alerting:       
  alertmanagers:
  - static_configs:
    - targets: [ '127.0.0.1:9093']
rule_files:  # alert rule files
  - "rules.yml"
  
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus
  - job_name: 'server'
    file_sd_configs:
      - files:
        - /etc/prometheus/files/server.json
        refresh_interval: 10s 
        
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files:
        - /etc/prometheus/files/check.json
        refresh_interval: 10s 
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.        
        
  - job_name: 'db'
    file_sd_configs:
      - files:
        - /etc/prometheus/files/db.json
        refresh_interval: 10s 
#  - job_name: 'consul'
#    file_sd_configs:
#      - files:
#        - /etc/prometheus/files/consul.json
#        refresh_interval: 10s 
#  - job_name: 'cadvisor'
#    file_sd_configs:
#      - files:
#        - /etc/prometheus/files/cadvisor.json
#        refresh_interval: 10s
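
The three relabel_configs steps in the blackbox job are easier to follow when traced on a single target. The sketch below is a simplified Python model of those steps, not Prometheus code: it only mimics the default `replace` action as used in that job, and `relabel_blackbox` is a name invented for this example.

```python
def relabel_blackbox(address, exporter="127.0.0.1:9115"):
    """Trace the blackbox job's three relabel steps for one discovered target."""
    labels = {"__address__": address}
    # Step 1: the discovered address becomes the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: the probed address is kept as the visible `instance` label.
    labels["instance"] = labels["__param_target"]
    # Step 3: the scrape itself is sent to the blackbox exporter.
    labels["__address__"] = exporter
    return labels


print(relabel_blackbox("39.101.166.123:3306"))
```

After the three steps, Prometheus scrapes the exporter at 127.0.0.1:9115 while the series still carry the probed address as `instance`.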


vi  /data/site/docker/env/monitor/prometheus/rules.yml

groups:

- name: services
  rules:
  - alert: port is down
    expr: probe_success{job="blackbox"} == 0
    for: 1m
    labels:
      severity: "3"
    annotations:
      summary: "Current value: {{ $value }}"
      console: 'Host {{ $labels.instance }}: port check failed!'

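- name: up/down  # rule group name
  rules:
  - alert: Instance down  # alert name
    expr: up == 0  # PromQL expression that triggers the rule
    for: 30s  # e.g. 1m = 1 minute, 1s = 1 second; the condition must hold this long before the alert is sent to Alertmanager
    labels:  # labels define the alert's severity and host
      name: instance
      severity: Critical
    annotations:  # annotations
      summary: " [{{ $labels.job }}] has stopped running! "  # alert summary, built from the job label
      description: " [{{ $labels.job }}] has been down for more than 30 seconds. Please investigate! "  # alert details
      value: "{{ $value }}"  # current value of the alerting expression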
- name: Host
  rules:
  - alert: Memory usage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 >  80
    for: 30s
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host memory usage is above 80%."
      value: "{{ $value }}"
  - alert: CPU usage
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[3m]))) by (instance,appname) > 0.65
    for: 30s
    labels:
      name: CPU
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: "Host CPU usage is above 65%."
      value: "{{ $value }}"
  - alert: Host load
    expr: node_load5 > 4
    for: 30s
    labels:
      name: Load
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: " Host 5-minute load average is above 4."
      value: "{{ $value }}"
  - alert: Filesystem
    expr: (1 - node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 >  80
    for: 30s
    labels:
      name: Disk
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: " Partition [ {{ $labels.mountpoint }} ] on the host is more than 80% full."
      value: "{{ $value }}%"
  - alert: Disk IO
    # [3m] instead of [1m]: with a 60s scrape interval a 1m window rarely holds the two samples irate needs.
    # job="server" is the node_exporter job defined in prometheus.yml.
    expr: irate(node_disk_writes_completed_total{job="server"}[3m]) > 10
    for: 30s
    labels:
      name: Diskio
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: " Write IO load on host disk [{{ $labels.device }}] is high."
      value: "{{ $value }}iops"
  - alert: Network receive
    expr: irate(node_network_receive_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[3m]) / 1048576  > 3
    for: 30s
    labels:
      name: Network_receive
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: " Receive rate on host NIC [{{ $labels.device }}] is above 3 MiB/s."
      value: "{{ $value }} MiB/s"
  - alert: Network transmit
    expr: irate(node_network_transmit_bytes_total{device!~"lo|bond[0-9]|cbr[0-9]|veth.*|virbr.*|ovs-system"}[3m]) / 1048576  > 3
    for: 30s
    labels:
      name: Network_transmit
      severity: Warning
    annotations:
      summary: " [{{ $labels.job }}] "
      description: " Transmit rate on host NIC [{{ $labels.device }}] is above 3 MiB/s."
      value: "{{ $value }} MiB/s"
      
- name: Container
  rules:
  - alert: ContainerCPU Usage
    expr: (sum by(name,instance) (rate(container_cpu_usage_seconds_total{image!=""}[3m]))*100) > 60
    for: 30s
    labels:
      name: CPU
      severity: Warning
    annotations:
      summary: "{{ $labels.name }} "
      description: " Container CPU usage is above 60%."
      value: "{{ $value }}%"
  - alert: ContainerMem Usage
#    expr: (container_memory_usage_bytes - container_memory_cache)  / container_spec_memory_limit_bytes   * 100 > 10  
    expr:  container_memory_usage_bytes{name=~".+"}  / 1048576 > 1024
    for: 30s
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: "{{ $labels.name }} "
      description: " Container memory usage is above 1 GiB."
      value: "{{ $value }} MiB"
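
To make the memory rule in the Host group concrete, here is the same arithmetic worked through in Python. The function name and the sample numbers are invented for illustration; the formula itself is the one in the rule's `expr`.

```python
GIB = 2 ** 30


def memory_used_percent(total, free, buffers, cached):
    """(MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100, as in the alert rule."""
    return (total - (free + buffers + cached)) / total * 100


# Example host: 8 GiB total, 1 GiB free, 0.5 GiB buffers, 1.5 GiB page cache.
used = memory_used_percent(8 * GIB, 1 * GIB, GIB // 2, 3 * GIB // 2)
print(used)       # 62.5
print(used > 80)  # False, so the alert would not fire
```

Buffers and page cache are subtracted because the kernel can reclaim them, so they are not counted as used memory.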


vi  /data/site/docker/env/monitor/prometheus/files/server.json

[
  {
    "targets":  ["file.htmltoo.com:9100","https://abc.htmltoo.com/:9999:9100"]
  }
]
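
Entries in a target file are expected to be plain `host:port` addresses, without a scheme or path. A quick sanity check before Prometheus picks the file up might look like the sketch below. It assumes Python 3, handles only DNS names and IPv4 addresses, and `invalid_targets` is a hypothetical helper.

```python
import json
import re

# host:port, where host is a DNS name or IPv4 address (IPv6 is not handled in this sketch)
TARGET_RE = re.compile(r"^[A-Za-z0-9.-]+:(\d{1,5})$")


def invalid_targets(file_sd_json):
    """Return the targets in a file_sd JSON document that are not plain host:port."""
    bad = []
    for group in json.loads(file_sd_json):
        for target in group.get("targets", []):
            match = TARGET_RE.match(target)
            if not match or not 0 < int(match.group(1)) < 65536:
                bad.append(target)
    return bad


doc = '[{"targets": ["file.htmltoo.com:9100", "https://abc.htmltoo.com/:9999:9100"]}]'
print(invalid_targets(doc))
# -> ['https://abc.htmltoo.com/:9999:9100']
```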


vi /data/site/docker/env/monitor/prometheus/files/db.json

[
  {
    "targets":  ["mysqld-exporter:9104","redis-exporter:9121"]
  }
]


vi  /data/site/docker/env/monitor/prometheus/files/check.json

[
  {
    "targets":  ["39.101.166.123:3333","39.101.166.123:3306"]
  }
]


5. Install and run Grafana

docker run -d \
  -p 3000:3000 \
  -e "GF_SECURITY_ADMIN_PASSWORD=admin" \
  -v ~/grafana_db:/var/lib/grafana grafana/grafana

Grafana's web UI is available at http://monitor_host:3000 (the default username/password is admin/admin).
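
Prometheus can be registered as a data source in the UI, or through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request and does not send it. It assumes the admin/admin credentials and the `monitor_host` placeholder from this page, and the helper name and the data source name "Prometheus" are choices made for this example.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Build (but do not send) the request that registers Prometheus in Grafana."""
    body = json.dumps({
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus on behalf of the browser
    }).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = datasource_request("http://monitor_host:3000", "http://monitor_host:9090")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here because it needs a running Grafana.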


https://www.jianshu.com/p/085edb535070

https://www.jianshu.com/p/dfd6ba5206dc

https://www.jianshu.com/p/ac4098d1264a

