Nightingale Single-Node Deployment
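Server Deployment
1. Environment preparation
1.1 Server planning
Host: 10.0.0.100 (master)
Host: 10.0.0.111 (slave)
1.2 Configure the time (make sure the server clock matches the actual current time)
# Check whether the system time matches the actual time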
[root@10.0.0.100 /opt/n9e]# date
Tue Mar 1 15:40:14 CST 2022
# If it does not match, synchronize the time
yum -y install ntp ntpdate
ntpdate 0.asia.pool.ntp.org
# Write the system time to the hardware clock
hwclock --systohc
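Because the master and the slave both need a correct clock, it can help to compare their timestamps after synchronizing. The following is a minimal sketch, not part of Nightingale or ntp: the function name clock_drift_ok and the 5-second default tolerance are assumptions chosen for illustration.

```shell
#!/bin/sh
# clock_drift_ok LOCAL_TS REF_TS [MAX_DRIFT]
# Succeeds when two epoch timestamps differ by at most MAX_DRIFT seconds.
# Hypothetical helper; the default tolerance of 5 seconds is an assumption.
clock_drift_ok() {
    local_ts=$1
    ref_ts=$2
    max_drift=${3:-5}
    drift=$((local_ts - ref_ts))
    # Take the absolute value so the argument order does not matter
    if [ "$drift" -lt 0 ]; then
        drift=$((0 - drift))
    fi
    [ "$drift" -le "$max_drift" ]
}
```

One possible use, assuming SSH access from the master to the slave: clock_drift_ok "$(date +%s)" "$(ssh root@10.0.0.111 date +%s)" && echo "clocks agree".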
1.3 Install services
1.3.1 Install Prometheus
mkdir -p /opt/prometheus
wget https://s3-gz01.didistatic.com/n9e-pub/prome/prometheus-2.28.0.linux-amd64.tar.gz -O prometheus-2.28.0.linux-amd64.tar.gz
tar xf prometheus-2.28.0.linux-amd64.tar.gz
cp -far prometheus-2.28.0.linux-amd64/* /opt/prometheus/
1.3.2 Write the systemd unit file
cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
1.3.3 Start the service and check its status
systemctl daemon-reload
systemctl enable prometheus
systemctl restart prometheus
systemctl status prometheus
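Besides the systemd status, Prometheus exposes a health endpoint at /-/healthy that can be probed over HTTP. The sketch below assumes the default listen port 9090, since the unit file does not set --web.listen-address; the function name prometheus_healthy is hypothetical.

```shell
#!/bin/sh
# prometheus_healthy [BASE_URL]
# Succeeds when the /-/healthy endpoint answers with a 2xx status.
# Assumption: Prometheus listens on its default port 9090.
prometheus_healthy() {
    base_url=${1:-http://127.0.0.1:9090}
    # -f turns HTTP errors into a non-zero exit code; --max-time bounds the wait
    curl -fsS --max-time 3 "${base_url}/-/healthy" >/dev/null 2>&1
}
```

For example: prometheus_healthy && echo "Prometheus is up". Because the unit starts Prometheus with --web.enable-lifecycle, a configuration reload can likewise be requested with curl -X POST http://127.0.0.1:9090/-/reload.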
1.3.4 Install MySQL (MariaDB)
yum -y install mariadb*
systemctl enable mariadb
systemctl restart mariadb
# Set the root password to 1234 (change it if you prefer)
mysql -e "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('1234');"
1.3.5 Install Redis
yum install -y redis
systemctl enable redis
systemctl restart redis
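The example above sets the MySQL root password to 1234. Keeping this value is recommended, because it saves you from editing the configuration files later.
2. Install the Nightingale components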
mkdir -p /opt/n9e && cd /opt/n9e
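# Find the latest release at https://github.com/didi/nightingale/releases; the package URL in this document may be out of date
tarball=n9e-5.0.0-ga-06.tar.gz
urlpath=https://github.com/didi/nightingale/releases/download/v5.0.0-ga-06/${tarball}
wget $urlpath || exit 1
# Extract the package into /opt/n9e so that the n9e binary used below is available
tar zxvf ${tarball}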
2.1 Write the systemd unit file
cat <<EOF >/etc/systemd/system/n9e.service
[Unit]
Description=Nightingale
After=syslog.target network.target
[Service]
Type=forking
ExecStart=/bin/bash /opt/n9e/n9e.sh
ExecStop=/bin/pkill n9e
ExecReload=/bin/pkill -USR2 n9e
PrivateTmp=true
[Install]
WantedBy=multi-user.target
EOF
2.2 Startup script
cat <<EOF >/opt/n9e/n9e.sh
#!/bin/bash
cd /opt/n9e/
./n9e server &> server.log &
./n9e webapi &> webapi.log &
EOF
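With the unit file and the startup script in place, the service still has to be loaded and started before any logs appear. A minimal sketch, assuming the unit is named n9e as in /etc/systemd/system/n9e.service above; the wrapper function start_n9e is hypothetical and only groups the commands.

```shell
#!/bin/sh
# start_n9e: reload systemd, enable the n9e unit at boot, start it, show its state.
# Assumes the unit file from 2.1 and the script from 2.2 already exist.
start_n9e() {
    systemctl daemon-reload &&
    systemctl enable n9e &&
    systemctl restart n9e &&
    systemctl status n9e --no-pager
}
```

Run start_n9e as root on the server; the commands mirror the ones used for Prometheus in 1.3.3.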
2.3 Check the logs and ports
cd /opt/n9e/
cat server.log
There should be no error messages; all output should be at the info level.
cat webapi.log
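This step mainly verifies that both services started correctly. On success, server listens on port 19000 by default, webapi listens on port 18000, and neither log contains errors.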
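The port check can be scripted by filtering the output of ss -ltn. The sketch below is one way to do it; has_listener is a hypothetical helper that reads ss-style output on standard input, so it assumes the usual column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port).

```shell
#!/bin/sh
# has_listener PORT
# Reads `ss -ltn` style output on stdin and succeeds if a LISTEN socket
# is bound to PORT. Assumes the local address is the fourth column.
has_listener() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # The port is the part after the last colon, e.g. *:19000 or [::]:18000
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example: ss -ltn | has_listener 19000 && echo "server is listening", and the same with 18000 for webapi.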
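Open the webapi port (18000 by default) in a browser to try out the features. The default user is root and the default password is root.2020
3. Collect monitoring data with Telegraf
Telegraf is an open-source collector from InfluxData that gathers metrics from the operating system and a wide range of middleware; its list of supported targets is extensive. Telegraf follows an all-in-one design: a single binary can collect CPU, memory, mysql, mongodb, redis, snmp, and more. Prometheus exporters, by contrast, need one exporter per monitored target, which is somewhat more work to manage. A single binary is also easier to distribute.
3.1 Installation script (save it as a .sh file in any location)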
#!/bin/sh
version=1.20.4
tarball=telegraf-${version}_linux_amd64.tar.gz
wget https://dl.influxdata.com/telegraf/releases/$tarball
tar xzvf $tarball
mkdir -p /opt/telegraf
cp -far telegraf-${version}/usr/bin/telegraf /opt/telegraf
# Write a trimmed-down configuration file
cat <<EOF > /opt/telegraf/telegraf.conf
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
[[outputs.opentsdb]]
host = "http://127.0.0.1"
port = 19000
http_batch_size = 50
http_path = "/opentsdb/put"
debug = false
separator = "_"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = true
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs",
"squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.system]]
fielddrop = ["uptime_format"]
[[inputs.net]]
ignore_protocol_stats = true
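EOF
# Write the systemd unit file for Telegraf
cat <<EOF > /etc/systemd/system/telegraf.service
[Unit]
Description="telegraf"
After=network.target
[Service]
Type=simple
ExecStart=/opt/telegraf/telegraf --config telegraf.conf
WorkingDirectory=/opt/telegraf
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=telegraf
KillMode=process
KillSignal=SIGQUIT
TimeoutStopSec=5
Restart=always
[Install]
WantedBy=multi-user.target
EOF
3.2 Check that it started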
systemctl daemon-reload
systemctl enable telegraf
systemctl restart telegraf
systemctl status telegraf
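The [[outputs.opentsdb]] block makes Telegraf post JSON datapoints to /opentsdb/put on port 19000. To make that payload concrete, the sketch below builds a single datapoint in the OpenTSDB JSON layout. The helper name opentsdb_point is hypothetical, and whether the server accepts a hand-built point exactly as Telegraf's is an assumption to verify against your version.

```shell
#!/bin/sh
# opentsdb_point METRIC VALUE HOST [TIMESTAMP]
# Prints one datapoint as an OpenTSDB-style JSON array.
# Hypothetical helper; TIMESTAMP defaults to the current epoch second.
opentsdb_point() {
    metric=$1
    value=$2
    host=$3
    ts=${4:-$(date +%s)}
    printf '[{"metric":"%s","timestamp":%s,"value":%s,"tags":{"host":"%s"}}]\n' \
        "$metric" "$ts" "$value" "$host"
}
```

With separator = "_", Telegraf joins the measurement and field name, so the active CPU usage is reported as cpu_usage_active. A test point could be sent with: opentsdb_point demo_metric 1 "$(hostname)" | curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- http://127.0.0.1:19000/opentsdb/put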
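The /opt/telegraf/telegraf.conf above is a trimmed-down version meant only to get the program running quickly. To collect from more targets, such as mysql, redis, or tomcat, read the configuration file extracted from the tarball carefully; it is commented in detail. The README under each official input plugin is also a useful reference.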
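As an illustration of adding more targets, the sketch below appends Redis and MySQL input sections to the configuration. The helper name append_extra_inputs is hypothetical; the addresses assume that Redis and MariaDB run locally on their default ports and that the MySQL root password is still 1234 as set in 1.3.4.

```shell
#!/bin/sh
# append_extra_inputs [CONF_PATH]
# Appends Redis and MySQL input sections to a Telegraf configuration file.
# Assumptions: both services run on 127.0.0.1 with default ports,
# and the MySQL root password is 1234.
append_extra_inputs() {
    conf=${1:-/opt/telegraf/telegraf.conf}
    cat <<'EOF' >>"$conf"

[[inputs.redis]]
  servers = ["tcp://127.0.0.1:6379"]

[[inputs.mysql]]
  servers = ["root:1234@tcp(127.0.0.1:3306)/"]
EOF
}
```

After appending, running /opt/telegraf/telegraf --config /opt/telegraf/telegraf.conf --test prints one round of collected metrics without sending them, which is a quick way to check the new sections before restarting the service.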
4. Client deployment (10.0.0.111)
The client only needs Telegraf installed.
4.1 Installation script (save it as a .sh file in any location)
#!/bin/sh
version=1.20.4
tarball=telegraf-${version}_linux_amd64.tar.gz
wget https://dl.influxdata.com/telegraf/releases/$tarball
tar xzvf $tarball
mkdir -p /opt/telegraf
cp -far telegraf-${version}/usr/bin/telegraf /opt/telegraf
# Write a trimmed-down configuration file
cat <<EOF > /opt/telegraf/telegraf.conf
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
[[outputs.opentsdb]]
host = "http://10.0.0.100" # Change this to the IP address of your server (10.0.0.100 in this setup)
port = 19000
http_batch_size = 50
http_path = "/opentsdb/put"
debug = false
separator = "_"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = true
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs",
"squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.system]]
fielddrop = ["uptime_format"]
[[inputs.net]]
ignore_protocol_stats = true
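EOF
# Write the systemd unit file for Telegraf
cat <<EOF > /etc/systemd/system/telegraf.service
[Unit]
Description="telegraf"
After=network.target
[Service]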
Type=simple
ExecStart=/opt/telegraf/telegraf --config telegraf.conf
WorkingDirectory=/opt/telegraf
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=telegraf
KillMode=process
KillSignal=SIGQUIT
TimeoutStopSec=5
Restart=always
[Install]
WantedBy=multi-user.target
EOF
4.2 Check that it started
systemctl daemon-reload
systemctl enable telegraf
systemctl restart telegraf
systemctl status telegraf
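The only difference between the server and client configurations is the host in [[outputs.opentsdb]], so it is easy to get wrong when copying the script between machines. One way to avoid that is to generate the block from a parameter. This is a minimal sketch; the function name render_opentsdb_output is hypothetical, and the remaining values are copied from the configuration above.

```shell
#!/bin/sh
# render_opentsdb_output SERVER_IP
# Prints the [[outputs.opentsdb]] block pointing at the given Nightingale server.
# Hypothetical helper; port and path match the configuration used in this document.
render_opentsdb_output() {
    server_ip=$1
    cat <<EOF
[[outputs.opentsdb]]
  host = "http://${server_ip}"
  port = 19000
  http_batch_size = 50
  http_path = "/opentsdb/put"
  debug = false
  separator = "_"
EOF
}
```

For example, render_opentsdb_output 10.0.0.100 prints the block for the master in this setup, and render_opentsdb_output 127.0.0.1 prints the one used on the server itself.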