Pgpool-II + PostgreSQL Streaming Replication HA Configuration

Pgpool-II sits in front of a PostgreSQL streaming-replication pair to provide connection pooling, load balancing, and high-availability failover. This walkthrough builds on an already-configured streaming-replication setup and describes the configuration process, primary/standby switchover, performance testing, and primary/standby maintenance.
Topology:
1. Host planning

Role       Hostname  IP address      Port  Info
Master     pg1       192.168.18.211  1922  database
backend:0  pg1       192.168.18.211  9999  pgpool
Slave      pg2       192.168.18.212  1922  database
backend:1  pg2       192.168.18.212  9999  pgpool
VIP        -         192.168.18.215  -     client-facing service address
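The rest of the walkthrough addresses the nodes by hostname, which assumes name resolution such as the following /etc/hosts entries on both machines (a sketch derived from the host plan above; adjust if DNS is used instead):

```
192.168.18.211  pg1
192.168.18.212  pg2
```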
2. Configure host trust relationships
Verify that the trust relationship works. Note that both the remote host and the local host must be tested over SSH; if no password is requested, the configuration succeeded.
# on host pg1
ssh postgres@pg2 uptime
ssh postgres@pg1 uptime # first run asks for a password
# on host pg2
ssh postgres@pg2 uptime # first run asks for a password
ssh postgres@pg1 uptime
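The key exchange itself is not shown above; a minimal sketch as the postgres user (the key path is illustrative):

```shell
# generate a key pair without a passphrase (skipped if one already exists),
# then copy the public key to both hosts so either node can reach the other
mkdir -p "$HOME/.ssh"
key="$HOME/.ssh/id_rsa"
[ -f "$key" ] || ssh-keygen -q -t rsa -N '' -f "$key"
# run once per destination; each prompts for the password a single time:
# ssh-copy-id -i "$key.pub" postgres@pg1
# ssh-copy-id -i "$key.pub" postgres@pg2
```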
3. Install pgpool
3.1 Install pgpool
mkdir /usr/local/pgpool (as root)
chown postgres:postgres /usr/local/pgpool (as root)
cd /soft/pgpool-II-3.7.13
./configure --prefix=/usr/local/pgpool --with-pgsql=/usr/local/pg12.2/
make
make install
3.2 Install the pgpool helper functions (optional but recommended)
cd /soft/pgpool-II-3.7.13/src/sql
make
make install
cd sql
psql -f insert_lock.sql
3.3 Configure the postgres user's environment variables (pg1, pg2)
vi .bash_profile
export PGPOOL_HOME=/usr/local/pgpool
export PATH=$PATH:$PGPOOL_HOME/bin
4. Configure pgpool
4.1 Configure pool_hba.conf on pg1
pool_hba.conf authenticates connecting users and must be kept consistent with pg1's pg_hba.conf.
cd /usr/local/pgpool/etc/
cp pool_hba.conf.sample pool_hba.conf
vi pool_hba.conf  # add the following
host replication repl pg2 trust
host replication repl 192.168.18.0/24 trust
host all all 192.168.18.0/24 trust
4.2 Configure pool_hba.conf on pg2; add the following:
host replication repl pg1 trust
host replication repl 192.168.18.0/24 trust
host all all 192.168.18.0/24 trust
4.3 Configure pcp.conf (pg1, pg2)
pcp.conf holds the credentials for pgpool's own PCP management interface; the tools that operate on pgpool (attaching and detaching nodes, for example) ask for this password. Configure it as follows:
cd /usr/local/pgpool/etc
cp pcp.conf.sample pcp.conf
# generate the MD5 hash of the management user's password with pg_md5
pg_md5 postgres
e8a48653851e28c69d0506508fb27fc5
# then append a username:hash line to pcp.conf:
postgres:e8a48653851e28c69d0506508fb27fc5
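pg_md5 here produces a plain MD5 of the password string, so the hash above can be reproduced with standard tools (a sketch using md5sum, in case pg_md5 is not yet on the PATH):

```shell
# MD5 of the literal password "postgres"; matches the pg_md5 output above
hash=$(printf '%s' postgres | md5sum | awk '{print $1}')
echo "postgres:$hash"   # the line to place in pcp.conf
```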
4.4 Configure pgpool.conf on pg1 (key parameters):
# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd' # password file for backend auth, created with pg_md5 -m -u <user> <password>
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
pid_file_name = '/usr/local/pgpool/pgpool.pid'
#------------------------------------------------------------------------------
# LOAD BALANCING MODE (enable load balancing)
#------------------------------------------------------------------------------
load_balance_mode = on
#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------
master_slave_mode = on
master_slave_sub_mode = 'stream'
# - Streaming - (streaming-replication check)
sr_check_period = 10
sr_check_user = 'repl' # name of the PostgreSQL replication user
sr_check_password = 'oracle' # password of the PostgreSQL replication user
sr_check_database = 'postgres'
#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS (health checks between the nodes)
#------------------------------------------------------------------------------
health_check_period = 10 # defaults to 0, which disables the check
health_check_timeout = 20
health_check_user = 'postgres' # database user; must have superuser privileges
health_check_password = 'postgres'
health_check_database = 'postgres'
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = on
wd_hostname = 'pg1'
wd_port = 9000 # port the watchdog uses for communication
# - Lifecheck Setting -
wd_monitoring_interfaces_list = 'eth4'
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'pg2' # name of the standby host
heartbeat_destination_port0 = 9694 # port probed for liveness
heartbeat_device0 = 'eth1' # NIC used for the heartbeat to pg2; a dedicated NIC is preferred
other_pgpool_hostname0 = 'pg2'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
4.5 Configure pgpool.conf on pg2 (key parameters):
# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd' # password file for backend auth, created with pg_md5 -m -u <user> <password>
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
pid_file_name = '/usr/local/pgpool/pgpool.pid'
#------------------------------------------------------------------------------
# LOAD BALANCING MODE (enable load balancing)
#------------------------------------------------------------------------------
load_balance_mode = on
#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------
master_slave_mode = on
master_slave_sub_mode = 'stream'
# - Streaming - (streaming-replication check)
sr_check_period = 10
sr_check_user = 'repl' # name of the PostgreSQL replication user
sr_check_password = 'oracle' # password of the PostgreSQL replication user
sr_check_database = 'postgres'
#------------------------------------------------------------------------------
# HEALTH CHECK GLOBAL PARAMETERS (health checks between the nodes)
#------------------------------------------------------------------------------
health_check_period = 10 # defaults to 0, which disables the check
health_check_timeout = 20
health_check_user = 'postgres' # database user; must have superuser privileges
health_check_password = 'postgres'
health_check_database = 'postgres'
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------
# - Enabling -
use_watchdog = on
wd_hostname = 'pg2'
wd_port = 9000 # port the watchdog uses for communication
# - Lifecheck Setting -
wd_monitoring_interfaces_list = 'eth4'
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'pg1' # name of the primary host
heartbeat_destination_port0 = 9694 # port probed for liveness
heartbeat_device0 = 'eth4' # NIC used for the heartbeat to pg1; a dedicated NIC is preferred
other_pgpool_hostname0 = 'pg1'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
5. Other auxiliary settings
5.1 Edit the failover script (pg1, pg2)
vi /usr/local/pgpool/failover_stream.sh
#! /bin/sh
# Failover command for streaming replication.
# Arguments: $1: new master hostname.
# PG_HOME and PGDATA must be set in the postgres user's environment.
new_master=$1
trigger_command="$PG_HOME/bin/pg_ctl promote -D $PGDATA"
# promote the new master over SSH
ssh -T postgres@$new_master "$trigger_command"
exit 0;
Grant execute permission:
chmod +x /usr/local/pgpool/failover_stream.sh
6. Start the cluster daemons
6.4 View the background log
tail -f /usr/local/pgpool/pgpool.log
2020-03-10 21:45:21: pid 3297: LOG: Backend status file /tmp/pgpool_status discarded
2020-03-10 21:45:21: pid 3297: LOG: waiting for watchdog to initialize
2020-03-10 21:45:21: pid 3298: LOG: setting the local watchdog node name to "pg2:9999 Linux
pg2"
2020-03-10 21:45:21: pid 3298: LOG: watchdog cluster is configured with 1 remote nodes
2020-03-10 21:45:21: pid 3298: LOG: watchdog remote node:0 on pg1:9000
2020-03-10 21:45:21: pid 3298: LOG: watchdog node state changed from [DEAD] to [LOADING]
2020-03-10 21:45:21: pid 3298: LOG: new outbound connection to pg1:9000
2020-03-10 21:45:21: pid 3298: LOG: new watchdog node connection is received from
"192.168.18.211:1250"
2020-03-10 21:45:21: pid 3298: LOG: new node joined the cluster hostname:"pg1" port:9000
pgpool_port:9999
2020-03-10 21:45:21: pid 3298: DETAIL: Pgpool-II version:"3.7.13" watchdog messaging version:
1.1
2020-03-10 21:45:26: pid 3298: LOG: watchdog node state changed from [LOADING] to
[JOINING]
2020-03-10 21:45:26: pid 3298: LOG: setting the remote node "pg1:9999 Linux pg1" as
watchdog cluster master
7. Routine maintenance
7.1 Startup order
Start the primary's PostgreSQL instance first and then the primary's pgpool daemon, so that the VIP is created on the primary; otherwise it is created on the standby. Either way client access is unaffected, which shows the VIP can float between the nodes, unlike traditional two-node hot standby where the VIP always lived on the primary.
Then start the standby's PostgreSQL instance, followed by its pgpool daemon.
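The order above can be sketched as a command sequence (a sketch only; it assumes passwordless SSH from an admin host and PGDATA set in the postgres user's environment on each node):

```
ssh postgres@pg1 'pg_ctl start -D $PGDATA'   # 1. primary database
ssh postgres@pg1 'pgpool'                    # 2. primary pgpool; the VIP comes up on pg1
ssh postgres@pg2 'pg_ctl start -D $PGDATA'   # 3. standby database
ssh postgres@pg2 'pgpool'                    # 4. standby pgpool
```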
7.2 Failover
7.2.1 Simulate a pgpool outage on the primary:
pgpool -m smart stop
The workload is unaffected and connections keep working:
psql -h vip -p 9999 -U john -d testdb
7.2.2 Simulate shutting down the primary database:
pg_ctl -m fast stop
The standby's pgpool now runs the failover script and promotes the standby to act as the primary.
Check postgresql.auto.conf: the old primary_conninfo entry is still there and must be commented out, otherwise the node will not take on the primary role and ship WAL to its standby.
Restart the database.
7.2.3 Convert the old primary into a standby
Create the standby.signal file and set the following parameters (in PostgreSQL 12 they live in postgresql.conf; recovery.conf is gone, and standby.signal itself replaces the old standby_mode = on):
primary_conninfo = 'host=pg1 application_name=standby_pg2 port=1922 user=repl password=oracle options=''-c wal_sender_timeout=5000'''
restore_command = 'cp /home/postgres/arch/%f %p'
archive_cleanup_command = 'pg_archivecleanup /home/postgres/arch %r'
Modify postgresql.auto.conf:
primary_conninfo = 'user=repl passfile=''/home/postgres/.pgpass'' host=pg1 application_name=standby_pg2 port=1922 sslmode=disable sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
At this point the log reports an error: the primary and standby timelines have diverged:
new timeline 8 forked off current database system timeline 7 before current recovery point 1/420000A0
Recover with pg_rewind:
pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.18.211 port=1922
user=postgres dbname=testdb'
pg_rewind: servers diverged at WAL location 1/410000A0 on timeline 7
pg_rewind: rewinding from last common checkpoint at 1/41000028 on timeline 7
pg_rewind: Done!
7.2.4 Start the standby
Note that pg_rewind removes the standby.signal file and rewrites postgresql.auto.conf, so both need to be fixed by hand afterwards.
Errors came up during the experiment; starting the primary and standby a few more times resolved them, so some iteration may be needed.
A later switchover ran into none of these problems and went smoothly.
Summary:
A pgpool failure only causes the VIP to drift; it affects neither the primary/standby state nor the workload.
If the primary database is shut down or crashes, pgpool promotes the standby to primary and service continues uninterrupted; but restoring the primary/standby relationship afterwards requires manual intervention, so the process is not fully automatic.
Therefore, to maintain the database without triggering a switchover, stop the pgpool processes first and then shut down the primary; otherwise a failover occurs and the standby has to be repaired afterwards, adding work.
8. Stress testing
8.1 Network connection test from a third machine:
more yalitest.sh
#!/bin/bash
BEGIN_TIME=`date`
for i in {1..10000}
do
#echo $i
psql -p 9999 -U postgres -h vip -d postgres -c "SELECT * FROM pgbench_accounts WHERE aid = $i"
done
END_TIME=`date`
echo "start: $BEGIN_TIME  end: $END_TIME"
8.2 pgbench tool test
-- initialize the data:
pgbench -i testdb
pgbench -c 30 -T 60 -h vip -p 9999 -U postgres -r testdb
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 30
number of threads: 1
duration: 60 s
number of transactions actually processed: 22493
latency average = 80.087 ms
tps = 374.591718 (including connections establishing)
tps = 374.608719 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
0.001 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
2.166 BEGIN;
8.840 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
2.260 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
3.446 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
9.520 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
2.383 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
(:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
51.283 END;
This test case only ever touches the primary: the statements are DML, so they can only be routed to the primary.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+------------------
Note that replication_delay now has a value; observation shows it is the standby's replication lag while the test is running, and it drops back to zero once the standby catches up.
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 30
number of threads: 1
duration: 60 s
number of transactions actually processed: 188317
latency average = 9.565 ms
tps = 3136.558834 (including connections establishing)
tps = 3136.724522 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
9.524 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 60 s
number of transactions actually processed: 169220
latency average = 10.653 ms
tps = 2816.036003 (including connections establishing)
tps = 2816.167417 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
10.616 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 60 s
number of transactions actually processed: 192090
latency average = 9.374 ms
tps = 3200.446677 (including connections establishing)
tps = 3200.575641 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
9.337 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Comparing these runs, load balancing did not appear to make much difference.
8.5 Increase the number of concurrent users to raise the load
pgbench -c 60 -T 80 -S -h vip -p 9999 -U postgres -r postgres
duration: 80 s
number of transactions actually processed: 266983
latency average = 18.006 ms
tps = 3332.161044 (including connections establishing)
tps = 3332.301769 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
17.905 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 80 s
number of transactions actually processed: 271234
latency average = 17.714 ms
tps = 3387.202774 (including connections establishing)
tps = 3387.367938 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
17.604 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Re-attach the node:
pcp_attach_node -h vip -p 9898 -U postgres -n 1
8.6 Connect directly to a single database node for testing:
pgbench -c 30 -T 60 -S -h pg2 -p 1922 -U postgres -r postgres
Run 1:
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 30
number of threads: 1
duration: 60 s
number of transactions actually processed: 78748
latency average = 22.880 ms
tps = 1311.215262 (including connections establishing)
tps = 1311.247768 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
22.848 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Run 2:
duration: 60 s
number of transactions actually processed: 80374
latency average = 22.413 ms
tps = 1338.480191 (including connections establishing)
tps = 1338.512735 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
22.385 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Run 3:
duration: 60 s
number of transactions actually processed: 78871
latency average = 22.840 ms
tps = 1313.458523 (including connections establishing)
tps = 1313.492331 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
22.811 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
The select_cnt counters keep changing for both nodes, showing that queries are handed to both backends and load balancing is in effect.
8.7 Connect to the cluster through the VIP for testing:
pgbench -c 30 -T 60 -S -h vip -p 9999 -U postgres -r postgres
Run 1:
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 30
number of threads: 1
duration: 60 s
number of transactions actually processed: 136558
latency average = 13.195 ms
tps = 2273.668774 (including connections establishing)
tps = 2273.778799 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
13.155 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Run 2:
duration: 60 s
number of transactions actually processed: 130250
latency average = 13.829 ms
tps = 2169.355643 (including connections establishing)
tps = 2169.458627 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
13.791 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Run 3:
duration: 60 s
number of transactions actually processed: 156074
latency average = 11.540 ms
tps = 2599.621368 (including connections establishing)
tps = 2599.749783 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
11.508 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Run 4:
duration: 60 s
number of transactions actually processed: 155708
latency average = 11.571 ms
tps = 2592.586873 (including connections establishing)
tps = 2592.706989 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
11.539 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Analysis: querying a single node over the network averaged about 22 ms per statement, while querying the cluster through the VIP averaged about 12 ms, so load balancing does help. The single node averaged about 1313 tps; the cluster reached about 2592 tps.
Running the same test locally on a node is much faster than going through the cluster, so a large share of the cost is presumably in the network.
pgbench -c 30 -T 60 -S -h pg2 -p 1922 -U postgres
latency average = 2.614 ms
tps = 11474.575733 (including connections establishing)
tps = 11474.786866 (excluding connections establishing)
8.8 pgpool.conf parameters related to connection counts:
#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------
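The actual values used for the runs below are not shown; the main knobs in this section are num_init_children and max_pool (the numbers here are illustrative defaults, not necessarily the ones used in the tests):

```
num_init_children = 32   # max concurrent client sessions pgpool accepts;
                         # must be at least the pgbench -c value or clients queue
max_pool = 4             # cached backend connection pairs per pgpool child
```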
One node:
duration: 60 s
number of transactions actually processed: 101261
latency average = 59.440 ms
tps = 1682.356283 (including connections establishing)
tps = 1682.430350 (excluding connections establishing)
statement latencies in milliseconds:
0.003 \set aid random(1, 100000 * :scale)
58.966 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 60 s
number of transactions actually processed: 100418
latency average = 59.951 ms
tps = 1668.027917 (including connections establishing)
tps = 1668.093257 (excluding connections establishing)
statement latencies in milliseconds:
0.003 \set aid random(1, 100000 * :scale)
59.473 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 60 s
number of transactions actually processed: 132744
latency average = 45.252 ms
tps = 2209.859230 (including connections establishing)
tps = 2209.937983 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
45.009 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 60 s
number of transactions actually processed: 76405
latency average = 78.842 ms
tps = 1268.357245 (including connections establishing)
tps = 1268.400040 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
78.098 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
duration: 60 s
number of transactions actually processed: 112301
latency average = 53.503 ms
tps = 1869.039670 (including connections establishing)
tps = 1869.105436 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
53.226 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;