Installing and Verifying Pacemaker on CentOS
# yum install pcs pacemaker fence-agents-all
Post-installation check
[root@server3 ~]# rpm -q pacemaker
pacemaker-1.1.20-5.el7_7.1.x86_64
[root@server3 ~]# grep hacluster /etc/passwd
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin
Setting up a trust relationship between the two nodes is not required.
Authenticate the nodes to each other (run on one node):
[root@server3 ~]# pcs cluster auth server3.example.com server4.example.com
Username: root
Password:
Error: s3: Username and/or password is incorrect
Error: Unable to communicate with s4
[root@server3 ~]#
[root@server3 ~]# pcs cluster auth server3.example.com server4.example.com
Username: hacluster
Password:
Error: Unable to communicate with server4.example.com
server3.example.com: Authorized
[root@server3 ~]#
Running the same command on server4 authorizes both nodes:
[root@server4 ~]# pcs cluster auth server3.example.com server4.example.com
Username: hacluster
Password:
server4.example.com: Authorized
server3.example.com: Authorized
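Once authentication succeeded, I followed the official documentation and removed the login shell I had manually given the hacluster user earlier: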
# usermod -s /sbin/nologin hacluster
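The failed attempts above are consistent with the usual pcs prerequisites on CentOS 7: pcs authenticates against pcsd on every node as the hacluster user, so each node needs pcsd running and a password set for hacluster, and "Unable to communicate" typically means pcsd on that node could not be reached (not started, or blocked by firewalld). The following is a minimal sketch of those per-node steps, not a record of what the author ran; the `run` helper and the `DRY_RUN` switch are hypothetical additions so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch: per-node prerequisites for "pcs cluster auth" (CentOS 7, pcs 0.9).
# Run on every node. DRY_RUN=1 only prints the commands (hypothetical switch).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

prepare_pcsd_node() {
  # pcs logs in to pcsd as hacluster, so the account needs a password (interactive prompt).
  run passwd hacluster
  # pcsd has to be running so the peer node can reach it.
  run systemctl start pcsd.service
  run systemctl enable pcsd.service
  # Open the cluster ports, pcsd's included, if firewalld is active.
  run firewall-cmd --permanent --add-service=high-availability
  run firewall-cmd --add-service=high-availability
}
```

`DRY_RUN=1 prepare_pcsd_node` prints the five commands; calling `prepare_pcsd_node` as root on each node runs them.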
Configuring the cluster
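The notes do not record the command that created the cluster. On CentOS 7 (pcs 0.9) this is usually done with `pcs cluster setup`; the sketch below assumes the cluster name `mytest_cluster` that appears in the later `pcs status` output and enables the services at boot, which matches the "active/enabled" daemon status shown later. Treat it as an assumption, not the command actually used; `run` and `DRY_RUN` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: create and start the two-node cluster (pcs 0.9 syntax, CentOS 7).
# Cluster name taken from the later "pcs status" output; DRY_RUN=1 only prints.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_cluster() {
  # Write corosync.conf on both nodes and start the cluster services.
  run pcs cluster setup --start --name mytest_cluster \
    server3.example.com server4.example.com
  # Start corosync and pacemaker automatically at boot on all nodes.
  run pcs cluster enable --all
  run pcs cluster status
}
```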
[root@server3 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: server3.example.com (version 1.1.20-5.el7_7.1-3c4c782f70) - partition with quorum
 Last updated: Wed Sep 25 23:30:26 2019
 Last change: Wed Sep 25 23:28:19 2019 by hacluster via crmd on server3.example.com
 2 nodes configured
 0 resources configured
PCSD Status:
  server3.example.com: Online
  server4.example.com: Online
[root@server3 ~]#
[root@server3 ~]# pcs stonith list|grep -i virt
fence_virt - Fence agent for virtual machines
fence_xvm - Fence agent for virtual machines
[root@server4 ~]# pvcreate /dev/vdb
WARNING: ext3 signature detected on /dev/vdb at offset 1080. Wipe it? [y/n]: y
  Wiping ext3 signature on /dev/vdb.
  Physical volume "/dev/vdb" successfully created.
[root@server4 ~]#
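Rebooting server3 still did not make vdb visible, so I ticked the "shareable" option for the disk on both VMs, shut both down and started them again.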
[root@server3 ~]# vgcreate my_vg /dev/vdb
  Volume group "my_vg" successfully created
[root@server3 ~]# lvcreate -L 200 my_vg -n my_lv
  Volume group "my_vg" has insufficient free space (49 extents): 50 required.
[root@server3 ~]# lvs
(Less than 200 MiB is free, so the LV was not created and lvs prints nothing.)
[root@server3 ~]# lvcreate -L 190 my_vg -n my_lv
  Rounding up size to full physical extent 192.00 MiB
  Logical volume "my_lv" created.
[root@server3 ~]# mkfs.ext4 /dev/my_vg/my_lv
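The lvcreate failure is plain extent arithmetic. `-L` without a unit is taken as MiB and the default extent size is 4 MiB, so 200 MiB needs 50 extents while the VG has only 49 free (196 MiB); 190 MiB is rounded up to 48 extents, i.e. 192 MiB, exactly as the output says. A small sketch of that calculation (the function names are hypothetical):

```shell
#!/bin/sh
# Sketch: extent arithmetic behind the lvcreate messages above.
# Assumes the default 4 MiB physical extent size unless an extent size is passed.

# extents_needed SIZE_MIB [PE_MIB]: lvcreate rounds up to a whole extent.
extents_needed() {
  size_mib=$1
  pe_mib=${2:-4}
  echo $(( (size_mib + pe_mib - 1) / pe_mib ))
}

# rounded_size_mib SIZE_MIB [PE_MIB]: size actually allocated after rounding up.
rounded_size_mib() {
  pe=${2:-4}
  echo $(( $(extents_needed "$1" "$pe") * pe ))
}

# fits_in_vg SIZE_MIB FREE_EXTENTS [PE_MIB]: succeeds if the LV fits.
fits_in_vg() {
  need=$(extents_needed "$1" "${3:-4}")
  [ "$need" -le "$2" ]
}
```

With the numbers from the transcript, `extents_needed 200` prints 50, `fits_in_vg 200 49` fails, and `rounded_size_mib 190` prints 192. If the goal is simply to use the whole VG, `lvcreate -l 100%FREE -n my_lv my_vg` avoids the manual arithmetic.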
At this point server4 only sees vdb and its PV; even after running vgscan, the new VG still does not show up.
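A likely reason (an assumption; the notes do not investigate it) is lvmetad, which CentOS 7 enables by default: it caches LVM metadata per host and does not notice a VG that another host created on a shared disk until the cache is rescanned. The HA-LVM step below (`lvmconf --enable-halvm`) turns lvmetad off, which should make this go away. Until then, a manual rescan on server4 can be sketched as follows; `run` and `DRY_RUN` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: make server4 re-read LVM metadata from the shared disk.
# DRY_RUN=1 only prints the commands (hypothetical switch).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

rescan_shared_lvm() {
  # Show whether lvmetad caching is enabled (1) or not (0).
  run lvmconfig global/use_lvmetad
  # Refresh the lvmetad cache from the devices, then list what is visible.
  run pvscan --cache
  run vgscan
  run vgs
  run lvs
}
```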
Run the following on one node:
# mount /dev/my_vg/my_lv /var/www/
# mkdir /var/www/html
# mkdir /var/www/cgi-bin
# mkdir /var/www/error
# restorecon -R /var/www
# cat <<-END >/var/www/html/index.html
Hello
END
# umount /var/www
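In my test, removing the hyphen before END (`<<` instead of `<<-`) made no difference, since `<<-` only strips leading tab characters and these lines have none.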
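To see where `<<-` does matter: it strips leading tabs from the here-document body and from the END line, so it behaves differently from plain `<<` only when those lines are tab-indented (for example inside an indented script). A quick sketch; `run_heredoc` is a hypothetical helper that feeds a tiny generated script to sh:

```shell
#!/bin/sh
# Sketch: compare << and <<- by running a generated here-document through sh.
# $1 is the operator (<< or <<-), $2 is the indentation put in front of the
# body line and of the END delimiter line.
run_heredoc() {
  printf 'cat %sEND\n%sHello\n%sEND\n' "$1" "$2" "$2" | sh
}

tab=$(printf '\t')

# No indentation: both operators print "Hello".
run_heredoc '<<' ''
run_heredoc '<<-' ''
# Tab indentation: <<- strips the tabs and still prints "Hello" ...
run_heredoc '<<-' "$tab"
# ... while << keeps them, so the tab-indented END never matches the delimiter
# and the here-document runs to end of input (sh may print a warning).
run_heredoc '<<' "$tab"
```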
Exclusive volume group activation
# lvmconf --enable-halvm --services --startstopservices
Rebuild the initramfs and reboot (I skipped this at first), so that the boot image does not try to activate the volume group:
# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
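Rebuilding matters because the boot image carries its own copy of lvm.conf; if that copy still allows my_vg, the VG may be activated at boot, outside Pacemaker's control. Later in these notes the image is backed up with `cp -p` before running dracut. A sketch combining both steps; the `.bak` suffix, the function name and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch: back up the current initramfs, then rebuild it with dracut.
# Optional $1 is the kernel version (defaults to the running kernel).
# DRY_RUN=1 only prints the commands (hypothetical switch).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

rebuild_initramfs() {
  kver=${1:-$(uname -r)}
  img=/boot/initramfs-$kver.img
  # Keep a copy of the known-good image, preserving mode and timestamps.
  run cp -p "$img" "$img.bak"
  # -H: host-only image for this machine; -f: overwrite the existing file.
  run dracut -H -f "$img" "$kver"
}
```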
Create the resource group
[root@server3 ~]# pcs resource create my_lvm LVM volgrpname=my_vg \
> exclusive=true --group apachegroup
Assumed agent name 'ocf:heartbeat:LVM' (deduced from 'LVM')
pcs status shows that the resource failed to start:
Resource Group: apachegroup
    my_lvm (ocf::heartbeat:LVM): FAILED (Monitoring)[ server3.example.com server4.example.com ]
Failed Resource Actions:
* my_lvm_monitor_0 on server3.example.com 'unknown error' (1): call=5, status=complete, exitreason='The volume_list filter must be initialized in lvm.conf for exclusive activation without clvmd'
Resource Group: apachegroup
    my_lvm (ocf::heartbeat:LVM): Stopped
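The exitreason in the failed action names the missing piece: with exclusive=true and no clvmd, the LVM agent requires `volume_list` to be set in the activation section of /etc/lvm/lvm.conf. In the HA-LVM setup documented for RHEL 7, that list names only the VGs a node should activate on its own (typically the root VG) and leaves out the cluster-managed my_vg; the initramfs is then rebuilt as described above. The helper below is a hypothetical sketch that builds such a line from a list of VG names; the root VG name `centos` used in the example is an assumption, so check yours with `vgs`.

```shell
#!/bin/sh
# Sketch: build the lvm.conf "volume_list" line from VG names on stdin,
# leaving out the cluster-managed VG ($1, default my_vg).
volume_list_line() {
  shared_vg=${1:-my_vg}
  printf 'volume_list = ['
  sep=' '
  # read trims the padding that "vgs --noheadings" puts around names.
  while read -r vg; do
    [ -z "$vg" ] && continue
    [ "$vg" = "$shared_vg" ] && continue
    printf '%s"%s"' "$sep" "$vg"
    sep=', '
  done
  printf ' ]\n'
}
```

On each node, `vgs --noheadings -o vg_name | volume_list_line my_vg` prints a line such as `volume_list = [ "centos" ]`, which goes inside the `activation { ... }` section of lvm.conf.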
[root@server3 ~]# cp -p /boot/initramfs-3.10.0-1062.1.1.el7.x86_64.img /boot/initramfs-3.10.0-1062.1.1.el7.x86_64.img.bakok1
[root@server4 ~]# cp -p /boot/initramfs-3.10.0-957.el7.x86_64.img /boot/initramfs-3.10.0-957.el7.x86_64.img.bakok1
[root@server3 ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
[root@server3 ~]# reboot
The resource still would not start. Deleting and re-creating it, even with exclusive activation turned off, did not help either:
# pcs resource delete my_lvm
# pcs resource create my_lvm LVM volgrpname=my_vg \
> exclusive=false --group apachegroup
Check the logs:
[root@server3 ~]# journalctl -xe
Sep 26 17:13:54 server3.example.com pengine[3312]: error: Resource start-up disabled since no STONITH resources have been defined
Sep 26 17:13:54 server3.example.com pengine[3312]: error: Either configure some or disable STONITH with the stonith-enabled option
Sep 26 17:13:54 server3.example.com pengine[3312]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Sep 26 17:13:54 server3.example.com pengine[3312]: notice: Removing my_lvm from server3.example.com
Sep 26 17:13:54 server3.example.com pengine[3312]: notice: Removing my_lvm from server4.example.com
[root@server3 ~]# pcs property show --all
 stonith-enabled: true
[root@server3 ~]# pcs property set stonith-enabled=false
(Set on one node only; cluster properties are stored in the CIB, which is replicated to all nodes.)
[root@server3 ~]# pcs property show --all |grep stonith-enabled
 stonith-enabled: false
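Disabling STONITH is enough for a lab, but as the log itself points out, clusters with shared data need fencing. Since fence_xvm showed up in `pcs stonith list` earlier, one hedged alternative is a fence_xvm device. This assumes fence_virtd is already configured on the KVM host with the key at its default path; the device name `xvmfence` is hypothetical and the VM domain names are placeholders passed as arguments. The first helper only reads the current `stonith-enabled` value from `pcs property show --all` output.

```shell
#!/bin/sh
# Sketch: check stonith-enabled and (optionally) define a fence_xvm device.
# DRY_RUN=1 only prints the pcs command (hypothetical switch).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Read "pcs property show --all" output on stdin, print the stonith-enabled value.
stonith_enabled() {
  awk -F': *' '$1 ~ /^ *stonith-enabled$/ { print $2; exit }'
}

# $1/$2: libvirt domain names of server3/server4 (placeholders, see "virsh list").
create_xvm_fence() {
  run pcs stonith create xvmfence fence_xvm \
    key_file=/etc/cluster/fence_xvm.key \
    pcmk_host_map="server3.example.com:$1;server4.example.com:$2"
}
```

Typical use would be `pcs property show --all | stonith_enabled`, then `create_xvm_fence <domain3> <domain4>` followed by `pcs property set stonith-enabled=true` once fencing has been tested.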
Re-add the resource; this time it starts successfully:
[root@server3 ~]# pcs resource create my_lvm LVM volgrpname=my_vg \
> exclusive=true --group apachegroup
Assumed agent name 'ocf:heartbeat:LVM' (deduced from 'LVM')
[root@server3 ~]# pcs status
[root@server3 ~]# pcs resource
Resource Group: apachegroup
    my_lvm (ocf::heartbeat:LVM): Started server3.example.com
lvdisplay now shows "LV Status available".
Then add the remaining resources.
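The notes do not show these commands. Judging by the resource names in the `pcs status` output below and the address used with curl at the end, they were probably close to the following sketch, modelled on the RHEL 7 active/passive Apache example; `cidr_netmask=24` and the status URL are assumptions. The apache agent polls `statusurl`, so httpd must expose /server-status to localhost (for example a `<Location /server-status>` block with `SetHandler server-status` in a file under /etc/httpd/conf.d/).

```shell
#!/bin/sh
# Sketch: add filesystem, floating IP and Apache to apachegroup (pcs 0.9).
# $1 overrides the floating IP; DRY_RUN=1 only prints the commands.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

add_web_resources() {
  vip=${1:-192.168.122.30}
  # Mount the shared LV on /var/www once my_lvm has activated the VG.
  run pcs resource create my_fs Filesystem device=/dev/my_vg/my_lv \
    directory=/var/www fstype=ext4 --group apachegroup
  # Floating address that clients use (netmask assumed to be /24).
  run pcs resource create VirtualIP IPaddr2 ip="$vip" cidr_netmask=24 \
    --group apachegroup
  # httpd is started and monitored by the apache resource agent, not by systemd.
  run pcs resource create Website apache \
    configfile=/etc/httpd/conf/httpd.conf \
    statusurl=http://127.0.0.1/server-status --group apachegroup
}
```

Members of a group start in the order they were added and stop in reverse, which is why the filesystem comes right after my_lvm and the web server last.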
Verification
Status check
[root@server3 ~]# pcs status
Cluster name: mytest_cluster
Stack: corosync
Current DC: server3.example.com (version 1.1.20-5.el7_7.1-3c4c782f70) - partition with quorum
Last updated: Thu Sep 26 17:27:24 2019
Last change: Thu Sep 26 17:26:27 2019 by root via cibadmin on server3.example.com
2 nodes configured
4 resources configured
Online: [ server3.example.com server4.example.com ]
Full list of resources:
 Resource Group: apachegroup
     my_lvm (ocf::heartbeat:LVM): Started server3.example.com
     my_fs (ocf::heartbeat:Filesystem): Started server3.example.com
     VirtualIP (ocf::heartbeat:IPaddr2): Started server3.example.com
     Website (ocf::heartbeat:apache): Started server3.example.com
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@server3 ~]#
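The page served through the cluster returns "Hello". At this point `systemctl status httpd` reports "Active: inactive (dead)" on both nodes, because the apache resource agent starts httpd itself rather than through the httpd systemd unit.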
[root@server4 ~]# ps -ef|grep httpd
root 7389 1 0 17:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 7391 7389 0 17:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 7392 7389 0 17:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 7393 7389 0 17:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 7394 7389 0 17:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 7395 7389 0 17:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
root 7642 3468 0 17:36 pts/0 00:00:00 grep --color=auto httpd
[root@server4 ~]# kill -9 7389
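httpd is restarted automatically. I killed it four times and the group never failed over; pcs status only shows a failed action: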
Failed Resource Actions:
* Website_monitor_10000 on server4.example.com 'not running' (7): call=42, status=complete, exitreason='',
    last-rc-change='Thu Sep 26 17:37:11 2019', queued=0ms, exec=0ms
# mv /sbin/httpd /sbin/httpdbak
After that, killing the process made the group fail over immediately, in roughly one second.
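This matches how Pacemaker handles failures. A failed monitor normally leads to a restart on the same node, and the resource is only moved away once its fail count there reaches `migration-threshold`, which by default is effectively unlimited; hence no failover after four kills. A failed start is treated more harshly (`start-failure-is-fatal` defaults to true), and 'not installed' is a hard error that bans the resource from that node, so the group moved at once. To fail over on the first monitor failure instead, a sketch (the threshold of 1 is only an example; `run`, `DRY_RUN` and the function names are hypothetical):

```shell
#!/bin/sh
# Sketch: make Website leave a node after one failure, then inspect and clear failures.
# DRY_RUN=1 only prints the commands (hypothetical switch).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

tune_failover() {
  # Ban Website from a node after 1 failure there.
  run pcs resource meta Website migration-threshold=1
}

finish_failover_test() {
  # Undo the rename used to simulate a missing binary (on the node where it was renamed).
  run mv /sbin/httpdbak /sbin/httpd
  # Show the recorded failures, then clear them so the node is eligible again.
  run pcs resource failcount show Website
  run pcs resource cleanup Website
}
```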
The log shows that Pacemaker handles this sensibly: it did not keep retrying the restart.
Sep 26 17:54:08 server3.example.com apache(Website)[9380]: ERROR: apache httpd program not found
Sep 26 17:54:08 server3.example.com apache(Website)[9396]: ERROR: environment is invalid, resource considered stopped
Sep 26 17:54:08 server3.example.com lrmd[3325]: notice: Website_monitor_10000:9320:stderr [ ocf-exit-reason:apache httpd program not found ]
Sep 26 17:54:08 server3.example.com lrmd[3325]: notice: Website_monitor_10000:9320:stderr [ ocf-exit-reason:environment is invalid, resource considered stopped ]
Sep 26 17:54:08 server3.example.com crmd[3332]: notice: server3.example.com-Website_monitor_10000:41 [ ocf-exit-reason:apache httpd program not found\nocf-exit-reason:environment is invalid, resource considered stopped\n ]
...
Sep 26 17:54:09 server3.example.com pengine[3331]: warning: Processing failed start of Website on server3.example.com: not installed
Sep 26 17:54:09 server3.example.com pengine[3331]: notice: Preventing Website from re-starting on server3.example.com: operation start failed 'not installed' (5)
Sep 26 17:54:09 server3.example.com pengine[3331]: warning: Forcing Website away from server3.example.com after failures (max=)
2 nodes configured
4 resources configured
Online: [ server3.example.com ]
OFFLINE: [ server4.example.com ]
Full list of resources:
 Resource Group: apachegroup
     my_lvm (ocf::heartbeat:LVM): Started server3.example.com
     my_fs (ocf::heartbeat:Filesystem): Started server3.example.com
     VirtualIP (ocf::heartbeat:IPaddr2): Started server3.example.com
     Website (ocf::heartbeat:apache): Stopped
Failed Resource Actions:
* Website_start_0 on server3.example.com 'not installed' (5): call=44, status=complete, exitreason='environment is invalid, resource considered stopped',
* Website_start_0 on server3.example.com 'unknown error' (1): call=63, status=Timed Out, exitreason='',
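The number in parentheses in each failed action is the OCF return code of the resource agent, and it decides how Pacemaker reacts: 7 (not running) from a monitor is an ordinary failure that triggers recovery, 5 (not installed) is a hard error that keeps the resource off that node, and 1 (unknown error) is the generic error code. A small hypothetical helper for decoding these lines, using the standard OCF code names:

```shell
#!/bin/sh
# Sketch: decode the OCF return codes shown under "Failed Resource Actions".

# Print the standard OCF name for a numeric return code.
ocf_rc_name() {
  case $1 in
    0) echo OCF_SUCCESS ;;
    1) echo OCF_ERR_GENERIC ;;
    2) echo OCF_ERR_ARGS ;;
    3) echo OCF_ERR_UNIMPLEMENTED ;;
    4) echo OCF_ERR_PERM ;;
    5) echo OCF_ERR_INSTALLED ;;
    6) echo OCF_ERR_CONFIGURED ;;
    7) echo OCF_NOT_RUNNING ;;
    8) echo OCF_RUNNING_MASTER ;;
    9) echo OCF_FAILED_MASTER ;;
    *) echo UNKNOWN ;;
  esac
}

# Read failed-action lines on stdin; print "<action> <rc> <OCF name>" for each.
decode_failed_actions() {
  sed -n "s/^ *[*-] *\([^ ]*\) on .* (\([0-9][0-9]*\)):.*/\1 \2/p" |
  while read -r action rc; do
    echo "$action $rc $(ocf_rc_name "$rc")"
  done
}
```

For example, piping the `pcs status` lines above into `decode_failed_actions` prints `Website_start_0 5 OCF_ERR_INSTALLED` for the 'not installed' entry.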
After a reboot, all resources were Stopped.
The journalctl output was not very helpful, and I did not dig through it to find the key messages:
Sep 26 18:13:53 server3.example.com LVM(my_lvm)[3460]: WARNING: LVM Volume my_vg is not available (stopped)
Sep 26 18:13:53 server3.example.com crmd[3321]: notice: Result of probe operation for my_lvm on server3.example.com: 7 (not running)
Sep 26 18:13:53 server3.example.com crmd[3321]: notice: Initiating monitor operation my_fs_monitor_0 locally on server3.example.com
Sep 26 18:13:53 server3.example.com Filesystem(my_fs)[3480]: WARNING: Couldn't find device [/dev/my_vg/my_lv]. Expected /dev/??? to exist
Online: [ server3.example.com ]
OFFLINE: [ server4.example.com ]
Full list of resources:
 Resource Group: apachegroup
     my_lvm (ocf::heartbeat:LVM): Started server3.example.com
     my_fs (ocf::heartbeat:Filesystem): Started server3.example.com
     VirtualIP (ocf::heartbeat:IPaddr2): Started server3.example.com
     Website (ocf::heartbeat:apache): Started server3.example.com
[xy@xycto ~]$ curl http://192.168.122.30
Hello
After bringing server4's network interface back up (ifconfig eth4 up), the floating address is removed from server4 automatically, server4 rejoins the cluster, and pcs status shows the same output on both nodes.
Publisher: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/222853.html Original link: https://javaforall.net
