Checking zookeeper node status with Python
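Requirements
- The project's core modules use zookeeper for distributed locking, but the state of the zookeeper nodes is not monitored, so node status monitoring is needed.
- Monitor the state of each zookeeper node; if a node is abnormal or missing, output an error keyword or raise an alarm.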
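Implementation
- Use a Python zookeeper client module (kazoo) to read the state of each zookeeper node.
- Have the Python script log error keywords, so that abnormal node states can be pushed to the network management system as alarms.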
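The two steps above can be sketched roughly as follows. This is a minimal sketch, not the script's actual code: `find_missing_nodes`, `report` and the `ZK_NODE_MISSING` keyword are hypothetical names, and the client is assumed to be any object exposing kazoo's `exists(path)` call, which returns `None` for an absent node. A stub client stands in for a live `KazooClient`, so the example runs without a server.

```python
import logging

logger = logging.getLogger("zk_node_check")

# Hypothetical keyword that an external alarm system could match on.
ALARM_KEYWORD = "ZK_NODE_MISSING"


def find_missing_nodes(client, node_paths):
    """Return the paths for which client.exists(path) reports no node."""
    return [path for path in node_paths if client.exists(path) is None]


def report(client, node_paths):
    """Log one keyword-tagged error line per missing node and return them."""
    missing = find_missing_nodes(client, node_paths)
    for path in missing:
        logger.error("%s node=%s", ALARM_KEYWORD, path)
    return missing


class StubClient:
    """Stand-in for KazooClient so the sketch runs without a server."""

    def __init__(self, existing):
        self._existing = set(existing)

    def exists(self, path):
        # kazoo returns a stat object for an existing node;
        # any non-None value is enough for this sketch.
        return object() if path in self._existing else None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    stub = StubClient(["/ctimanager/bj-cti1", "/ctimanager/bj-cti2"])
    print(report(stub, ["/ctimanager/bj-cti1", "/ctimanager/bj-cti3"]))
```

With a real connection, a started `KazooClient` could be passed in place of `StubClient`.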
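Known issue
- DataWatch only watches node data. If a configured node does not exist and the data of another node (cti1) is then changed, the change notification is logged against the non-existent node (cti5). In the log below, the 19:26:40 entry reports the changed data under /ctimanager/bj-cti5, although that node does not exist (in the messages, 不存在 means "does not exist" and 正常 means "OK").
# While modifying the node data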
[zk: localhost:2182(CONNECTED) 28] set /ctimanager/bj-cti1 {"agent":2,"test":1,"a1":2ok23121111111}
2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:73 INFO get config file zookeeper node list: ['/ctimanager/bj-cti1', '/ctimanager/bj-cti2', '/ctimanager/bj-cti3', '/ctimanager/bj-cti4', '/ctimanager/bj-cti5']
2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:76 INFO zookeeper login host: 127.0.0.1:2182
2023-12-20 19:26:35,660 140538316216128 check_zookeeper_node.py:81 INFO use user login auth conn zookeeper
2023-12-20 19:26:35,661 140538316216128 check_zookeeper_node.py:83 INFO zookeeper user info: digest admin:123456
2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2}' 状态正常
2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
2023-12-20 19:26:35,663 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
2023-12-20 19:26:35,664 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
2023-12-20 19:26:35,665 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
2023-12-20 19:26:35,666 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti5 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti5 正常
2023-12-20 19:26:45,674 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
2023-12-20 19:26:45,676 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
2023-12-20 19:26:45,677 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
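The mislabelled cti5 entry is consistent with Python's late-binding closures: a callback defined inside a `for` loop looks up the loop variable when it runs, so a notification delivered after the loop has finished sees the last path in the list. The sketch below reproduces the effect without zookeeper and shows one common remedy, creating each callback in a factory function. `make_callback` and the callback bodies are illustrative and are not taken from the script.

```python
node_paths = ["/ctimanager/bj-cti1", "/ctimanager/bj-cti5"]

# Late binding: every callback reads node_path at call time.
late_bound = []
for node_path in node_paths:
    def on_change(data):
        return f"{node_path}: {data!r}"
    late_bound.append(on_change)


def make_callback(path):
    """Return a callback that remembers the path it was created for."""
    def on_change(data):
        return f"{path}: {data!r}"
    return on_change


bound = [make_callback(p) for p in node_paths]

# Simulate a change notification for the first node after the loop is done.
print(late_bound[0](b"new"))  # labelled with bj-cti5, the last loop value
print(bound[0](b"new"))       # labelled with bj-cti1
```

With kazoo, a per-path callback built this way could presumably be registered as `zk.DataWatch(path, callback)`; the callback would then need to accept the `data` and `stat` arguments that `DataWatch` passes.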
Implementation code
#!/usr/bin/python3
# -*- coding:utf-8 -*-
###############################################################
# Monitors the zookeeper nodes registered by the platform: when a
# module goes down and its node is lost, an alarm line is logged.
# 2023-12-20 16:02
###############################################################
import configparser
import logging
import os
import sys
import time
from logging.handlers import RotatingFileHandler

from kazoo.client import KazooClient
from kazoo.exceptions import KazooException


def watch_nodes():
    for node_path in node_paths:
        try:
            @zk.DataWatch(node_path)
            def on_data_change(data, stat):
                # Note: node_path is looked up when the callback runs, not when it
                # is defined, so a later notification can be logged under another path.
                if data is None:
                    # "node ... does not exist"
                    logger.error(f"从zookeeper获取节点 {node_path} 不存在")
                else:
                    # "data read from node ..., status OK"
                    logger.debug(f"从zookeeper节点 {node_path} 的数据获取信息: {data} 状态正常")
                    logger.info(f"从zookeeper获取节点 {node_path} 正常")
        except KazooException as e:
            # "exception while watching node ..."
            logger.error(f"从zookeeper监控节点 {node_path} 时发生异常:{e}")


def logger_func(LogLevel):
    # Create the logger
    logger = logging.getLogger("my_logger")
    if "DEBUG" == LogLevel:
        logger.setLevel(logging.DEBUG)
    elif "INFO" == LogLevel:
        logger.setLevel(logging.INFO)
    elif "WARNING" == LogLevel:
        logger.setLevel(logging.WARNING)
    elif "ERROR" == LogLevel:
        logger.setLevel(logging.ERROR)
    # Create the RotatingFileHandler
    handler = RotatingFileHandler(log_file, maxBytes=max_log_size, backupCount=backup_count)
    # Define the log format
    formatter = logging.Formatter("%(asctime)s %(thread)d %(filename)s:%(lineno)d %(levelname)s %(message)s")
    handler.setFormatter(formatter)
    # Attach the handler to the logger
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # The config file is resolved relative to the current working directory
    dirpath = os.getcwd()
    cfgpath = os.path.join(dirpath, "cfg/config.ini")
    conf = configparser.ConfigParser()
    print("config file ---> ", cfgpath)
    conf.read(cfgpath, encoding="utf-8")
    GetLogDir = conf.get("Base", "LogDir")
    LogLevel = conf.get("Base", "LogLevel")
    CheckintervalDate = conf.get("Base", "CheckintervalDate")
    max_log_size = int(conf.get("Base", "max_log_size"))
    backup_count = int(conf.get("Base", "backup_count"))
    zk_hosts = conf.get("zookeeper", "zookeeper_host")
    is_auth = conf.getboolean("zookeeper", "is_auth")
    username = conf.get("zookeeper", "zookeeper_user")
    password = conf.get("zookeeper", "zookeeper_passwd")
    zookeeper_node = conf.get("zookeeper", "zookeeper_node")
    log_file = os.path.join(GetLogDir, "watch_zoo.log")
    logger = logger_func(LogLevel)
    # Node paths to monitor
    node_paths = zookeeper_node.split(",")
    logger.info("get config file zookeeper node list: " + str(node_paths))
    # Connect to Zookeeper
    logger.info("zookeeper login host: " + zk_hosts)
    zk = KazooClient(hosts=zk_hosts)
    try:
        zk.start()
        if is_auth:
            logger.info("use user login auth conn zookeeper ")
            zk.add_auth(scheme='digest', credential=username + ":" + password)
            # Note: this writes the password to the log in plain text
            logger.info("zookeeper user info: digest " + str(username) + ":" + password)
        else:
            logger.info("not use user auth login zookeeper")
        while True:
            watch_nodes()
            time.sleep(int(CheckintervalDate))
    except KazooException as e:
        if "Connection refused" in str(e):
            logger.error("conn zookeeper error: " + str(e) + " exit!!!")
        else:
            logger.error("conn other zookeeper error: " + str(e) + " exit!!!")
        # os.exit() does not exist; sys.exit() still lets the finally block run
        sys.exit(110)
    finally:
        zk.stop()
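The level selection in `logger_func` can also be written as a lookup. The sketch below is one possible variant rather than the original function: `build_logger` and its parameters are hypothetical names, and it assumes that an unknown level name should fall back to INFO and that a logger which already has a handler should not get a second one.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s %(thread)d %(filename)s:%(lineno)d %(levelname)s %(message)s"
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def build_logger(name, level_name, log_file, max_bytes, backup_count):
    """Create (or reuse) a logger that writes to a rotating file."""
    logger = logging.getLogger(name)
    # Unknown level names fall back to INFO (an assumption of this sketch).
    logger.setLevel(LEVELS.get(level_name.strip().upper(), logging.INFO))
    if not logger.handlers:
        # Make sure the log directory exists before opening the file.
        os.makedirs(os.path.dirname(os.path.abspath(log_file)), exist_ok=True)
        handler = RotatingFileHandler(log_file, maxBytes=max_bytes, backupCount=backup_count)
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "log", "watch_zoo.log")
    log = build_logger("demo_logger", "debug", log_path, 1024 * 1024, 3)
    log.debug("logger ready")
    print(log.level == logging.DEBUG, os.path.exists(log_path))
```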
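Configuration file (cfg/config.ini)
[Base]
# Sleep interval between checks, in seconds
CheckintervalDate=10
# Log level: DEBUG, INFO, WARNING, ERROR
LogLevel=INFO
# Log directory
LogDir=./log
# Maximum log file size in bytes (100 MB)
max_log_size = 104857600
# Maximum number of rotated log backups
backup_count = 10
[zookeeper]
# zookeeper host ip and port to connect to
zookeeper_host = 127.0.0.1:2182
# Auth user
zookeeper_user = admin
# Auth password
zookeeper_passwd = 123456
# Whether to connect to zookeeper with user/password auth
is_auth = True
# Node paths to check; ideally one entry for each cti that registers itself
zookeeper_node = /ctimanager/bj-cti1,/ctimanager/bj-cti2,/ctimanager/bj-cti3,/ctimanager/bj-cti4,/ctimanager/bj-cti5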
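Reading these options with configparser's typed getters keeps the conversions in one place. The following is a minimal sketch that assumes the section and option names shown above; `load_settings` is a hypothetical helper, and the inline sample is abbreviated and parsed from a string instead of a file.

```python
import configparser

SAMPLE = """
[Base]
CheckintervalDate=10
LogLevel=INFO
[zookeeper]
zookeeper_host = 127.0.0.1:2182
is_auth = True
zookeeper_node = /ctimanager/bj-cti1, /ctimanager/bj-cti2,
"""


def load_settings(text):
    """Parse the config text and return the values the checker needs."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    raw_nodes = conf.get("zookeeper", "zookeeper_node")
    return {
        "interval": conf.getint("Base", "CheckintervalDate"),
        "log_level": conf.get("Base", "LogLevel"),
        "hosts": conf.get("zookeeper", "zookeeper_host"),
        "is_auth": conf.getboolean("zookeeper", "is_auth"),
        # Tolerate spaces and a trailing comma in the node list.
        "nodes": [p.strip() for p in raw_nodes.split(",") if p.strip()],
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE))
```

Stripping each entry avoids watching a path with a leading space when the list is written as `a, b`.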
Script output
[devops@my-dev watch_zookeeper]$ ./check_zookeeper_node.py
config file ---> /home/devops/Python/ABC/watch_zookeeper/cfg/config.ini
2023-12-20 19:20:18,946 139836511770432 check_zookeeper_node.py:73 INFO get config file zookeeper node list: ['/ctimanager/bj-cti1', '/ctimanager/bj-cti2', '/ctimanager/bj-cti3', '/ctimanager/bj-cti4', '/ctimanager/bj-cti5']
2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:73 INFO get config file zookeeper node list: ['/ctimanager/bj-cti1', '/ctimanager/bj-cti2', '/ctimanager/bj-cti3', '/ctimanager/bj-cti4', '/ctimanager/bj-cti5']
2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:76 INFO zookeeper login host: 127.0.0.1:2182
2023-12-20 19:26:35,660 140538316216128 check_zookeeper_node.py:81 INFO use user login auth conn zookeeper
2023-12-20 19:26:35,661 140538316216128 check_zookeeper_node.py:83 INFO zookeeper user info: digest admin:123456
2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2}' 状态正常
2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
2023-12-20 19:26:35,663 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
2023-12-20 19:26:35,664 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
2023-12-20 19:26:35,665 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
2023-12-20 19:26:35,666 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti5 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti5 正常
2023-12-20 19:26:45,674 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
2023-12-20 19:26:45,676 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
2023-12-20 19:26:45,677 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
2023-12-20 19:26:45,678 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
2023-12-20 19:26:55,685 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
2023-12-20 19:26:55,685 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
2023-12-20 19:26:55,686 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
2023-12-20 19:26:55,686 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
2023-12-20 19:26:55,687 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
2023-12-20 19:26:55,688 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
2023-12-20 19:26:55,689 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
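Since alarms are driven by keywords in the log, an external checker might extract the missing nodes from lines like the ones above. The sketch below assumes the line layout produced by the script's formatter (timestamp, thread id, file:line, level, message) and the 不存在 ("does not exist") wording; `missing_nodes_in_log` is a hypothetical helper, not part of the script.

```python
import re

# Layout: timestamp, thread id, file:line, level, message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<thread>\d+) (?P<src>\S+:\d+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)
# Message logged by the script when a node is missing
MISSING_RE = re.compile(r"从zookeeper获取节点 (?P<path>/\S+) 不存在")


def missing_nodes_in_log(lines):
    """Return (timestamp, node path) pairs for ERROR lines reporting a missing node."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "ERROR":
            continue
        hit = MISSING_RE.search(m.group("msg"))
        if hit:
            found.append((m.group("ts"), hit.group("path")))
    return found


if __name__ == "__main__":
    sample = [
        "2023-12-20 19:26:55,686 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常",
        "2023-12-20 19:26:55,687 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在",
    ]
    for ts, path in missing_nodes_in_log(sample):
        print(ts, path)
```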