Redis Sentinel Mode Configuration Notes
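- Sentinel mode overview
- Core architecture components
- 1. Sentinel nodes
- 2. Data nodes
- How Sentinel works
- 1. Monitoring
- Failover process in detail
- 1. Failure detection phase
- Configuration in detail
- 1. Basic Sentinel configuration (usually at /etc/redis/sentinel.conf)
- 2. Key parameters
- Ways to start Sentinel monitoring
- 1. Start with the redis-sentinel command
- 2. Start with the redis-server command
- 3. Complete startup script example
- 4. Verify Sentinel status
- 5. Stop the Sentinel service
- Connecting from application code
- 1. Automatic discovery
- 2. Connection state management
- Deployment best practices
- 1. Sentinel node deployment plan
- 2. One workable network topology
- Failure scenario analysis
- 1. Master failure
- 2. Network partition scenario
- Monitoring and operations
- 1. Key monitoring metrics
- 2. Routine operations commands
- Pros and cons
- A Lua implementation of Redis Sentinel automatic discovery
- Complete Lua implementation
- 1. Core Sentinel discovery class
- 2. Connection pool management
- 3. Usage example and configuration
- 4. Application-layer wrapper
- 5. Enhanced failover handling
- Core features
- Another example based on the skynet framework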
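Sentinel mode overview
Redis Sentinel is the high-availability solution that ships with Redis. It manages a Redis master-replica deployment and provides automatic failure detection and failover.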

Core architecture components
1. Sentinel nodes
# Example Sentinel node configuration
port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
2. Data nodes
- Master: handles writes and reads
- Replica (slave): replicates the master's data and serves reads
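How Sentinel works
1. Monitoring

- You can run N Sentinel nodes, and each one only needs to be told about the master; Sentinel discovers all replicas of that master automatically.
- Several Sentinels can run on one machine as long as they use different ports (26379, 26380, 26381). This is not recommended in production, because when that machine goes down every Sentinel on it goes down as well.
- Configuration: point every Sentinel's configuration file at the same master.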
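Since every Sentinel points at the same master and differs only in its own port and file paths, the per-instance files can be rendered from one template. The following is a minimal Python sketch under assumptions: the function name `render_sentinel_conf` and the per-port pid and log paths are hypothetical, and the timing values simply mirror the example configuration shown earlier.

```python
def render_sentinel_conf(port, master_name="mymaster",
                         master_host="192.168.1.100", master_port=6379,
                         quorum=2):
    """Return the text of a sentinel.conf for one Sentinel instance."""
    lines = [
        f"port {port}",
        "daemonize yes",
        # Hypothetical per-port paths so the instances never share files.
        f"pidfile /var/run/redis-sentinel-{port}.pid",
        f'logfile "/var/log/redis/sentinel-{port}.log"',
        # Every Sentinel monitors the same master.
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} 5000",
        f"sentinel failover-timeout {master_name} 60000",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


# One configuration text per Sentinel port.
confs = {port: render_sentinel_conf(port) for port in (26379, 26380, 26381)}
```

Writing each entry of `confs` to its own file gives three configurations that share the monitor line but nothing else.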
Failover process in detail
1. Failure detection phase

Configuration in detail
1. Basic Sentinel configuration (usually at /etc/redis/sentinel.conf)
# sentinel.conf
port 26379
daemonize yes
# Use a different pid file for each Sentinel
pidfile /var/run/redis-sentinel.pid
# Use a different log file for each Sentinel
logfile "/var/log/redis/sentinel.log"

# Monitoring
sentinel monitor mymaster 192.168.1.100 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

# Password authentication (if the master requires it)
sentinel auth-pass mymaster MyPassword
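Where:
mymaster is the name given to the master, and 2 is the quorum: at least 2 Sentinels must agree before the master is judged objectively down. With three Sentinels, for example, the value is usually set to 2 (majority rule).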
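The role of the quorum can be made concrete with a small sketch. The quorum only decides when the master counts as objectively down; to actually carry out a failover, the leading Sentinel also needs votes from a majority of all Sentinels. The Python below is an illustration of that arithmetic, not Sentinel's own code, and both function names are hypothetical.

```python
def is_objectively_down(down_votes: int, quorum: int) -> bool:
    """ODOWN is reached once at least `quorum` Sentinels report the master down."""
    return down_votes >= quorum


def votes_needed_for_failover(quorum: int, total_sentinels: int) -> int:
    """Votes a Sentinel needs to lead a failover: a majority, and never fewer than the quorum."""
    majority = total_sentinels // 2 + 1
    return max(quorum, majority)


# Three Sentinels with quorum 2: two "down" reports are enough for ODOWN,
# and two votes are enough to authorize the failover.
three_node_odown = is_objectively_down(down_votes=2, quorum=2)
three_node_votes = votes_needed_for_failover(quorum=2, total_sentinels=3)
```

With five Sentinels and a quorum of 2, two reports still mark the master as objectively down, but the failover itself needs three votes.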
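Important notes:
1. Key issue: no two of the sentinel_XXX.conf files you add may contain the same sentinel myid; duplicates prevent the Sentinels from recognizing each other.
2. Auto-generated content: after startup, Sentinel writes runtime information into its configuration file; do not edit that content by hand.
3. Startup order: start the Redis master and replicas first, then the Sentinels.
4. Wait time: give the Sentinels enough time to discover each other (30-60 seconds).
If Sentinels cannot recognize each other because of a duplicated myid, delete the final part of the affected configuration file, that is, everything from "# Generated by CONFIG REWRITE" to the end of the file, and then restart.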
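The cleanup described above can be scripted. This is a minimal Python sketch that works on the file's text; the function name `strip_generated_section` is hypothetical and the myid in the sample is made up. Stop the Sentinel and back up the file before applying anything like this to a real configuration.

```python
MARKER = "# Generated by CONFIG REWRITE"


def strip_generated_section(conf_text: str) -> str:
    """Drop everything from the CONFIG REWRITE marker to the end of the text."""
    kept = []
    for line in conf_text.splitlines():
        if line.strip().startswith(MARKER):
            break  # the marker and all following lines are discarded
        kept.append(line)
    return "\n".join(kept).rstrip("\n") + "\n"


# Made-up sample: two hand-written lines followed by a generated section.
sample = (
    "port 26379\n"
    "sentinel monitor mymaster 192.168.1.100 6379 2\n"
    "# Generated by CONFIG REWRITE\n"
    "sentinel myid 0123456789abcdef0123456789abcdef01234567\n"
)
cleaned = strip_generated_section(sample)
```

After the restart, Sentinel writes a fresh generated section with a new myid.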
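2. Key parameters
| Parameter | Default | Description |
|---|---|---|
| down-after-milliseconds | 30000 | Time in ms without a valid reply before a node is marked subjectively down |
| failover-timeout | 180000 | Failover timeout in ms |
| parallel-syncs | 1 | Number of replicas that resync with the new master at the same time |
| quorum | none (set in sentinel monitor; 2 in this example) | Votes required to mark the master objectively down |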
Ways to start Sentinel monitoring
1. Start with the redis-sentinel command
# Start the three Sentinels one by one
redis-sentinel /path/to/sentinel1.conf
redis-sentinel /path/to/sentinel2.conf
redis-sentinel /path/to/sentinel3.conf
2. Start with the redis-server command
# redis-server can be used as well
redis-server /path/to/sentinel1.conf --sentinel
redis-server /path/to/sentinel2.conf --sentinel
redis-server /path/to/sentinel3.conf --sentinel

# Start in the background
redis-server /path/to/sentinel1.conf --sentinel --daemonize yes
3. Complete startup script example
#!/bin/bash

SENTINEL_CONF_DIR="/etc/redis/sentinel"
SENTINEL_PORTS=(26379 26380 26381)

for port in "${SENTINEL_PORTS[@]}"; do
    config_file="${SENTINEL_CONF_DIR}/sentinel-${port}.conf"
    echo "Starting Sentinel on port: $port, config file: $config_file"
    redis-sentinel "$config_file" --daemonize yes
done

echo "All Sentinel nodes started"
4. Verify Sentinel status
# Connect to a Sentinel and inspect it
redis-cli -p 26379
127.0.0.1:26379> info sentinel

# Or view the master information
127.0.0.1:26379> sentinel masters
# List the replicas
127.0.0.1:26379> sentinel slaves mymaster
# List the other Sentinels
127.0.0.1:26379> sentinel sentinels mymaster
5. Stop the Sentinel service
# Find the processes
ps aux | grep redis-sentinel

# Graceful stop
redis-cli -p 26379 shutdown
redis-cli -p 26380 shutdown
redis-cli -p 26381 shutdown
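Important notes
- Make sure the configuration file paths are correct
- Give each Sentinel its own port and pid file
- Check the firewall rules so that the Sentinel nodes can reach each other
- Check the log files to confirm that startup succeeded
After startup, the Sentinels discover each other automatically and form a Sentinel cluster that monitors your Redis master-replica deployment. Connect to any Sentinel port and run info sentinel to see sentinels=X, where X is the number of Sentinels in the cluster.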
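Reading the sentinels=X value can be automated by parsing the `masterN:` line of the `info sentinel` output. The sketch below is a hypothetical helper in Python; the sample text is illustrative and follows the `master0:name=...,status=...,address=...,slaves=...,sentinels=...` layout that Sentinel prints.

```python
def parse_master_lines(info_text: str) -> dict:
    """Map master name -> dict of the fields found on each `masterN:` line."""
    masters = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Only lines such as "master0:..." describe a monitored master.
        if not sep or not key.startswith("master") or not key[6:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        masters[fields["name"]] = fields
    return masters


# Illustrative output of `redis-cli -p 26379 info sentinel`.
sample_info = """# Sentinel
sentinel_masters:1
sentinel_tilt:0
master0:name=mymaster,status=ok,address=192.168.1.100:6379,slaves=2,sentinels=3
"""
masters = parse_master_lines(sample_info)
sentinel_count = int(masters["mymaster"]["sentinels"])
```

If `sentinel_count` stays below the number of Sentinels you started, the instances have not discovered each other yet.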
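Connecting from application code
1. Automatic discovery
// Java client example
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

Set<String> sentinels = new HashSet<>();
sentinels.add("192.168.1.10:26379");
sentinels.add("192.168.1.11:26379");
sentinels.add("192.168.1.12:26379");

JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
try (Jedis jedis = pool.getResource()) {
    jedis.set("key", "value");
}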
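What a Sentinel-aware client does at connect time can be sketched independently of any client library: ask each configured Sentinel for the master address and use the first valid answer. The Python below is an illustration only. `discover_master` and `fake_ask` are hypothetical names, and `fake_ask` stands in for a real `SENTINEL get-master-addr-by-name` round trip so the example runs without a Redis server.

```python
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def discover_master(
    sentinels: Iterable[Address],
    master_name: str,
    ask: Callable[[Address, str], Optional[Address]],
) -> Address:
    """Return the master address reported by the first Sentinel that answers."""
    for sentinel in sentinels:
        try:
            master = ask(sentinel, master_name)
        except OSError:
            continue  # this Sentinel is unreachable, try the next one
        if master is not None:
            return master
    raise RuntimeError(f"no Sentinel could report a master for {master_name!r}")


def fake_ask(sentinel: Address, master_name: str) -> Optional[Address]:
    """Stand-in for a network query: the first Sentinel is down, the others answer."""
    if sentinel[0] == "192.168.1.10":
        raise ConnectionRefusedError("sentinel down")
    return ("192.168.1.101", 6379)


master = discover_master(
    [("192.168.1.10", 26379), ("192.168.1.11", 26379)], "mymaster", fake_ask
)
```

Listing several Sentinels is what keeps discovery working when one of them is unreachable.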
2. Connection state management

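Deployment best practices
1. Sentinel node deployment plan
# Deploying an odd number of Sentinel nodes is recommended
Node layout:
- Sentinel 1: 192.168.1.10:26379
- Sentinel 2: 192.168.1.11:26379
- Sentinel 3: 192.168.1.12:26379
# Data nodes
- Master: 192.168.1.100:6379
- Replica 1: 192.168.1.101:6379
- Replica 2: 192.168.1.102:6379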
2. One workable network topology

Failure scenario analysis
1. Master failure
# Example failover log
1. +sdown master mymaster 192.168.1.100 6379
2. +odown master mymaster 192.168.1.100 6379
3. +vote-for-leader sentinel1 1
4. +elected-leader master mymaster
5. +failover-state-select-slave master mymaster
6. +selected-slave slave 192.168.1.101:6379
7. +failover-state-send-slaveof-noone
8. +failover-state-wait-promotion
9. +promoted-slave slave 192.168.1.101:6379
10. +failover-state-reconf-slaves
11. +switch-master mymaster 192.168.1.100 6379 192.168.1.101 6379
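The last event in this sequence, `+switch-master`, is the one applications usually care about, because it carries both the old and the new master address. A minimal Python sketch of extracting them follows; the function name `parse_switch_master` is hypothetical, and the input is the event text without the step number used in the listing above.

```python
def parse_switch_master(event: str):
    """Parse '+switch-master <name> <old-ip> <old-port> <new-ip> <new-port>'.

    Returns None for any other event.
    """
    parts = event.split()
    if len(parts) != 6 or parts[0] != "+switch-master":
        return None
    _, name, old_ip, old_port, new_ip, new_port = parts
    return {
        "master": name,
        "old": (old_ip, int(old_port)),
        "new": (new_ip, int(new_port)),
    }


change = parse_switch_master(
    "+switch-master mymaster 192.168.1.100 6379 192.168.1.101 6379"
)
```

A client that sees this event should drop its connections to `change["old"]` and reconnect to `change["new"]`.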
2. Network partition scenario

Monitoring and operations
1. Key monitoring metrics
# Check Sentinel status
redis-cli -p 26379 info sentinel

# Metrics to watch
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
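These counters lend themselves to a simple automated check. The Python sketch below is one hypothetical way to turn them into alerts: it flags a Sentinel that monitors no master and a Sentinel that reports `sentinel_tilt:1`, which means it is in TILT mode. The function name and the alert wording are assumptions, not part of Redis.

```python
def sentinel_alerts(info_text: str) -> list:
    """Return human-readable alerts derived from `info sentinel` counters."""
    metrics = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.startswith("sentinel_"):
            metrics[key] = value

    alerts = []
    if metrics.get("sentinel_masters") == "0":
        alerts.append("no masters monitored")
    if metrics.get("sentinel_tilt", "0") != "0":
        alerts.append("sentinel is in TILT mode")
    return alerts


healthy = sentinel_alerts("sentinel_masters:1\nsentinel_tilt:0\n")
tilted = sentinel_alerts("sentinel_masters:1\nsentinel_tilt:1\n")
```

A check like this would typically run against every Sentinel, since each one reports only its own state.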
2. Routine operations commands
# Get the address of the current master
redis-cli -p 26379 sentinel get-master-addr-by-name mymaster

# Trigger a manual failover
redis-cli -p 26379 sentinel failover mymaster

# View replica information
redis-cli -p 26379 sentinel slaves mymaster

# Start monitoring another master
redis-cli -p 26379 sentinel monitor newmaster 192.168.1.200 6379 2
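Pros and cons
Pros ✅
1. Automatic failover: no manual intervention needed
2. Configuration provider: clients discover the current master automatically
3. Thorough monitoring: comprehensive node health checks
4. Official support: maintained by the Redis project and stable
Cons ❌
1. Single write point: the master remains the write bottleneck
2. Data consistency: asynchronous replication can lose data during a failover
3. Complex configuration: several parameters must be set correctly
4. Resource cost: extra Sentinel nodes are required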
A Lua implementation of Redis Sentinel automatic discovery
Complete Lua implementation
1. Core Sentinel discovery class
local redis = require("resty.redis")
local cjson = require("cjson")

local RedisSentinel = {}
RedisSentinel.__index = RedisSentinel

function RedisSentinel.new(sentinels, master_name, options)
    local self = setmetatable({}, RedisSentinel)
    self.sentinel_hosts = sentinels or {}
    self.master_name = master_name or "mymaster"
    self.options = options or {}
    self.current_master = nil
    self.sentinel_timeout = self.options.timeout or 1000 -- milliseconds
    self.retry_count = self.options.retry_count or 3
    self.connection_pool = {}
    return self
end

-- Ask one Sentinel for the address of the current master
function RedisSentinel:get_master_from_sentinel(sentinel_host)
    local red = redis:new()
    red:set_timeout(self.sentinel_timeout)

    local ok, err = red:connect(sentinel_host.host, sentinel_host.port)
    if not ok then
        return nil, "Failed to connect to sentinel: " .. err
    end

    -- Authenticate if a password is configured
    if sentinel_host.password then
        local auth_ok, auth_err = red:auth(sentinel_host.password)
        if not auth_ok then
            red:close()
            return nil, "Sentinel auth failed: " .. auth_err
        end
    end

    -- Query the master address (sends SENTINEL get-master-addr-by-name <name>)
    local res, err = red:sentinel("get-master-addr-by-name", self.master_name)
    if not res or type(res) ~= "table" or #res < 2 then
        red:close()
        return nil, "Failed to get master from sentinel: " .. (err or "invalid response")
    end

    red:close()
    return { host = res[1], port = tonumber(res[2]) }
end

-- Discover the master automatically (poll every Sentinel)
function RedisSentinel:discover_master()
    for i = 1, self.retry_count do
        for _, sentinel in ipairs(self.sentinel_hosts) do
            local master, err = self:get_master_from_sentinel(sentinel)
            if master then
                -- Check that the reported master is reachable
                local is_alive, valid_err = self:validate_master(master)
                if is_alive then
                    self.current_master = master
                    return master
                else
                    ngx.log(ngx.WARN, "Master validation failed: ", valid_err)
                end
            else
                ngx.log(ngx.WARN, "Sentinel ", sentinel.host, ":", sentinel.port, " failed: ", err)
            end
        end

        -- Every Sentinel failed; wait before retrying
        if i < self.retry_count then
            ngx.sleep(0.1) -- wait 100 ms before the next attempt
        end
    end

    return nil, "All sentinels failed after " .. self.retry_count .. " retries"
end

-- Check that the master is reachable
function RedisSentinel:validate_master(master)
    local red = redis:new()
    red:set_timeout(1000) -- 1 second timeout

    local ok, err = red:connect(master.host, master.port)
    if not ok then
        return false, "Connect failed: " .. err
    end

    -- Authenticate if a password is configured
    if self.options.password then
        local auth_ok, auth_err = red:auth(self.options.password)
        if not auth_ok then
            red:close()
            return false, "Auth failed: " .. auth_err
        end
    end

    -- Verify with a PING
    local pong, ping_err = red:ping()
    if not pong then
        red:close()
        return false, "Ping failed: " .. ping_err
    end

    red:close()
    return true
end

-- Get a connection to the current master
function RedisSentinel:get_master_connection()
    -- Discover the master if it is not known yet
    if not self.current_master then
        local master, err = self:discover_master()
        if not master then
            return nil, err
        end
    end

    -- Connect to the master
    local red = redis:new()
    red:set_timeout(self.options.redis_timeout or 5000)

    local ok, err = red:connect(self.current_master.host, self.current_master.port)
    if not ok then
        -- The connection failed, possibly because of a failover; rediscover
        local master, discover_err = self:discover_master()
        if not master then
            return nil, "Reconnect failed and rediscovery also failed: " .. discover_err
        end

        -- Reconnect to the newly discovered master
        ok, err = red:connect(master.host, master.port)
        if not ok then
            return nil, "Reconnect with new master failed: " .. err
        end
    end

    -- Authenticate
    if self.options.password then
        local auth_ok, auth_err = red:auth(self.options.password)
        if not auth_ok then
            red:close()
            return nil, "Redis auth failed: " .. auth_err
        end
    end

    -- Select the database
    if self.options.database then
        local select_ok, select_err = red:select(self.options.database)
        if not select_ok then
            red:close()
            return nil, "Select database failed: " .. select_err
        end
    end

    return red
end

-- Get the list of replicas
function RedisSentinel:get_slaves()
    for _, sentinel in ipairs(self.sentinel_hosts) do
        local red = redis:new()
        red:set_timeout(self.sentinel_timeout)

        local ok, err = red:connect(sentinel.host, sentinel.port)
        if ok then
            if sentinel.password then
                red:auth(sentinel.password)
            end

            -- Sends SENTINEL slaves <name>
            local slaves, err = red:sentinel("slaves", self.master_name)
            red:close()

            if slaves and type(slaves) == "table" then
                local slave_list = {}
                for _, slave_info in ipairs(slaves) do
                    if type(slave_info) == "table" then
                        -- Each replica is a flat array: field, value, field, value, ...
                        local slave = {}
                        for i = 1, #slave_info, 2 do
                            if slave_info[i] == "ip" then
                                slave.host = slave_info[i + 1]
                            elseif slave_info[i] == "port" then
                                slave.port = tonumber(slave_info[i + 1])
                            elseif slave_info[i] == "flags" then
                                slave.flags = slave_info[i + 1]
                            end
                        end

                        -- Keep only replicas that are not marked s_down
                        if not slave.flags or not slave.flags:find("s_down") then
                            table.insert(slave_list, slave)
                        end
                    end
                end
                return slave_list
            end
        end
    end

    return nil, "Failed to get slaves from all sentinels"
end

-- Get a read-only connection to a replica
function RedisSentinel:get_slave_connection()
    local slaves, err = self:get_slaves()
    if not slaves then
        return nil, err
    end

    -- Pick a replica at random (simple load balancing)
    if #slaves > 0 then
        local slave = slaves[math.random(1, #slaves)]
        local red = redis:new()
        red:set_timeout(self.options.redis_timeout or 5000)

        local ok, err = red:connect(slave.host, slave.port)
        if ok then
            -- Authenticate and select the database
            if self.options.password then
                red:auth(self.options.password)
            end
            if self.options.database then
                red:select(self.options.database)
            end
            return red
        end
    end

    return nil, "No available slaves"
end

return RedisSentinel
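The part of `get_slaves` that is easiest to get wrong is the reply format: `SENTINEL slaves` returns one flat field/value array per replica. The following Python sketch shows the same parsing and filtering on a made-up reply, so it runs without a Redis server; the function name `healthy_replicas` is hypothetical.

```python
def healthy_replicas(reply):
    """Turn a SENTINEL slaves reply into (host, port) pairs, skipping s_down replicas."""
    replicas = []
    for flat in reply:
        # Pair up field names (even positions) with their values (odd positions).
        info = dict(zip(flat[0::2], flat[1::2]))
        flags = info.get("flags", "").split(",")
        if "s_down" in flags:
            continue
        replicas.append((info["ip"], int(info["port"])))
    return replicas


# Made-up reply with one healthy and one subjectively-down replica.
sample_reply = [
    ["name", "192.168.1.101:6379", "ip", "192.168.1.101", "port", "6379",
     "flags", "slave"],
    ["name", "192.168.1.102:6379", "ip", "192.168.1.102", "port", "6379",
     "flags", "s_down,slave,disconnected"],
]
usable = healthy_replicas(sample_reply)
```

Splitting the flags on commas avoids accidental substring matches, which a plain `find` on the whole string cannot rule out.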
2. Connection pool management
local RedisConnectionPool = {}
RedisConnectionPool.__index = RedisConnectionPool

function RedisConnectionPool.new(sentinel, pool_size)
    local self = setmetatable({}, RedisConnectionPool)
    self.sentinel = sentinel
    self.pool_size = pool_size or 10
    self.master_pool = {}
    self.slave_pool = {}
    return self
end

function RedisConnectionPool:get_master_connection()
    -- Try the pool first
    for i = #self.master_pool, 1, -1 do
        local conn = table.remove(self.master_pool, i)
        if conn and self:is_connection_alive(conn) then
            return conn
        end
    end

    -- The pool is empty or its connections are dead; create a new one
    return self.sentinel:get_master_connection()
end

function RedisConnectionPool:get_slave_connection()
    -- Try the pool first
    for i = #self.slave_pool, 1, -1 do
        local conn = table.remove(self.slave_pool, i)
        if conn and self:is_connection_alive(conn) then
            return conn
        end
    end

    -- The pool is empty or its connections are dead; create a new one
    return self.sentinel:get_slave_connection()
end

function RedisConnectionPool:release_master_connection(conn)
    if conn and #self.master_pool < self.pool_size then
        table.insert(self.master_pool, conn)
    elseif conn then
        -- The pool is full; close the connection instead of keeping it
        conn:close()
    end
end

function RedisConnectionPool:release_slave_connection(conn)
    if conn and #self.slave_pool < self.pool_size then
        table.insert(self.slave_pool, conn)
    elseif conn then
        conn:close()
    end
end

function RedisConnectionPool:is_connection_alive(conn)
    local ok, err = conn:ping()
    return ok and true or false
end

function RedisConnectionPool:close_all()
    for _, conn in ipairs(self.master_pool) do
        conn:close()
    end
    for _, conn in ipairs(self.slave_pool) do
        conn:close()
    end
    self.master_pool = {}
    self.slave_pool = {}
end

-- Needed so that require("redis_connection_pool") returns the class
return RedisConnectionPool
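The pooling policy used here (reuse an idle connection only if it still answers a ping, and close anything that does not fit in the pool) is independent of Redis. The Python sketch below restates it with a stand-in connection type. `BoundedPool` and `FakeConn` are hypothetical names, and unlike the Lua version the sketch also closes dead connections it finds in the pool.

```python
class FakeConn:
    """Stand-in for a Redis connection, for illustration only."""

    def __init__(self):
        self.alive = True
        self.closed = False

    def ping(self):
        return self.alive

    def close(self):
        self.closed = True


class BoundedPool:
    """Keep at most `size` idle connections; health-check them on reuse."""

    def __init__(self, factory, size=10):
        self._factory = factory
        self._size = size
        self._idle = []

    def acquire(self):
        while self._idle:
            conn = self._idle.pop()
            if conn.ping():
                return conn
            conn.close()  # drop dead connections instead of reusing them
        return self._factory()

    def release(self, conn):
        if len(self._idle) < self._size:
            self._idle.append(conn)
        else:
            conn.close()  # pool is full


pool = BoundedPool(FakeConn, size=1)
first = pool.acquire()
second = pool.acquire()
pool.release(first)   # kept as the single idle connection
pool.release(second)  # pool already full, so this one is closed
```

After a failover, the ping check is what keeps stale connections to the old master from being handed out again.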
3. Usage example and configuration
-- config.lua
local config = {
    sentinels = {
        { host = "192.168.1.10", port = 26379 },
        { host = "192.168.1.11", port = 26379 },
        { host = "192.168.1.12", port = 26379 }
    },
    master_name = "mymaster",
    options = {
        password = "your_password",
        database = 0,
        timeout = 1000,
        retry_count = 3,
        redis_timeout = 5000
    }
}

return config
4. Application-layer wrapper
-- app_redis.lua
local RedisSentinel = require("redis_sentinel")
local RedisConnectionPool = require("redis_connection_pool")
local config = require("config")

local _M = {}
_M._VERSION = "1.0"

local sentinel = RedisSentinel.new(
    config.sentinels,
    config.master_name,
    config.options
)

local connection_pool = RedisConnectionPool.new(sentinel, 20)

-- Run a write command (on the master)
function _M:execute_write(command, ...)
    local conn, err = connection_pool:get_master_connection()
    if not conn then
        return nil, err
    end

    -- Pass the varargs through pcall directly; "..." cannot be used inside a nested non-vararg function
    local ok, result, cmd_err = pcall(conn[command], conn, ...)

    -- Return the connection to the pool
    connection_pool:release_master_connection(conn)

    if not ok then
        return nil, result
    end

    return result, cmd_err
end

-- Run a read command (on a replica)
function _M:execute_read(command, ...)
    local from_master = false
    local conn, err = connection_pool:get_slave_connection()
    if not conn then
        -- No replica available; fall back to the master
        conn, err = connection_pool:get_master_connection()
        if not conn then
            return nil, err
        end
        from_master = true
    end

    local ok, result, cmd_err = pcall(conn[command], conn, ...)

    -- Return the connection to the pool it came from
    if from_master then
        connection_pool:release_master_connection(conn)
    else
        connection_pool:release_slave_connection(conn)
    end

    if not ok then
        return nil, result
    end

    return result, cmd_err
end

-- Batch operations
function _M:execute_pipeline(commands, use_slave)
    local conn, err

    if use_slave then
        conn, err = connection_pool:get_slave_connection()
        if not conn then
            conn, err = connection_pool:get_master_connection()
            use_slave = false -- the connection now belongs to the master pool
        end
    else
        conn, err = connection_pool:get_master_connection()
    end

    if not conn then
        return nil, err
    end

    -- Start the pipeline
    conn:init_pipeline()

    for _, cmd in ipairs(commands) do
        local command = cmd.command
        local args = cmd.args or {}
        conn[command](conn, unpack(args))
    end

    local results, err = conn:commit_pipeline()

    -- Release the connection
    if use_slave then
        connection_pool:release_slave_connection(conn)
    else
        connection_pool:release_master_connection(conn)
    end

    if not results then
        return nil, err
    end

    return results
end

-- Force a refresh of the master information
function _M:refresh_master_info()
    return sentinel:discover_master()
end

-- Get the current master information
function _M:get_current_master()
    return sentinel.current_master
end

-- Health check
function _M:health_check()
    local master_conn, err = connection_pool:get_master_connection()
    if not master_conn then
        return false, "Master connection failed: " .. (err or "unknown")
    end

    local ok, ping_err = master_conn:ping()
    connection_pool:release_master_connection(master_conn)

    if not ok then
        return false, "Master ping failed: " .. (ping_err or "unknown")
    end

    return true, "Healthy"
end

return _M
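5. Enhanced failover handling
-- failover_handler.lua
local _M = {}

function _M:handle_failover(sentinel, old_master, new_master)
    ngx.log(ngx.INFO, "Failover detected: ",
        old_master.host, ":", old_master.port, " -> ",
        new_master.host, ":", new_master.port)

    -- Drop the old connections held by the pool.
    -- NOTE: connection_pool is not defined in this module; the code assumes
    -- it is visible here (for example as a global set by the application).
    if connection_pool then
        connection_pool:close_all()
    end

    -- Send a notification (optional)
    self:send_failover_notification(old_master, new_master)

    -- Update application state
    self:update_application_state(new_master)
end

function _M:send_failover_notification(old_master, new_master)
    -- Implement the notification here: email, SMS, webhook, and so on
    local message = string.format(
        "Redis failover occurred: %s:%d -> %s:%d",
        old_master.host, old_master.port,
        new_master.host, new_master.port
    )

    -- Example: write to the log
    ngx.log(ngx.WARN, message)
end

function _M:update_application_state(new_master)
    -- Update application configuration or state,
    -- for example refresh local caches or update the configuration center
end

return _M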
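Core features
1. Automatic failure detection
- Retries against the other Sentinels when a connection fails
- Validates that the reported master is reachable
- Reconnects automatically after a failover
2. Load balancing
- Random replica selection for reads
- Connection pooling to reduce connection overhead
- Read/write splitting
3. Fault tolerance
- Polling across multiple Sentinel nodes
- Fallback to the master when no replica is available
- Health checks on pooled connections
4. Performance
- Connection reuse to reduce overhead
- Non-blocking I/O through OpenResty cosockets (resty.redis)
- Sensible timeout and retry settings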
Another example based on the skynet framework
local skynet = require "skynet"
local redis = require "skynet.db.redis"
local cjson = require "cjson"
local cfgmgr = require "cfgmgr"

local redis_db

--- Ask the Sentinels for the real master; returns its IP and port
local function get_master_info(sentinel_cfg)
    for _, conn in ipairs(sentinel_cfg.hosts) do
        local ret, db = pcall(redis.connect, conn)
        if ret and db then
            local addr = db:sentinel("get-master-addr-by-name", sentinel_cfg.master)
            db:disconnect()
            -- Only accept a well-formed reply; otherwise try the next Sentinel
            if type(addr) == "table" then
                return addr[1], addr[2]
            end
        end
    end
end

--- Find a usable Redis connection based on the configuration
local function get_available_redis()
    local sentinel_cfg = cfgmgr.get_redis_sentinel_cfg()
    local conn = {}
    -- Use Sentinel when a Sentinel configuration is present
    if sentinel_cfg and #sentinel_cfg.hosts > 0 then
        conn = sentinel_cfg.redis_conn or {}
        conn.host, conn.port = get_master_info(sentinel_cfg)
        INFO("get_available_redis from sentinel", conn)
        if not conn.host then
            return
        end
    else
        conn = cfgmgr.get_redis_cfg()
        INFO("get_available_redis from common redis", conn)
    end
    local ret, db = pcall(redis.connect, conn)
    if ret and db then
        INFO("get_available_redis success!!", conn)
        return db
    end
end

local function redis_conn_timer()
    if not redis_db then
        redis_db = get_available_redis()
    else
        -- pcall assumes ping raises an error on a broken connection;
        -- without it such an error would stop this timer from being rescheduled
        local ok, result = pcall(redis_db.ping, redis_db)
        -- DEBUG("[init][ping] result=", result)
        if not ok or not result then
            redis_db = nil
        end
    end
    -- skynet.timeout counts in units of 1/100 second:
    -- retry after 1 s while disconnected, check every 10 s once connected
    local update_time = 100
    if not redis_db then
        ERROR("[init] redis connect error! please check!")
    else
        update_time = 1000
    end
    skynet.timeout(update_time, redis_conn_timer)
end

function init()
    DEBUG("[init] Start init redis service ...")
    cjson.encode_sparse_array(true, 1, 1)
    redis_conn_timer()
    DEBUG("[init] connect to redis server success, Init server service success!")
end

function response.do_cmd(cmd, key, ...)
    --DEBUG("[redis do cmd]:",cmd,key, ...)
    local result = nil
    if redis_db then
        result = redis_db[cmd](redis_db, key, ...)
    else
        ERROR("[redis do cmd] ERROR ", cmd, key, ...)
    end
    return result
end

--[[ -- Configuration example
-- redis-sentinel configuration; when present, the Redis address is taken from Sentinel
redis_sentinel = {
    -- redis_conn = {}, -- the host and port returned by Sentinel are written here and used as the Redis connection config
    master = "mymaster",
    hosts = {
        {host = REDIS_IP, port = 26379},
        {host = REDIS_IP, port = 26380},
        {host = REDIS_IP, port = 26381},
    },
},
]]--
