Resilience4j: Getting Started and Hands-On Examples
Introduction
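In a microservice architecture we constantly have to deal with failures that are typical of distributed systems, for example:
- a downstream service responds slowly or times out
- an unstable external API fails frequently
- an endpoint that is called too often drags down its dependencies in a cascading failure (an "avalanche")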
Making such a system stable and resilient relies on mechanisms such as circuit breaking (CircuitBreaker), rate limiting (RateLimiter), retries (Retry) and timeouts (TimeLimiter).
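In the past this was typically done with Netflix Hystrix, but Hystrix has been in maintenance mode since 2018 and is no longer actively developed; its own README points new projects to Resilience4j. Resilience4j has since become a widely adopted lightweight replacement, and it is one of the implementations supported by Spring Cloud Circuit Breaker.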
What is Resilience4j?
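Resilience4j is a lightweight fault-tolerance library for Java. It is inspired by Hystrix but built around functional programming and a modular design, so every feature is a separate module that can be added on its own. The 1.x line targets Java 8; the 2.x line used in this article requires Java 17.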
It provides the following core modules:
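| Module | Purpose |
|---|---|
| CircuitBreaker (circuit breaking) | Stops calls automatically when the failure rate is too high, which prevents cascading failures. It has three states: CLOSED (normal operation), OPEN (calls are rejected and the downstream service is not called), and HALF_OPEN (a limited number of trial calls check whether the downstream has recovered). |
| Retry | Automatically retries a failed call up to a configured number of attempts. |
| RateLimiter (rate limiting) | Limits the number of calls per time period so the service is not overwhelmed. |
| TimeLimiter (timeouts) | Limits how long a call may run, avoiding long blocking. |
| Bulkhead (isolation) | Limits the number of concurrent calls, isolating failures and preventing thread-pool exhaustion. |
| Cache | Caches call results to reduce repeated requests and backend load. |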
The benefit of this modular design is that you add only the modules you need instead of pulling in a large dependency tree.
Integrating Resilience4j with Spring Boot
Maven dependencies
```xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.2.0</version>
</dependency>
<!-- Resilience4j's annotations are implemented with Spring AOP, so the AOP starter is required -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
```
Hands-on examples
CircuitBreaker: circuit breaking (programmatic)
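```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.time.Duration;
import java.util.function.Supplier;

@RestController
@RequestMapping("/api")
public class ApiController {

    private final CircuitBreaker circuitBreaker;

    public ApiController(CircuitBreakerRegistry circuitBreakerRegistry) {
        // Build the CircuitBreaker configuration in code
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .slidingWindowSize(5)                            // sliding window size (calls)
                .minimumNumberOfCalls(2)                         // calls needed before the failure rate is evaluated
                .failureRateThreshold(50)                        // open when the failure rate is >= 50%
                .waitDurationInOpenState(Duration.ofSeconds(5))  // how long to stay OPEN
                .permittedNumberOfCallsInHalfOpenState(2)        // trial calls allowed in HALF_OPEN
                .recordExceptions(RuntimeException.class, IOException.class) // exceptions counted as failures
                .build();
        // Create (or look up) the "programmaticCB" instance in the registry
        this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("programmaticCB", config);
    }

    @GetMapping("/unreliable0/{id}")
    public String unreliableApi0(@PathVariable("id") Integer id) {
        // Wrap the business logic with the CircuitBreaker
        Supplier<String> decoratedSupplier = CircuitBreaker.decorateSupplier(circuitBreaker, () -> {
            if (id % 2 == 0) {
                throw new RuntimeException("Service failed for id: " + id);
            }
            return "Success for id: " + id;
        });
        try {
            return decoratedSupplier.get();
        } catch (Exception e) {
            // Business failures and CallNotPermittedException (breaker OPEN) both end up here
            return fallback(id, e);
        }
    }

    // Fallback called manually: same parameters as the original method, plus the exception last
    public String fallback(Integer id, Exception e) {
        return "Fallback(id:'" + id + "'): " + e.getMessage();
    }
}
```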
Results
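Call /api/unreliable0/2 repeatedly (even ids always throw):
- Requests 1 and 2: the call fails inside the breaker. The exception is caught and the fallback string is returned as a normal response (HTTP 200): "Fallback(id:'2'): Service failed for id: 2"
- Request 3 and later: 2 calls (minimumNumberOfCalls) have been recorded with a 100% failure rate, which exceeds the 50% threshold. The breaker is therefore OPEN and rejects the call without running the business logic: "Fallback(id:'2'): CircuitBreaker 'programmaticCB' is OPEN and does not permit further calls"

Then keep calling /api/unreliable0/1. While the breaker stays OPEN (5 s), the response is still the OPEN fallback. After that the breaker switches to HALF_OPEN, and the 2 permitted trial calls return "Success for id: 1". Their failure rate is below the threshold, so the breaker goes back to CLOSED and the API is fully back to normal.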
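To make these transitions concrete, here is a minimal plain-Java simulation of a count-based breaker using the programmaticCB numbers: window 5, minimum 2 calls, 50% threshold, 5 s open and 2 half-open calls. It is a conceptual sketch, not Resilience4j's implementation. BreakerSketch is a hypothetical class, time is passed in as milliseconds so the run is deterministic, and details such as slow-call tracking are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a COUNT_BASED circuit breaker (not Resilience4j's code).
// Time is passed in explicitly (milliseconds) so the behavior is deterministic.
class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;          // slidingWindowSize
    private final int minimumCalls;        // minimumNumberOfCalls
    private final double thresholdPercent; // failureRateThreshold
    private final long openMillis;         // waitDurationInOpenState
    private final int halfOpenCalls;       // permittedNumberOfCallsInHalfOpenState

    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int halfOpenPermitsUsed;

    BreakerSketch(int windowSize, int minimumCalls, double thresholdPercent, long openMillis, int halfOpenCalls) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
        this.openMillis = openMillis;
        this.halfOpenCalls = halfOpenCalls;
    }

    State state() { return state; }

    /** Returns false when the call must be rejected (Resilience4j throws CallNotPermittedException here). */
    boolean tryAcquire(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) {
                return false;                   // still inside the OPEN wait duration
            }
            moveTo(State.HALF_OPEN);            // first call after the wait switches to HALF_OPEN
        }
        if (state == State.HALF_OPEN) {
            if (halfOpenPermitsUsed >= halfOpenCalls) {
                return false;                   // all trial calls already handed out
            }
            halfOpenPermitsUsed++;
        }
        return true;
    }

    /** Records the outcome of a permitted call and re-evaluates the state. */
    void onResult(boolean failed, long nowMillis) {
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst();             // keep only the last windowSize outcomes
        }
        int needed = (state == State.HALF_OPEN) ? halfOpenCalls : minimumCalls;
        if (outcomes.size() < needed) {
            return;                             // not enough calls to evaluate yet
        }
        long failures = outcomes.stream().filter(f -> f).count();
        double failureRate = failures * 100.0 / outcomes.size();
        if (failureRate >= thresholdPercent) {
            moveTo(State.OPEN);
            openedAt = nowMillis;
        } else if (state == State.HALF_OPEN) {
            moveTo(State.CLOSED);               // trial calls were healthy enough
        }
    }

    private void moveTo(State next) {
        state = next;
        outcomes.clear();                       // each state starts with a fresh window
        halfOpenPermitsUsed = 0;
    }
}
```

Feeding it the sequence from the run above walks through CLOSED, OPEN, HALF_OPEN and back to CLOSED. The sequence is two failures, one rejected call, then two successes after 5 s.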
CircuitBreaker: circuit breaking (configuration-based)
API
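```java
// CircuitBreaker + Retry (Retry is commented out for now)
@GetMapping("/unreliable/{id}")
@io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker(name = "unstableService_1", fallbackMethod = "fallback")
// @Retry(name = "unstableService_1")   // io.github.resilience4j.retry.annotation.Retry
public String unreliableApi(@PathVariable("id") Integer id) {
    if (id % 2 == 0) {
        throw new RuntimeException("External API failed!");
    }
    return "External API success";
}

// The fallback must have the same parameters and return type as the original method,
// plus one extra exception parameter at the end.
// (In the same class as the programmatic example, that fallback already matches: define it only once.)
public String fallback(Integer id, Exception e) {
    return "Fallback(id:'" + id + "'): " + e.getMessage();
}
```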
application.yaml(circuitbreaker)
```yaml
resilience4j:
  circuitbreaker:
    configs:                        # 1. Shared configuration templates
      default1:                     # a template named 'default1'
        register-health-indicator: true                 # register with the Spring Boot Actuator health endpoint
        wait-duration-in-open-state: 10s                # time spent OPEN before moving to HALF_OPEN
        sliding-window-type: COUNT_BASED                # COUNT_BASED (last N calls) or TIME_BASED (last N seconds)
        sliding-window-size: 10                         # window size: the last 10 calls
        minimum-number-of-calls: 5                      # below this many recorded calls the failure rate is not evaluated
        failure-rate-threshold: 50                      # open when the failure rate is >= 50%
        permitted-number-of-calls-in-half-open-state: 3 # trial calls in HALF_OPEN; CLOSED if their failure rate is
                                                        # below the threshold, otherwise OPEN again
      # define further templates here...
    instances:                      # 2. Concrete instances
      unstableService_1:            # used by @CircuitBreaker(name = "unstableService_1")
        base-config: default1       # inherit the 'default1' template
        record-exceptions:          # exceptions counted as failures
          - java.lang.RuntimeException
          - java.io.IOException
        # ignore-exceptions:            # exceptions counted neither as failure nor as success
        # ignore-exception-predicate:   # custom predicate class implementing Predicate<Throwable>
```
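The interplay of sliding-window-size, minimum-number-of-calls and failure-rate-threshold can be sketched as a ring buffer of the last N call outcomes. This is an illustrative model with the default1 numbers (10 calls, minimum 5, 50%). CountWindowSketch is a hypothetical class; Resilience4j's real window also aggregates slow calls and durations. Its -1 result below the minimum corresponds to the -1.0% failure rate the actuator endpoint reports for an instance that has not seen enough calls yet.

```java
// Hypothetical COUNT_BASED sliding window (illustration only, not Resilience4j's implementation).
class CountWindowSketch {
    private final boolean[] ring;   // last N outcomes, true = failure
    private final int minimumCalls; // minimum-number-of-calls
    private int next = 0;           // slot that will be overwritten next (the oldest once full)
    private int size = 0;           // recorded calls, capped at ring.length
    private int failures = 0;       // failures currently inside the window

    CountWindowSketch(int windowSize, int minimumCalls) {
        this.ring = new boolean[windowSize];
        this.minimumCalls = minimumCalls;
    }

    void record(boolean failed) {
        if (size == ring.length && ring[next]) {
            failures--;                        // the evicted oldest outcome was a failure
        }
        ring[next] = failed;
        if (failed) {
            failures++;
        }
        next = (next + 1) % ring.length;
        if (size < ring.length) {
            size++;
        }
    }

    /** Failure rate in percent, or -1 while fewer than minimumCalls calls are recorded. */
    double failureRate() {
        return size < minimumCalls ? -1.0 : failures * 100.0 / size;
    }

    /** True when the breaker would trip (failure rate >= threshold). */
    boolean shouldOpen(double thresholdPercent) {
        double rate = failureRate();
        return rate >= 0 && rate >= thresholdPercent;
    }
}
```

With these numbers, four failures alone do not trip the breaker because the minimum of 5 calls is not reached. A fifth call, even a successful one, yields 80% and trips it. Old failures stop counting once 10 newer calls have pushed them out of the window.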
CircuitBreaker + Retry
API
Uncomment the @Retry(name = "unstableService_1") annotation in the previous example (import io.github.resilience4j.retry.annotation.Retry).
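Combining the two annotations has one consequence. With Resilience4j's default aspect order, Retry is the outermost decorator: Retry(CircuitBreaker(RateLimiter(TimeLimiter(Bulkhead(function))))). Every retry attempt therefore passes through the circuit breaker and is recorded as a separate call. The plain-Java sketch below illustrates this with two hypothetical stand-in decorators. It assumes the thrown exception is retryable (see retry-exceptions in the retry configuration).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-ins for the two aspects; they only model "who wraps whom", not the real behavior.
class OrderingSketch {

    /** Stand-in for CircuitBreaker: records every failed call it sees, then rethrows. */
    static <T> Supplier<T> recordingBreaker(Supplier<T> call, AtomicInteger recordedFailures) {
        return () -> {
            try {
                return call.get();
            } catch (RuntimeException e) {
                recordedFailures.incrementAndGet();
                throw e;
            }
        };
    }

    /** Stand-in for Retry: up to maxAttempts tries (initial call included), rethrows the last failure. */
    static <T> Supplier<T> retry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }
}
```

With max-attempts: 3 and a retryable exception, a single request to /api/unreliable/2 can add up to 3 failures to the unstableService_1 window. Keep that in mind when choosing minimum-number-of-calls.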
application.yaml(retry)
Like the circuitbreaker configuration above, retry settings are split into configs (shared templates) and instances; from here on only instances are shown.
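```yaml
resilience4j:
  retry:
    instances:
      unstableService_1:
        max-attempts: 3                      # maximum attempts, including the initial call (default 3)
        wait-duration: 2s                    # wait between attempts (initial interval when backoff is enabled)
        enable-exponential-backoff: true     # grow the wait exponentially
        exponential-backoff-multiplier: 2    # multiply the wait by 2 each time: 2s, then 4s
        retry-exceptions:                    # whitelist: only these exceptions (and subclasses) trigger a retry
          - java.io.IOException
          - java.util.concurrent.TimeoutException
          # note: the demo endpoint throws RuntimeException, which is not listed, so it is NOT retried;
          # add java.lang.RuntimeException here to see the retries happen
        # ignore-exceptions:                 # blacklist: exceptions that are never retried
        #   - com.example.exception.BusinessException
        fail-after-max-attempts: true        # throw MaxRetriesExceededException when attempts run out and the
                                             # result still matches retry-on-result-predicate (default false)
```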
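A short plain-Java sketch can work out two things from these settings. The first is the wait times produced by wait-duration, enable-exponential-backoff and exponential-backoff-multiplier. The second is the effect of the retry-exceptions whitelist. RetrySketch is a hypothetical helper that models the semantics, assuming no randomized wait and no maximum-wait cap; it is not the library's IntervalFunction.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Hypothetical helper modelling the unstableService_1 retry settings (no randomization, no max-wait cap).
class RetrySketch {

    // The retry-exceptions whitelist from the YAML above
    static final List<Class<? extends Throwable>> RETRY_EXCEPTIONS =
            List.of(IOException.class, TimeoutException.class);

    /** Waits before attempts 2..maxAttempts: initial, initial*multiplier, initial*multiplier^2, ... */
    static List<Duration> backoffSchedule(int maxAttempts, Duration initial, double multiplier) {
        List<Duration> waits = new ArrayList<>();
        double millis = initial.toMillis();
        for (int attempt = 2; attempt <= maxAttempts; attempt++) {
            waits.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;
        }
        return waits;
    }

    /** A failure is retried only if it is an instance of a whitelisted type (subclasses included). */
    static boolean isRetryable(Throwable failure) {
        return RETRY_EXCEPTIONS.stream().anyMatch(type -> type.isInstance(failure));
    }
}
```

With max-attempts: 3, the caller waits 2 s before the second attempt and 4 s before the third. A SocketTimeoutException (an IOException subclass) is retried, while the demo's RuntimeException is not.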
RateLimiter (rate limiting)
API
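```java
@GetMapping("/limited")
@RateLimiter(name = "limitedService2", fallbackMethod = "rateLimitFallback") // io.github.resilience4j.ratelimiter.annotation.RateLimiter
public String limitedApi() {
    return "Normal API response";
}

// The original method has no parameters, so the fallback takes only the exception
// (RequestNotPermitted when no permit is available in time)
public String rateLimitFallback(Exception e) {
    return "Rate limit exceeded";
}
```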
application.yaml(rateLimiter)
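```yaml
resilience4j:
  ratelimiter:
    instances:
      limitedService2:                # rate limiter instance name
        limit-for-period: 5           # max calls per refresh period (here: 5 calls every 10 s)
        limit-refresh-period: 10s     # period length; permits are reset at the start of each period (default 500ns)
        timeout-duration: 2s          # max time a caller waits for a permit before RequestNotPermitted is thrown
        # subscribe-for-events: true           # publish events (for logging/monitoring)
        # writable-stack-trace-enabled: true   # include stack traces in RequestNotPermitted (costs performance)
        # event-consumer-buffer-size: 100      # size of the event consumer buffer
```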
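Resilience4j's default rate limiter divides time into cycles of limit-refresh-period and resets the available permits to limit-for-period at the start of each cycle. A caller that finds no permit may wait for a future cycle, but only if that wait fits within timeout-duration. The plain-Java sketch below models this for limitedService2 (5 permits per 10 s, 2 s timeout). PermitCycleSketch is hypothetical: it returns the wait in milliseconds instead of blocking, and -1 where the library would throw RequestNotPermitted.

```java
// Hypothetical model of cycle-based permits (not Resilience4j's AtomicRateLimiter itself).
// Time starts at 0 ms and is passed in by the caller.
class PermitCycleSketch {
    private final int limitForPeriod;   // limit-for-period
    private final long refreshMillis;   // limit-refresh-period
    private final long timeoutMillis;   // timeout-duration
    private long currentCycle = 0;
    private int permits;                // may go negative: permits reserved from future cycles

    PermitCycleSketch(int limitForPeriod, long refreshMillis, long timeoutMillis) {
        this.limitForPeriod = limitForPeriod;
        this.refreshMillis = refreshMillis;
        this.timeoutMillis = timeoutMillis;
        this.permits = limitForPeriod;
    }

    /** Returns how long the caller must wait (0 = immediately), or -1 if it would get RequestNotPermitted. */
    long acquire(long nowMillis) {
        long cycle = nowMillis / refreshMillis;
        if (cycle > currentCycle) {
            // refill at the start of each new cycle; unused permits never exceed limitForPeriod
            permits = (int) Math.min(permits + (cycle - currentCycle) * limitForPeriod, limitForPeriod);
            currentCycle = cycle;
        }
        if (permits > 0) {
            permits--;
            return 0;
        }
        // no permit left: find the first future cycle that still has an unreserved permit
        long cyclesAhead = (-permits) / limitForPeriod + 1;
        long waitMillis = (currentCycle + cyclesAhead) * refreshMillis - nowMillis;
        if (waitMillis > timeoutMillis) {
            return -1;                  // would wait longer than timeout-duration
        }
        permits--;                      // reserve a permit from that future cycle
        return waitMillis;
    }
}
```

In the article's setup, the first 5 calls in a 10 s cycle pass. A 6th call early in the cycle is rejected immediately and the fallback returns "Rate limit exceeded". A call arriving less than 2 s before the next cycle waits for it instead of failing.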
TimeLimiter (timeouts)
API
```java
// Timeout control
@GetMapping("/slow")
@TimeLimiter(name = "slowService3", fallbackMethod = "timeoutFallback") // io.github.resilience4j.timelimiter.annotation.TimeLimiter
public CompletableFuture<String> slowApiCall() {
    return CompletableFuture.supplyAsync(() -> {
        try {
            // sleep 1 to 3 seconds, so with a 2 s limit roughly half of the calls time out
            TimeUnit.MILLISECONDS.sleep(1000 + ThreadLocalRandom.current().nextInt(2000));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "Slow API success";
    });
}

public CompletableFuture<String> timeoutFallback(Throwable ex) {
    return CompletableFuture.completedFuture("Fallback: Operation timed out");
}
```
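Note: @TimeLimiter only works on methods that return CompletionStage/CompletableFuture, or reactive types such as Mono/Flux when the resilience4j-reactor module is on the classpath. For any other return type, the aspect throws an exception reporting an unsupported return type when the method is called.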
application.yaml(timeLimiter)
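```yaml
resilience4j:
  timelimiter:
    instances:
      slowService3:                  # instance name
        timeout-duration: 2s         # maximum execution time; a TimeoutException is raised when exceeded
        cancel-running-future: true  # call cancel(true) on the future after a timeout (default true);
                                     # for a CompletableFuture this does not interrupt the worker thread
```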
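The JDK alone (Java 9+) can approximate the same timeout-plus-fallback behavior. It also shows why cancel-running-future cannot stop a CompletableFuture task: the CompletableFuture documentation states that cancel(true) does not interrupt the thread doing the work. This is a hedged analogue for experimentation, not what the TimeLimiter aspect does internally. TimeoutSketch and its helpers are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical JDK-only analogue of timeout-duration + fallbackMethod (Java 9+ for orTimeout).
class TimeoutSketch {

    static CompletableFuture<String> withTimeout(CompletableFuture<String> call, long timeoutMillis) {
        return call
                // completes the future with a TimeoutException if it is not done in time;
                // the task itself keeps running, because CompletableFuture never interrupts its worker
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> {
                    // failures from supplyAsync arrive wrapped in CompletionException
                    Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                            ? ex.getCause() : ex;
                    return cause instanceof TimeoutException
                            ? "Fallback: Operation timed out"
                            : "Fallback: " + cause.getMessage();
                });
    }

    // Sleep helper for demo tasks; restores the interrupt flag instead of swallowing it
    static void sleepQuietly(long millis) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Unlike the article's timeoutFallback, this sketch distinguishes a real timeout from other failures, which makes the fallback message less misleading.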
Integrating with Actuator
Maven dependency
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
application.yaml(actuator)
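```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,circuitbreakers,ratelimiters
  health:
    circuitbreakers:
      enabled: true   # needed for register-health-indicator: true to show up under /actuator/health
```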
Checking the state (GET /actuator/circuitbreakers)
```json
{
  "circuitBreakers": {
    "programmaticCB": {
      "failureRate": "0.0%",
      "slowCallRate": "0.0%",
      "failureRateThreshold": "50.0%",
      "slowCallRateThreshold": "100.0%",
      "bufferedCalls": 4,
      "failedCalls": 0,
      "slowCalls": 0,
      "slowFailedCalls": 0,
      "notPermittedCalls": 0,
      "state": "CLOSED"
    },
    "unstableService_1": {
      "failureRate": "-1.0%",
      "slowCallRate": "-1.0%",
      "failureRateThreshold": "50.0%",
      "slowCallRateThreshold": "100.0%",
      "bufferedCalls": 0,
      "failedCalls": 0,
      "slowCalls": 0,
      "slowFailedCalls": 0,
      "notPermittedCalls": 0,
      "state": "CLOSED"
    }
  }
}
```
A rate of -1.0% means fewer calls than minimum-number-of-calls have been recorded, so the rate has not been computed yet. Here unstableService_1 has 0 buffered calls and needs 5.
Integrating with RestClient
See also: Spring 6.x HTTP interface usage notes.
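Resilience4j is implemented with Spring AOP, so its annotations work on any Spring bean method, not only on controllers, and it combines naturally with RestClient. Keep in mind that Spring AOP proxies only intercept calls coming from outside the bean. An annotated method called on `this` from within the same bean bypasses Resilience4j.
- The cleanest approach is to wrap the RestClient (or an HTTP interface backed by it) in a service.
- Put the Resilience4j annotations on the service methods.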
Example 1 (sketch: Post and the PostClient HTTP interface are assumed application types)
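```java
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.service.annotation.GetExchange;

// Assumed HTTP interface, backed by a RestClient via HttpServiceProxyFactory
interface PostClient {
    @GetExchange("/posts/{id}")
    Post getPostById(@PathVariable("id") int id);
}

@Service
public class PostService {

    private final PostClient client;

    public PostService(PostClient client) { // constructor injection: a final field cannot be field-injected
        this.client = client;
    }

    @Retry(name = "postClient", fallbackMethod = "fallback")
    @RateLimiter(name = "postClient")
    public Post getPostById(int id) {
        return client.getPostById(id);
    }

    // Same parameters as the original method, plus a trailing Throwable
    private Post fallback(int id, Throwable ex) {
        Post fallbackPost = new Post();
        fallbackPost.setId(id);
        fallbackPost.setTitle("Fallback title due to: " + ex.getMessage());
        fallbackPost.setBody("Default content");
        return fallbackPost;
    }
}
```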