An Analysis of Spring Boot Actuator's Monitoring Mechanism
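Table of Contents
- An Analysis of Spring Boot Actuator's Monitoring Mechanism
- 🎯 1. Why Actuator: From Monitoring to Observability
- 💡 Limitations of Traditional Monitoring
- 🔄 From Monitoring to Observability
- 🏗️ 2. Core Structure and Auto-Configuration
- 🏛️ Actuator Architecture Layers
- 🔧 Auto-Configuration Source Walkthrough
- 📊 Endpoint Discovery Mechanism
- 💓 3. Health Checks (HealthEndpoint)
- 🏥 Health Check Architecture
- 🔧 Built-in Health Indicators
- 💻 Custom Health Checks in Practice
- 📋 Tuning Health Check Configuration
- 📊 4. Metrics: Core Metrics and the Micrometer Framework
- 🌡️ Micrometer Architecture Overview
- 🔧 Collecting Core Metrics in Practice
- 📈 JVM and System Metrics
- 🔄 Integrating Micrometer with Prometheus
- ℹ️ 5. InfoEndpoint: Application Metadata
- 📝 How Info Is Collected
- 🔧 Info Configuration in Practice
- 🔧 6. Custom Endpoints
- 🛠️ Endpoint Operation Types and Use Cases
- 💻 Custom Endpoints in Practice
- 🔒 7. Security and Exposure Configuration
- 🛡️ Endpoint Security Strategy
- 📊 Endpoint Access Control Matrix
- 🔍 Endpoint Monitoring and Auditing
- 💎 8. Summary
- 🎯 Core Value of the Actuator Monitoring Stack
- 📈 Monitoring Evolution Roadmap
- 🔧 Enterprise Best Practices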
🎯 1. Why Actuator: From Monitoring to Observability
💡 Limitations of Traditional Monitoring
The problem with traditional monitoring:
// ❌ Traditional approach: expose monitoring information by hand
@RestController
public class ManualMonitoringController {

    @GetMapping("/monitor/health")
    public Map<String, Object> healthCheck() {
        // Every application has to re-implement this
        Map<String, Object> status = new HashMap<>();
        status.put("status", "UP");
        status.put("timestamp", System.currentTimeMillis());
        // Each component still has to be checked manually...
        return status;
    }
}
Actuator's unified solution:
# ✅ The Actuator way: standardized monitoring endpoints
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,info
  endpoint:
    health:
      show-details: always
      show-components: always
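🔄 From Monitoring to Observability
The three pillars of observability: metrics, logs, and traces.
Where Actuator fits in observability:
- Metrics collection: exposes application metrics through the Metrics endpoint
- Health status: reports system health through the Health endpoint
- Metadata management: shows the application's version and configuration through the Info endpoint
- Custom monitoring: extends monitoring capabilities through custom endpoints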
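🏗️ 2. Core Structure and Auto-Configuration
🏛️ Actuator Architecture Layers
[Figure: core component relationship diagram]
🔧 Auto-Configuration Source Walkthrough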
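How the endpoint beans are registered (a simplified sketch, not the verbatim Spring Boot source; in Spring Boot each endpoint is registered by its own auto-configuration class, such as HealthEndpointAutoConfiguration):
// Simplified for illustration; constructor arguments vary across Spring Boot versions
@Configuration(proxyBeanMethods = false)
@AutoConfigureAfter({ HealthEndpointAutoConfiguration.class, MetricsEndpointAutoConfiguration.class })
public class EndpointAutoConfiguration {

    @Bean
    @ConditionalOnMissingBean
    public HealthEndpoint healthEndpoint(HealthContributorRegistry registry, HealthEndpointGroups groups) {
        return new HealthEndpoint(registry, groups);
    }

    @Bean
    @ConditionalOnMissingBean
    public MetricsEndpoint metricsEndpoint(MeterRegistry registry) {
        return new MetricsEndpoint(registry);
    }

    @Bean
    @ConditionalOnMissingBean
    public InfoEndpoint infoEndpoint(ObjectProvider<InfoContributor> infoContributors) {
        // InfoEndpoint takes the list of InfoContributor beans
        return new InfoEndpoint(infoContributors.orderedStream().collect(Collectors.toList()));
    }
}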
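Web endpoint auto-configuration (simplified for illustration, not the verbatim Spring Boot source):
@Configuration(proxyBeanMethods = false)
@ConditionalOnWebApplication
@ConditionalOnClass(WebEndpoint.class)
@AutoConfigureAfter(EndpointAutoConfiguration.class)
public class WebEndpointAutoConfiguration {

    @Bean
    @ConditionalOnMissingBean
    public WebEndpointDiscoverer webEndpointDiscoverer(ApplicationContext applicationContext,
            ParameterValueMapper parameterValueMapper, EndpointMediaTypes endpointMediaTypes) {
        // Simplified: path mappers, invoker advisors, and filters are passed as empty collections;
        // the exact constructor arguments vary across Spring Boot versions
        return new WebEndpointDiscoverer(applicationContext, parameterValueMapper, endpointMediaTypes,
                Collections.emptyList(), Collections.emptyList(), Collections.emptyList());
    }

    @Bean
    @ConditionalOnMissingBean
    public ControllerEndpointHandlerMapping controllerEndpointHandlerMapping(WebEndpointDiscoverer discoverer) {
        // Simplified: the real handler mapping is built from an endpoint mapping and the discovered controller endpoints
        return new ControllerEndpointHandlerMapping(discoverer);
    }
}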
📊 Endpoint Discovery Mechanism
Endpoint scanning and registration flow:
@Component
@Slf4j
public class EndpointDiscoveryDebugger {

    @Autowired
    private ApplicationContext applicationContext;

    @EventListener
    public void onApplicationReady(ApplicationReadyEvent event) {
        log.info("=== Actuator endpoint discovery report ===");

        // 1. Find all beans annotated with @Endpoint (Endpoint is an annotation, not a bean type)
        Map<String, Object> endpointBeans = applicationContext.getBeansWithAnnotation(Endpoint.class);
        log.info("Endpoints found: {}", endpointBeans.size());
        for (String beanName : endpointBeans.keySet()) {
            Endpoint endpoint = applicationContext.findAnnotationOnBean(beanName, Endpoint.class);
            log.info("Endpoint: {} (ID: {})", beanName, endpoint != null ? endpoint.id() : "unknown");
        }

        // 2. Check which endpoints are exposed over the web
        applicationContext.getBeanProvider(WebEndpointDiscoverer.class).ifAvailable(discoverer -> {
            Collection<ExposableWebEndpoint> webEndpoints = discoverer.getEndpoints();
            log.info("Web endpoints: {}", webEndpoints.size());
        });
    }
}
💓 3. Health Checks (HealthEndpoint)
🏥 Health Check Architecture
[Figure: HealthIndicator hierarchy]
🔧 Built-in Health Indicators
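DataSourceHealthIndicator, simplified for illustration (the real class also supports a configurable validation query):
public class DataSourceHealthIndicator extends AbstractHealthIndicator {

    private final DataSource dataSource;

    public DataSourceHealthIndicator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        // 1. Without a data source there is nothing to check
        if (this.dataSource == null) {
            builder.down().withDetail("database", "Unknown");
            return;
        }
        // 2. Open a connection and validate it
        try (Connection connection = this.dataSource.getConnection()) {
            // 3. Read database metadata for the health details
            DatabaseMetaData metaData = connection.getMetaData();
            boolean valid = connection.isValid(0);
            builder.status(valid ? Status.UP : Status.DOWN)
                    .withDetail("database", metaData.getDatabaseProductName())
                    .withDetail("version", metaData.getDatabaseProductVersion())
                    .withDetail("validationQuery", "isValid()");
        } catch (Exception ex) {
            builder.down(ex);
        }
    }
}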
💻 Custom Health Checks in Practice
An enterprise-style health check example:
@Component
@Slf4j
public class ComprehensiveHealthIndicator implements HealthIndicator {

    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    @Value("${app.external.service.url}")
    private String externalServiceUrl;

    @Override
    public Health health() {
        // Start from UP; any failing check below downgrades the status to DOWN.
        // (Calling up() after the checks would silently overwrite a DOWN result.)
        Health.Builder builder = Health.up();
        try {
            // 1. Internal component status
            checkInternalComponents(builder);
            // 2. External dependency status
            checkExternalDependencies(builder);
            // 3. Business-level health
            checkBusinessHealth(builder);
            return builder.build();
        } catch (Exception e) {
            log.error("Health check failed", e);
            return builder.down(e).build();
        }
    }

    private void checkInternalComponents(Health.Builder builder) {
        // Memory usage
        Runtime runtime = Runtime.getRuntime();
        long maxMemory = runtime.maxMemory();
        long usedMemory = runtime.totalMemory() - runtime.freeMemory();
        double memoryUsage = (double) usedMemory / maxMemory * 100;
        builder.withDetail("memory.used", formatMemory(usedMemory))
                .withDetail("memory.max", formatMemory(maxMemory))
                .withDetail("memory.usage", String.format("%.1f%%", memoryUsage));
        // Warn when memory usage is high
        if (memoryUsage > 90) {
            builder.withDetail("memory.warning", "Memory usage is too high");
        }
    }

    private void checkExternalDependencies(Health.Builder builder) {
        // Redis connectivity
        try {
            redisTemplate.opsForValue().get("health-check");
            builder.withDetail("redis", "CONNECTED");
        } catch (Exception e) {
            builder.withDetail("redis", "DISCONNECTED: " + e.getMessage());
            builder.down();
        }
        // External service availability
        try {
            ResponseEntity<String> response =
                    restTemplate.getForEntity(externalServiceUrl + "/health", String.class);
            builder.withDetail("external.service", response.getStatusCode().toString());
        } catch (Exception e) {
            builder.withDetail("external.service", "UNAVAILABLE: " + e.getMessage());
            builder.down();
        }
    }

    private void checkBusinessHealth(Health.Builder builder) {
        // Business metrics; getPendingOrderCount() and getFailedPaymentCount() are application helpers not shown here
        long pendingOrders = getPendingOrderCount();
        long failedPayments = getFailedPaymentCount();
        builder.withDetail("orders.pending", pendingOrders)
                .withDetail("payments.failed", failedPayments);
        // Business health rules; separate keys so one warning does not overwrite the other
        if (pendingOrders > 1000) {
            builder.withDetail("orders.warning", "Too many pending orders");
        }
        if (failedPayments > 100) {
            builder.withDetail("payments.warning", "Too many failed payments");
        }
    }

    private String formatMemory(long bytes) {
        return String.format("%.1f MB", bytes / (1024.0 * 1024.0));
    }
}
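📋 Tuning Health Check Configuration
Layered health check configuration:
management:
  endpoint:
    health:
      # Show details (use with care in production)
      show-details: when_authorized
      show-components: always
      # Health groups
      group:
        # Core components (fast response); the DataSource indicator is named "db"
        core:
          include: diskSpace, db, redis
        # Full check (adds external dependencies). A group lists contributors, not other groups.
        # externalService and business assume custom indicators registered under those names.
        full:
          include: diskSpace, db, redis, externalService, business
      # Custom status ordering and HTTP mapping
      status:
        order: DOWN, OUT_OF_SERVICE, UP, UNKNOWN
        http-mapping:
          DOWN: 503
          OUT_OF_SERVICE: 503
          UP: 200
Using health groups:
# Quick health check (core components only)
curl http://localhost:8080/actuator/health/core
# Full health check (all dependencies)
curl http://localhost:8080/actuator/health/full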
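The status.order and http-mapping settings decide how several component statuses collapse into a single response. The following is a minimal plain-Java sketch of that idea, assuming the order DOWN, OUT_OF_SERVICE, UP, UNKNOWN from the configuration. The class and method names are hypothetical, and this is a model of the behavior rather than Spring Boot's own aggregator.

```java
import java.util.List;
import java.util.Map;

/** Illustrative model of health status aggregation; not Spring Boot's implementation. */
class HealthStatusAggregation {

    // Severity order taken from the configuration: earlier entries win.
    static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    // HTTP mapping taken from the configuration.
    static final Map<String, Integer> HTTP_MAPPING =
            Map.of("DOWN", 503, "OUT_OF_SERVICE", 503, "UP", 200);

    /** Returns the most severe status among the components, or UNKNOWN if there are none. */
    static String aggregate(List<String> componentStatuses) {
        String worst = "UNKNOWN";
        int worstRank = ORDER.size();
        for (String status : componentStatuses) {
            int rank = ORDER.indexOf(status);
            if (rank >= 0 && rank < worstRank) {
                worst = status;
                worstRank = rank;
            }
        }
        return worst;
    }

    /** Statuses without an explicit mapping fall back to 200 in this sketch. */
    static int httpCode(String status) {
        return HTTP_MAPPING.getOrDefault(status, 200);
    }

    public static void main(String[] args) {
        String overall = aggregate(List.of("UP", "DOWN", "UP"));
        System.out.println(overall + " -> " + httpCode(overall));
    }
}
```

One failing component is enough to turn the whole group DOWN, which is why the fast core group is kept separate from the full group that also calls external dependencies.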
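📊 4. Metrics: Core Metrics and the Micrometer Framework
🌡️ Micrometer Architecture Overview
[Figure: Micrometer meter type hierarchy]
🔧 Collecting Core Metrics in Practice
Custom business metrics example: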
@Service
@Slf4j
public class OrderMetricsService {

    private final MeterRegistry meterRegistry;
    // 1. Counter: number of orders created
    private final Counter orderCreatedCounter;
    // 2. Timer: order processing time
    private final Timer orderProcessingTimer;
    // 3. Distribution summary: distribution of order amounts
    private final DistributionSummary orderAmountSummary;
    // 4. Gauge: inventory level
    private final Gauge inventoryGauge;

    public OrderMetricsService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        // Register the meters
        this.orderCreatedCounter = Counter.builder("order.created")
                .description("Number of orders created")
                .tag("application", "order-service")
                .register(meterRegistry);
        this.orderProcessingTimer = Timer.builder("order.processing.time")
                .description("Order processing time")
                .tag("application", "order-service")
                .register(meterRegistry);
        this.orderAmountSummary = DistributionSummary.builder("order.amount")
                .description("Distribution of order amounts")
                .baseUnit("CNY")
                .register(meterRegistry);
        // A gauge samples its value on demand, so it is built from an object and a function that reads it
        this.inventoryGauge = Gauge.builder("inventory.count", this, OrderMetricsService::getInventoryCount)
                .description("Inventory level")
                .tag("product", "all")
                .register(meterRegistry);
    }

    @Async
    public CompletableFuture<Order> processOrder(Order order) {
        // Time the whole processing step
        return orderProcessingTimer.record(() -> {
            try {
                log.info("Processing order: {}", order.getId());
                // Count the created order
                orderCreatedCounter.increment();
                // Record the order amount
                orderAmountSummary.record(order.getAmount().doubleValue());
                // Simulate business processing
                Thread.sleep(100);
                log.info("Order processed: {}", order.getId());
                return CompletableFuture.completedFuture(order);
            } catch (Exception e) {
                // Record an error metric tagged with the exception type
                meterRegistry.counter("order.error", "reason", e.getClass().getSimpleName()).increment();
                throw new RuntimeException("Order processing failed", e);
            }
        });
    }

    // Value supplier for the gauge; getCurrentInventoryCount() is an application helper not shown here
    public double getInventoryCount() {
        return getCurrentInventoryCount();
    }
}
📈 JVM and System Metrics
Built-in metrics auto-configuration (simplified for illustration):
@Configuration
@ConditionalOnClass(MeterRegistry.class)
public class JvmMetricsAutoConfiguration {

    // Each bean is a MeterBinder that registers its meters with the MeterRegistry

    @Bean
    public JvmGcMetrics jvmGcMetrics() {
        return new JvmGcMetrics();
    }

    @Bean
    public JvmMemoryMetrics jvmMemoryMetrics() {
        return new JvmMemoryMetrics();
    }

    @Bean
    public JvmThreadMetrics jvmThreadMetrics() {
        return new JvmThreadMetrics();
    }

    @Bean
    public ProcessorMetrics processorMetrics() {
        return new ProcessorMetrics();
    }
}
Key JVM metrics:
# JVM memory usage
curl http://localhost:8080/actuator/metrics/jvm.memory.used
# GC pauses
curl http://localhost:8080/actuator/metrics/jvm.gc.pause
# Live threads
curl http://localhost:8080/actuator/metrics/jvm.threads.live
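🔄 Integrating Micrometer with Prometheus
Prometheus configuration example (requires the micrometer-registry-prometheus dependency on the classpath):
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    # Spring Boot 2.x property path; Spring Boot 3.x uses management.prometheus.metrics.export.*
    export:
      prometheus:
        enabled: true
        step: 1m
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
Metrics output in Prometheus format: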
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="order-service",environment="prod",area="heap",id="Eden Space",} 1234567.0
jvm_memory_used_bytes{application="order-service",environment="prod",area="heap",id="Survivor Space",} 234567.0
jvm_memory_used_bytes{application="order-service",environment="prod",area="heap",id="Tenured Gen",} 3456789.0
# HELP order_created_total Number of orders created
# TYPE order_created_total counter
order_created_total{application="order-service",environment="prod",} 1500.0
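The sample output shows dotted Micrometer names such as order.created and jvm.memory.used appearing as order_created_total and jvm_memory_used_bytes. The sketch below illustrates that renaming for the two cases seen here. It is a deliberately simplified model with hypothetical class and method names, assuming lowercase dotted input; Micrometer's actual Prometheus naming convention handles more meter types and edge cases.

```java
/** Simplified model of how dotted meter names map to Prometheus names; not Micrometer's code. */
class PrometheusNameSketch {

    /** Gauge-style name: dots become underscores and the base unit is appended if missing. */
    static String gaugeName(String meterName, String baseUnit) {
        String name = meterName.replace('.', '_');
        if (baseUnit != null && !baseUnit.isEmpty() && !name.endsWith("_" + baseUnit)) {
            name = name + "_" + baseUnit;
        }
        return name;
    }

    /** Counter-style name: dots become underscores and a _total suffix is appended if missing. */
    static String counterName(String meterName) {
        String name = meterName.replace('.', '_');
        return name.endsWith("_total") ? name : name + "_total";
    }

    public static void main(String[] args) {
        System.out.println(gaugeName("jvm.memory.used", "bytes"));
        System.out.println(counterName("order.created"));
    }
}
```

Knowing the mapping helps when writing PromQL queries or Grafana panels, because the name used in code is not the name that appears in the scrape output.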
ℹ️ 5. InfoEndpoint: Application Metadata
📝 How Info Is Collected
[Figure: InfoContributor architecture]
🔧 Info Configuration in Practice
Build info configuration:
<!-- Generate build information with Maven -->
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <executions>
        <execution>
            <goals>
                <goal>build-info</goal>
            </goals>
        </execution>
    </executions>
</plugin>
Git info configuration:
# application.yml
management:
  info:
    git:
      mode: full
    env:
      enabled: true
    build:
      enabled: true
Custom info contributor:
@Component
public class ComprehensiveInfoContributor implements InfoContributor {

    @Autowired
    private Environment environment;

    @Value("${app.version}")
    private String appVersion;

    // Captured once at bean creation so the value reflects startup, not the time of each request
    private final String startupTime = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);

    @Override
    public void contribute(Info.Builder builder) {
        // 1. Basic application information
        builder.withDetail("application", Map.of(
                "name", environment.getProperty("spring.application.name", "unknown"),
                "version", appVersion,
                "profiles", Arrays.toString(environment.getActiveProfiles()),
                "startupTime", startupTime));

        // 2. System information (Map.of rejects null values, so the time zone is read from TimeZone)
        builder.withDetail("system", Map.of(
                "java.version", System.getProperty("java.version"),
                "os.name", System.getProperty("os.name"),
                "os.version", System.getProperty("os.version"),
                "user.timezone", TimeZone.getDefault().getID()));

        // 3. Runtime information
        Runtime runtime = Runtime.getRuntime();
        builder.withDetail("runtime", Map.of(
                "availableProcessors", runtime.availableProcessors(),
                "maxMemory", runtime.maxMemory(),
                "totalMemory", runtime.totalMemory(),
                "freeMemory", runtime.freeMemory()));

        // 4. Business information
        builder.withDetail("business", Map.of(
                "team", "Order Platform Team",
                "owner", "Engineer Zhang",
                "slackChannel", "#order-service",
                "documentation", "https://wiki.company.com/order-service"));
    }
}
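🔧 6. Custom Endpoints
🛠️ Endpoint Operation Types and Use Cases
Operation types compared:
| Operation type | Annotation | Purpose | Typical examples |
|---|---|---|---|
| 🟢 Read | @ReadOperation | Retrieve application state or statistics | Viewing cache status, system metrics, configuration values |
| 🟡 Write | @WriteOperation | Perform updates or refreshes | Refreshing configuration, changing log levels at runtime |
| 🔴 Delete | @DeleteOperation | Delete resources or clean up data | Clearing caches, deleting temporary files, purging logs |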
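When an endpoint is exposed over HTTP, each operation type corresponds to an HTTP method: read operations to GET, write operations to POST, and delete operations to DELETE. The sketch below models that correspondence in plain Java. The class and enum names are hypothetical, and the /actuator prefix assumes the default base path.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative model of how endpoint operation types map to HTTP methods. */
class EndpointOperationMapping {

    enum OperationType { READ, WRITE, DELETE }

    static final Map<OperationType, String> HTTP_METHOD = new EnumMap<>(OperationType.class);

    static {
        HTTP_METHOD.put(OperationType.READ, "GET");      // @ReadOperation
        HTTP_METHOD.put(OperationType.WRITE, "POST");    // @WriteOperation
        HTTP_METHOD.put(OperationType.DELETE, "DELETE"); // @DeleteOperation
    }

    /** Describes the request line for an operation on the given endpoint id. */
    static String describe(OperationType type, String endpointId) {
        return HTTP_METHOD.get(type) + " /actuator/" + endpointId;
    }

    public static void main(String[] args) {
        for (OperationType type : OperationType.values()) {
            System.out.println(describe(type, "orderstats"));
        }
    }
}
```

Because write and delete operations change state, they are the ones that most need to sit behind authentication.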
💻 Custom Endpoints in Practice
Business monitoring endpoint example:
@Endpoint(id = "orderstats")
@Component
@Slf4j
public class OrderStatsEndpoint {

    @Autowired
    private OrderService orderService;

    @Autowired
    private MeterRegistry meterRegistry;

    /**
     * Read operation: return order statistics.
     */
    @ReadOperation
    public OrderStats getOrderStats() {
        log.info("Querying order statistics");
        OrderStats stats = new OrderStats();
        stats.setTotalOrders(orderService.getTotalOrderCount());
        stats.setTodayOrders(orderService.getTodayOrderCount());
        stats.setPendingOrders(orderService.getPendingOrderCount());
        stats.setAverageAmount(orderService.getAverageOrderAmount());
        // Live values from the metrics system; getCurrentOrderRate() and getCurrentErrorRate()
        // are application helpers not shown here
        stats.setOrderRate(getCurrentOrderRate());
        stats.setErrorRate(getCurrentErrorRate());
        return stats;
    }

    /**
     * Write operation: reset statistics counters.
     */
    @WriteOperation
    public String resetCounters(@Selector String counterType) {
        log.info("Resetting counters: {}", counterType);
        switch (counterType) {
            case "all":
                // Caution: clear() removes every meter in the registry, including JVM and system meters
                meterRegistry.clear();
                return "All counters reset";
            case "order":
                // Micrometer counters cannot be zeroed and close() does not reset them.
                // Removing the meters lets them be re-registered from zero; components holding
                // a reference to a removed counter must obtain a new one.
                meterRegistry.find("order.created").meters().forEach(meterRegistry::remove);
                return "Order counters reset";
            default:
                throw new IllegalArgumentException("Unsupported counter type: " + counterType);
        }
    }

    /**
     * Delete operation: clean up historical data.
     */
    @DeleteOperation
    public String cleanupOldData(@Nullable Integer days) {
        int retentionDays = days != null ? days : 30;
        log.info("Cleaning up data older than {} days", retentionDays);
        int deletedCount = orderService.cleanupOldOrders(retentionDays);
        return String.format("Cleaned up %d historical records", deletedCount);
    }

    @Data
    public static class OrderStats {
        private long totalOrders;
        private long todayOrders;
        private long pendingOrders;
        private double averageAmount;
        private double orderRate;
        private double errorRate;
        private Timestamp timestamp = new Timestamp(System.currentTimeMillis());
    }
}
Web-only endpoint example:
@ControllerEndpoint(id = "webmonitor")
@Component
@Slf4j
public class WebMonitorEndpoint {

    private final Map<String, RequestStats> requestStats = new ConcurrentHashMap<>();

    /**
     * HTTP endpoint: live request monitoring.
     * The mapping is relative to the endpoint root, so this serves /actuator/webmonitor/requests.
     */
    @GetMapping("/requests")
    @ResponseBody
    public Map<String, Object> getRequestStats() {
        Map<String, Object> stats = new HashMap<>();
        stats.put("timestamp", System.currentTimeMillis());
        stats.put("totalEndpoints", requestStats.size());
        stats.put("endpoints", requestStats);
        // Overall statistics
        long totalRequests = requestStats.values().stream()
                .mapToLong(RequestStats::getRequestCount)
                .sum();
        double avgResponseTime = requestStats.values().stream()
                .mapToDouble(RequestStats::getAverageResponseTime)
                .average()
                .orElse(0.0);
        stats.put("totalRequests", totalRequests);
        stats.put("averageResponseTime", avgResponseTime);
        return stats;
    }

    /**
     * Record statistics for one request.
     */
    public void recordRequest(String endpoint, long duration, boolean success) {
        RequestStats stats = requestStats.computeIfAbsent(endpoint, k -> new RequestStats());
        stats.recordRequest(duration, success);
    }

    @Data
    public static class RequestStats {
        private long requestCount;
        private long errorCount;
        private long totalResponseTime;
        private double averageResponseTime;

        // synchronized because several request threads may update the same entry at once
        public synchronized void recordRequest(long duration, boolean success) {
            this.requestCount++;
            if (!success) this.errorCount++;
            this.totalResponseTime += duration;
            this.averageResponseTime = (double) totalResponseTime / requestCount;
        }
    }
}
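Per-endpoint request statistics are updated from many request threads at once, so the counters have to be safe under concurrency. One option is sketched below with LongAdder from the JDK, which avoids a lock on the hot path. The class names ConcurrentRequestStats and RequestStatsRegistry are hypothetical and the sketch is independent of Spring.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Lock-free request counters; each field can be updated concurrently. */
class ConcurrentRequestStats {
    private final LongAdder requestCount = new LongAdder();
    private final LongAdder errorCount = new LongAdder();
    private final LongAdder totalResponseTimeMs = new LongAdder();

    void recordRequest(long durationMs, boolean success) {
        requestCount.increment();
        if (!success) {
            errorCount.increment();
        }
        totalResponseTimeMs.add(durationMs);
    }

    long getRequestCount() { return requestCount.sum(); }

    long getErrorCount() { return errorCount.sum(); }

    /** Computed on read instead of being stored, so there is no derived field to keep in sync. */
    double getAverageResponseTime() {
        long count = requestCount.sum();
        return count == 0 ? 0.0 : (double) totalResponseTimeMs.sum() / count;
    }
}

/** Keeps one stats object per endpoint path. */
class RequestStatsRegistry {
    private final Map<String, ConcurrentRequestStats> statsByEndpoint = new ConcurrentHashMap<>();

    void record(String endpoint, long durationMs, boolean success) {
        statsByEndpoint.computeIfAbsent(endpoint, k -> new ConcurrentRequestStats())
                .recordRequest(durationMs, success);
    }

    ConcurrentRequestStats get(String endpoint) {
        return statsByEndpoint.get(endpoint);
    }

    public static void main(String[] args) {
        RequestStatsRegistry registry = new RequestStatsRegistry();
        registry.record("/orders", 10, true);
        registry.record("/orders", 30, false);
        System.out.println(registry.get("/orders").getAverageResponseTime());
    }
}
```

The sums read from a LongAdder are not an atomic snapshot across fields, which is usually acceptable for monitoring data.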
🔒 7. Security and Exposure Configuration
🛡️ Endpoint Security Strategy
Per-environment security configuration example:
# Development profile
# (Spring Boot 2.4+ prefers spring.config.activate.on-profile over spring.profiles)
spring:
  profiles: dev
management:
  endpoints:
    web:
      exposure:
        include: '*'
      base-path: /manage
  endpoint:
    health:
      show-details: always
    loggers:
      enabled: true
---
# Production profile
spring:
  profiles: prod
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,info
      base-path: /internal/manage
  endpoint:
    health:
      show-details: never
      show-components: never
    shutdown:
      enabled: false
Spring Security integration:
@Configuration
@EnableWebSecurity
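public class ActuatorSecurityConfig {

    // WebSecurityConfigurerAdapter applies to Spring Security 5.x; it was removed in Spring Security 6.
    // The /manage prefix matches the dev profile; adjust it to the base path of each environment.
    @Configuration
    @Order(1)
    public static class ActuatorWebSecurityConfigurationAdapter extends WebSecurityConfigurerAdapter {

        @Override
        protected void configure(HttpSecurity http) throws Exception {
            http.antMatcher("/manage/**")
                .authorizeRequests()
                    // Health endpoint (including health groups) is open to everyone
                    .antMatchers("/manage/health/**").permitAll()
                    // Metrics endpoint (including individual meters) requires MONITOR or ADMIN
                    .antMatchers("/manage/metrics/**").hasAnyRole("MONITOR", "ADMIN")
                    // Everything else, including sensitive operations, requires ADMIN
                    .antMatchers("/manage/**").hasRole("ADMIN")
                .and()
                .httpBasic();
        }
    }
}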
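📊 Endpoint Access Control Matrix
Role permission matrix:
| Endpoint | Anonymous | MONITOR role | ADMIN role | Notes / use case |
|---|---|---|---|---|
| /actuator/health | ✅ | ✅ | ✅ | Health check, open to everyone (used by K8s, Nacos, and load-balancer probes) |
| /actuator/info | ❌ | ✅ | ✅ | Basic application information such as version and description; requires authentication |
| /actuator/metrics | ❌ | ✅ | ✅ | Performance metrics (CPU, threads, JVM); available to the monitoring role |
| /actuator/env | ❌ | ❌ | ✅ | Environment variables and configuration properties; sensitive, administrators only |
| /actuator/loggers | ❌ | ❌ | ✅ | Changes log levels at runtime; administrators only |
| /actuator/shutdown | ❌ | ❌ | ✅ | Shuts the application down remotely (must be explicitly enabled); administrators only |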
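The matrix can be expressed as data and checked with a single lookup, which is handy for unit-testing an access policy before wiring it into a security framework. This is a minimal sketch with hypothetical names (EndpointAccessMatrix, Role, isAllowed), and it denies any endpoint that is not listed.

```java
import java.util.Map;
import java.util.Set;

/** The role permission matrix encoded as data, with deny-by-default lookups. */
class EndpointAccessMatrix {

    enum Role { ANONYMOUS, MONITOR, ADMIN }

    static final Map<String, Set<Role>> ALLOWED = Map.of(
            "/actuator/health", Set.of(Role.ANONYMOUS, Role.MONITOR, Role.ADMIN),
            "/actuator/info", Set.of(Role.MONITOR, Role.ADMIN),
            "/actuator/metrics", Set.of(Role.MONITOR, Role.ADMIN),
            "/actuator/env", Set.of(Role.ADMIN),
            "/actuator/loggers", Set.of(Role.ADMIN),
            "/actuator/shutdown", Set.of(Role.ADMIN));

    /** Returns true only if the path is listed and the role appears in its allowed set. */
    static boolean isAllowed(String path, Role role) {
        return ALLOWED.getOrDefault(path, Set.of()).contains(role);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("/actuator/health", Role.ANONYMOUS));
        System.out.println(isAllowed("/actuator/env", Role.MONITOR));
    }
}
```

Denying unlisted paths by default means a newly exposed endpoint stays closed until someone adds it to the matrix on purpose.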
🔍 Endpoint Monitoring and Auditing
Endpoint access audit log:
@Component
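public class EndpointAccessAuditor {

    private static final Logger auditLogger = LoggerFactory.getLogger("ENDPOINT_AUDIT");

    // EndpointExposedEvent and SensitiveOperationEvent are not Spring Boot classes; they are assumed
    // to be application-defined events published by your own filter or interceptor.
    // getCurrentUser() is likewise an application helper not shown here.

    @EventListener
    public void auditEndpointAccess(EndpointExposedEvent event) {
        HttpServletRequest request = event.getRequest();
        auditLogger.info("Endpoint access audit - user: {}, endpoint: {}, IP: {}, time: {}",
                getCurrentUser(),
                request.getRequestURI(),
                request.getRemoteAddr(),
                Instant.now());
    }

    @EventListener
    public void auditSensitiveOperation(SensitiveOperationEvent event) {
        auditLogger.warn("Sensitive operation audit - user: {}, operation: {}, parameters: {}, result: {}",
                event.getUsername(),
                event.getOperation(),
                event.getParameters(),
                event.getResult());
    }
}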
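💎 8. Summary
🎯 Core Value of the Actuator Monitoring Stack
Production monitoring checklist:
- ✅ Health checks: health status monitored at multiple levels
- ✅ Metrics collection: business and system metrics both covered
- ✅ Info management: application metadata presented in a standard form
- ✅ Security: role-based access control for endpoints
- ✅ Extensibility: custom endpoints for business-specific needs
- ✅ Integration: works with tools such as Prometheus and Grafana
📈 Monitoring Evolution Roadmap
[Figure: monitoring maturity model]
🔧 Enterprise Best Practices
Monitoring configuration template: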
# Enterprise monitoring configuration
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,info,prometheus
      base-path: /internal/actuator
    jmx:
      exposure:
        include: '*'
  metrics:
    export:
      prometheus:
        enabled: true
      datadog:
        enabled: false
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
      cluster: ${app.cluster.name:default}
  endpoint:
    health:
      show-details: when_authorized
      show-components: when_authorized
      group:
        liveness:
          include: ping,diskSpace
        # The DataSource indicator is named "db"; externalService assumes a custom indicator with that name
        readiness:
          include: db,redis,externalService
