
SigNoz with an External ClickHouse Cluster: A High-Availability Deployment Guide

Table of Contents

    • Overview
    • Architecture
    • Deployment
      • ClickHouse
        • Cluster deployment
        • Database and table initialization
      • SigNoz deployment
        • Collector pipeline
          • Step 1: OTLP → Kafka
          • Step 2: Kafka → ClickHouse
          • Sampling logic
    • Application onboarding

Overview

  • Goal: improve the reliability and availability of SigNoz trace/log/metric storage.
  • Background: the ClickHouse instance bundled with SigNoz offers limited high-availability support, so production environments need an external HA cluster.

Architecture

  • SigNoz Collector / Query / UI are deployed separately from the external ClickHouse cluster.
  • Kafka sits between the Collector and the external ClickHouse as an intermediate queue, decoupling ingestion from storage and absorbing high-concurrency write pressure.

Deployment

ClickHouse

Cluster deployment

Documentation link

Database and table initialization
  • schema-migrator-sync checks whether the tables SigNoz requires already exist in ClickHouse and creates them automatically if they do not.
  • This step guarantees that the table schema the Collector, Query service, and dashboards depend on is complete and usable.

Subsequent upgrades or schema changes are also applied (e.g. ALTER statements) through the sync or async migrator.

version: "3.9"services:schema-migrator-sync:image: signoz/signoz-schema-migrator:v0.111.41container_name: schema-migrator-synccommand:- sync- --dsn=tcp://default:TrRkZeK1KJsOzfdxVFRLk7oy7hJx@192.168.100.10:9920- --up= - --cluster-name=clickhouse_cluster    ##填写正确的集群名称restart: on-failureschema-migrator-async:image: signoz/signoz-schema-migrator:v0.111.41container_name: schema-migrator-asynccommand:- async- --dsn=tcp://default:TrRkZeK1KJsOzfdxVFRLk7oy7hJx@192.168.100.10:9920- --up=- --cluster-name=clickhouse_clusterdepends_on:schema-migrator-sync:condition: service_completed_successfullyrestart: on-failure

⚡️: For schema-migrator-sync it is enough to point the DSN at any single ClickHouse node; the migrator discovers the whole cluster automatically and creates the tables on every node.
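Once both migrators exit successfully, a quick way to confirm the schema landed is to count the SigNoz tables from any node (a minimal sketch; host, port, and password follow the example DSN above and should be adjusted to your cluster):

# Run from any host that can reach the cluster; credentials match the DSN above
clickhouse-client --host 192.168.100.10 --port 9920 \
  --password 'TrRkZeK1KJsOzfdxVFRLk7oy7hJx' \
  --query "SELECT database, count() AS tables
           FROM system.tables
           WHERE database IN ('signoz_traces', 'signoz_logs', 'signoz_metrics', 'signoz_metadata')
           GROUP BY database"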

SigNoz deployment

# Install the application from the Helm chart
~ helm repo add signoz https://charts.signoz.io
~ helm pull signoz/signoz
~ tar -zxvf signoz-${version}.tgz
~ vim values.yaml
# Disable the bundled ClickHouse and ZooKeeper
clickhouse:
  enabled: false
zookeeper:
  enabled: false
# Connection settings for the external ClickHouse
externalClickhouse:
  host: 192.168.200.10   # preferably a load-balanced (LVS/VIP) address
  cluster: clickhouse_cluster
  database: signoz_metrics
  traceDatabase: signoz_traces
  logDatabase: signoz_logs
  user: "default"
  password: ""
  existingSecret:
  existingSecretPasswordKey:
  secure: false
  verify: false
  httpPort: 8323
  tcpPort: 9922
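With values.yaml edited, install the chart (a minimal sketch; the release name, namespace, and chart directory below are illustrative assumptions):

~ helm install signoz ./signoz -n platform --create-namespace -f values.yaml
~ kubectl get pods -n platform    # wait until the collector, query-service, and frontend pods are Running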
Collector pipeline

⚡️: Run two Collectors, one for each stage of the pipeline.


Step 1: OTLP → Kafka
  • Purpose: collect the trace, metric, and log data sent by applications (microservices/agents) into Kafka first.
  • Flow
    1. Applications send data to the Collector via OTLP (gRPC or HTTP).
    2. The Collector runs the necessary processors (e.g. batch, tail_sampling).
    3. Exporters write the data into Kafka.
  • Benefits
    • Data is not written directly to ClickHouse, which relieves pressure on the cluster.
    • Enables asynchronous buffering, temporary storage, and message retries.
    • Supports large-scale, high-concurrency ingestion.
  • Configuration
    • Update the ConfigMap and the service start argument yourself: "--config=/conf/otel-collector-config-otok.yaml"
## otel-collector-config-otok.yaml
exporters:
  clickhouselogsexporter:
    dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_LOG_DATABASE}
    timeout: 10s
    use_new_schema: true
  clickhousetraces:
    datasource: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_TRACE_DATABASE}
    low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
    use_new_schema: true
  metadataexporter:
    cache:
      provider: in_memory
    dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/signoz_metadata
    tenant_id: ${env:TENANT_ID}
    timeout: 10s
  signozclickhousemetrics:
    dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}
    timeout: 45s
  kafka/traces:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel_traces
    protocol_version: 2.0.0
    #encoding: otlp_json
    producer:
      max_message_bytes: 10485760
  kafka/logs:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel_logs
    protocol_version: 2.0.0
    #encoding: otlp_json
    producer:
      max_message_bytes: 10485760
  kafka/metrics:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel_metrics
    protocol_version: 2.0.0
    #encoding: otlp_json
    producer:
      max_message_bytes: 10485760
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: localhost:1777
  zpages:
    endpoint: localhost:55679
processors:
  batch:
    send_batch_size: 50000
    timeout: 1s
  tail_sampling:
    decision_wait: 2s
    policies:
      - name: error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: success-traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
      - name: long-traces
        type: latency
        latency:
          threshold_ms: 2000000
  signozspanmetrics/delta:
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    dimensions:
      - default: default
        name: service.namespace
      - default: default
        name: deployment.environment
      - name: signoz.collector.id
    dimensions_cache_size: 100000
    latency_histogram_buckets:
      - 100us
      - 1ms
      - 2ms
      - 6ms
      - 10ms
      - 50ms
      - 100ms
      - 250ms
      - 500ms
      - 1000ms
      - 1400ms
      - 2000ms
      - 5s
      - 10s
      - 20s
      - 40s
      - 60s
    metrics_exporter: signozclickhousemetrics
receivers:
  httplogreceiver/heroku:
    endpoint: 0.0.0.0:8081
    source: heroku
  httplogreceiver/json:
    endpoint: 0.0.0.0:8082
    source: json
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 16
      http:
        endpoint: 0.0.0.0:4318
  kafka:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel
    group_id: otel_group
    protocol_version: 2.0.0
service:
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    logs:
      exporters:
        - kafka/logs
        - metadataexporter
      processors:
        - batch
      receivers:
        - otlp
        - httplogreceiver/heroku
        - httplogreceiver/json
    metrics:
      exporters:
        - metadataexporter
        - kafka/metrics
      processors:
        - batch
      receivers:
        - otlp
    traces:
      exporters:
        - kafka/traces
        - metadataexporter
      processors:
        - batch
        - tail_sampling
      receivers:
        - otlp
        - jaeger
  telemetry:
    logs:
      encoding: json

👆: Because the payload formats and the downstream consumption logic differ, write OTLP data to Kafka with one topic per signal: otel_traces, otel_metrics, otel_logs. This avoids "illegal wireType" errors caused by proto type mismatches.
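If the brokers do not auto-create topics, the three signal topics can be created ahead of time (a sketch; the partition and replication-factor values are assumptions to tune for your workload):

# Create one topic per signal; broker address matches the config above
for t in otel_traces otel_metrics otel_logs; do
  kafka-topics.sh --bootstrap-server 192.168.100.10:9092 --create --if-not-exists \
    --topic "$t" --partitions 6 --replication-factor 3
done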


Step 2: Kafka → ClickHouse
  • Purpose: consume data from Kafka and write it into ClickHouse.
  • Flow
    1. The Collector consumes the topics as a Kafka consumer.
    2. Further processing can happen here (e.g. batch, metrics aggregation, span metrics).
    3. The ClickHouse exporters write to the corresponding tables (traces/logs/metrics).
  • Notes
    • Specify the encoding on the consumer side so that the read format matches what was written to Kafka.
    • Additional filtering/enrichment can be done at this stage (e.g. aggregation, dropping unneeded data).
  • Configuration
    • Update the ConfigMap and the service start argument yourself: "--config=/conf/otel-collector-config-ktoc.yaml"
## otel-collector-config-ktoc.yaml
exporters:
  clickhouselogsexporter:
    dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_LOG_DATABASE}
    timeout: 10s
    use_new_schema: true
  clickhousetraces:
    datasource: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_TRACE_DATABASE}
    low_cardinal_exception_grouping: ${env:LOW_CARDINAL_EXCEPTION_GROUPING}
    use_new_schema: true
  metadataexporter:
    cache:
      provider: in_memory
    dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/signoz_metadata
    tenant_id: ${env:TENANT_ID}
    timeout: 10s
  signozclickhousemetrics:
    dsn: tcp://${env:CLICKHOUSE_USER}:${env:CLICKHOUSE_PASSWORD}@${env:CLICKHOUSE_HOST}:${env:CLICKHOUSE_PORT}/${env:CLICKHOUSE_DATABASE}
    timeout: 45s
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: localhost:1777
  zpages:
    endpoint: localhost:55679
processors:
  batch:
    send_batch_size: 50000
    timeout: 1s
  signozspanmetrics/delta:
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    dimensions:
      - default: default
        name: service.namespace
      - default: default
        name: deployment.environment
      - name: signoz.collector.id
    dimensions_cache_size: 100000
    latency_histogram_buckets:
      - 100us
      - 1ms
      - 2ms
      - 6ms
      - 10ms
      - 50ms
      - 100ms
      - 250ms
      - 500ms
      - 1000ms
      - 1400ms
      - 2000ms
      - 5s
      - 10s
      - 20s
      - 40s
      - 60s
    metrics_exporter: signozclickhousemetrics
receivers:
  httplogreceiver/heroku:
    endpoint: 0.0.0.0:8081
    source: heroku
  httplogreceiver/json:
    endpoint: 0.0.0.0:8082
    source: json
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 16
      http:
        endpoint: 0.0.0.0:4318
  kafka/logs:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel_logs
    group_id: otel_logs_group
    protocol_version: 2.0.0
    encoding: otlp_proto
  kafka/traces:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel_traces
    group_id: otel_traces_group
    protocol_version: 2.0.0
    encoding: otlp_proto
  kafka/metrics:
    brokers:
      - 192.168.100.10:9092
      - 192.168.100.20:9092
      - 192.168.100.30:9092
    topic: otel_metrics
    group_id: otel_metrics_group
    protocol_version: 2.0.0
    encoding: otlp_proto
service:
  extensions:
    - health_check
    - zpages
    - pprof
  pipelines:
    logs:
      exporters:
        - clickhouselogsexporter
        - metadataexporter
      processors:
        - batch
      receivers:
        - kafka/logs
        - httplogreceiver/heroku
        - httplogreceiver/json
    metrics:
      exporters:
        - metadataexporter
        - signozclickhousemetrics
      processors:
        - batch
      receivers:
        - kafka/metrics
    traces:
      exporters:
        - clickhousetraces
        - metadataexporter
      processors:
        - signozspanmetrics/delta
        - batch
      receivers:
        - kafka/traces
        - jaeger
  telemetry:
    logs:
      encoding: json
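To verify that the step-2 Collector is keeping up with the topics, consumer-group lag is a quick check (a sketch; the group names follow the kafka receivers configured above):

# LAG should stay near zero once the consumers are healthy
kafka-consumer-groups.sh --bootstrap-server 192.168.100.10:9092 \
  --describe --group otel_traces_group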
Sampling logic
  • tail_sampling (keep all error traces, sample normal traces probabilistically) should sit in the OTLP → Kafka stage, so that what lands in Kafka is already the volume you want.
  • The Kafka → ClickHouse stage only consumes, writes tables, and does light processing; it performs no further sampling.
# Corresponding configuration
processors:
  batch:
    send_batch_size: 50000
    timeout: 1s
  tail_sampling:
    decision_wait: 2s
    policies:
      - name: error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: success-traces
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
      - name: long-traces
        type: latency
        latency:
          threshold_ms: 2000000

业务接入

Once the platform work above is done, application teams can onboard as appropriate; multiple receivers are supported (see the sketch below).
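For a typical service, pointing the OpenTelemetry SDK or agent at the step-1 Collector's OTLP endpoint is usually enough (a minimal sketch; the service name, collector address, and Java agent path are illustrative assumptions):

# Standard OpenTelemetry SDK environment variables; adjust to your environment
export OTEL_SERVICE_NAME=order-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://<collector-address>:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
java -javaagent:/opt/opentelemetry-javaagent.jar -jar app.jar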

