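Prometheus Hands-On Tutorial 02 - Prometheus Configuration in Detail
Contents
- Prometheus Configuration in Detail
  - Configuration Overview
  - Core Command-Line Flags
  - Configuration File in Detail
  - Official Example Configuration
Prometheus Configuration in Detail
Configuration Overview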
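- Prometheus is configured through command-line flags and a configuration file
- Command-line flags: set immutable system parameters (such as the storage location and how much data to retain)
- Configuration file: defines everything related to scraping (jobs and instances) and which rule files to load
- The configuration can be reloaded at runtime by sending a SIGHUP signal to the Prometheus process or an HTTP POST request to the /-/reload endpoint (the endpoint requires the --web.enable-lifecycle flag)
- If the new configuration is not well-formed, the changes are not applied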
Core Command-Line Flags
The main flags, grouped by function:
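- Basic configuration flags
  - --config.file: path to the configuration file, default prometheus.yml
  - --config.auto-reload-interval: interval for automatically reloading the configuration, default 30s
  - --web.listen-address: address to listen on for the UI and API, default 0.0.0.0:9090
  - --web.enable-lifecycle: enables shutdown and reload via HTTP requests
  - --version: show the application version
  - -h/--help: show help information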
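- Resource flags
  - --auto-gomaxprocs: automatically set GOMAXPROCS to match the CPU quota, default true
  - --auto-gomemlimit: automatically set GOMEMLIMIT to match the memory limit, default true
  - --auto-gomemlimit.ratio: ratio of the memory limit to use, default 0.9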
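- Storage flags
  - --storage.tsdb.path: storage path in server mode, default data/
  - --storage.tsdb.retention.time: how long to retain samples, default 15 days (server mode)
  - --storage.tsdb.retention.size: maximum number of bytes of storage blocks to retain (a unit is required)
  - --storage.agent.path: storage path in agent mode, default data-agent/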
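- Query and rule flags
  - --query.timeout: query timeout, default 2 minutes (server mode)
  - --query.max-concurrency: maximum number of concurrent queries, default 20 (server mode)
  - --rules.max-concurrent-evals: upper limit on concurrent rule evaluations, default 4 (server mode)
  - --rules.alert.resend-delay: minimum delay before resending an alert, default 1 minute (server mode)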
- Logging flags
  - --log.level: log level, default info (other options: debug, warn, error)
  - --log.format: log format, default logfmt (other option: json)
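- Other important flags
  - --enable-feature: list of features to enable (for example native-histograms)
  - --agent: run Prometheus in agent mode
  - --alertmanager.notification-queue-capacity: capacity of the alert notification queue, default 10000 (server mode)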
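To show how several of these flags combine in practice, here is a minimal sketch of a Docker Compose service definition. Docker Compose, the prom/prometheus image, the container paths /etc/prometheus/prometheus.yml and /prometheus, and the local file ./prometheus.yml are all assumptions made for illustration and are not part of the tutorial; the flag names and values are the ones listed above.

```yaml
# docker-compose.yml (hypothetical deployment; adjust image tag and paths to your setup)
services:
  prometheus:
    image: prom/prometheus
    command:
      # Basic configuration
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.listen-address=0.0.0.0:9090
      # Allows POST /-/reload (and shutdown) over HTTP
      - --web.enable-lifecycle
      # Storage: where the TSDB lives and how long samples are kept
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=15d
      # Logging
      - --log.level=info
      - --log.format=logfmt
    ports:
      - "9090:9090"
    volumes:
      # Assumed local configuration file, mounted read-only
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
```

Because flags hold the immutable parameters, changing any of them means restarting the process; only the contents of the configuration file can be picked up by a reload.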
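Configuration File in Detail
- Use the --config.file flag to specify which configuration file to load; the file is written in YAML
- The format is described with typed placeholders for parameter values (such as <boolean>, <duration>, <host>)
- The main configuration sections are:
  - global: global settings that serve as defaults for the other sections, including the scrape interval, scrape timeout, and rule evaluation interval
  - runtime: configures the GOGC parameter of the Go garbage collector
  - rule_files: list of paths to files containing recording and alerting rules
  - scrape_config_files: list of paths to files containing scrape configurations
  - scrape_configs: list of scrape configurations, defining targets and scrape parameters
  - alerting: settings related to Alertmanager
  - remote_write: settings for the remote write feature
  - otlp: settings for the OTLP receiver feature
  - remote_read: settings for the remote read feature
  - storage: storage-related settings that can be reloaded at runtime
  - tracing: configures the export of traces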
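As a starting point, the sections above can be combined into a small configuration file. The following is only a sketch: the rule file glob rules/*.yml, the job name, and the target addresses localhost:9090 and localhost:9093 are assumed values for a single-host setup, not settings taken from the tutorial.

```yaml
# prometheus.yml (minimal sketch; all paths and addresses are assumptions)
global:
  scrape_interval: 15s      # default interval between scrapes of each target
  evaluation_interval: 15s  # default interval between rule evaluations

rule_files:
  - "rules/*.yml"           # hypothetical location of recording/alerting rules

scrape_configs:
  - job_name: prometheus    # Prometheus scraping its own /metrics endpoint
    static_configs:
      - targets: ["localhost:9090"]

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]  # assumed local Alertmanager
```

Sections that are not needed (remote_write, remote_read, otlp, storage, tracing) can simply be left out; values set under global apply to every scrape job unless the job overrides them.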
Official Example Configuration
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  body_size_limit: 15MB
  sample_limit: 1500
  target_limit: 30
  label_limit: 30
  label_name_length_limit: 200
  label_value_length_limit: 200
  query_log_file: query.log
  scrape_failure_log_file: fail.log
  # scrape_timeout is set to the global default (10s).

  external_labels:
    monitor: codelab
    foo: bar

runtime:
  gogc: 42

rule_files:
  - "first.rules"
  - "my/*.rules"

remote_write:
  - url: http://remote1/push
    name: drop_expensive
    write_relabel_configs:
      - source_labels: [__name__]
        regex: expensive.*
        action: drop
    oauth2:
      client_id: "123"
      client_secret: "456"
      token_url: "http://remote1/auth"
      tls_config:
        cert_file: valid_cert_file
        key_file: valid_key_file

  - url: http://remote2/push
    protobuf_message: io.prometheus.write.v2.Request
    name: rw_tls
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file
    headers:
      name: value

otlp:
  promote_resource_attributes: ["k8s.cluster.name", "k8s.job.name", "k8s.namespace.name"]

remote_read:
  - url: http://remote1/read
    read_recent: true
    name: default
    enable_http2: false
  - url: http://remote3/read
    read_recent: false
    name: read_special
    required_matchers:
      job: special
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file

scrape_configs:
  - job_name: prometheus

    honor_labels: true
    # scrape_interval is defined by the configured global (15s).
    # scrape_timeout is defined by the global default (10s).

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    fallback_scrape_protocol: PrometheusText0.0.4

    scrape_failure_log_file: fail_prom.log
    file_sd_configs:
      - files:
          - foo/*.slow.json
          - foo/*.slow.yml
          - single/file.yml
        refresh_interval: 10m
      - files:
          - bar/*.yaml

    static_configs:
      - targets: ["localhost:9090", "localhost:9191"]
        labels:
          my: label
          your: label

    http_headers:
      foo:
        values: ["foobar"]
        secrets: ["bar", "foo"]
        files: ["valid_password_file"]

    relabel_configs:
      - source_labels: [job, __meta_dns_name]
        regex: (.*)some-[regex]
        target_label: job
        replacement: foo-${1}
        # action defaults to 'replace'
      - source_labels: [abc]
        target_label: cde
      - replacement: static
        target_label: abc
      - regex:
        replacement: static
        target_label: abc
      - source_labels: [foo]
        target_label: abc
        action: keepequal
      - source_labels: [foo]
        target_label: abc
        action: dropequal

    authorization:
      credentials_file: valid_token_file

    tls_config:
      min_version: TLS10

  - job_name: service-x

    basic_auth:
      username: admin_name
      password: "multiline\nmysecret\ntest"

    scrape_interval: 50s
    scrape_timeout: 5s
    scrape_protocols: ["PrometheusText0.0.4"]

    body_size_limit: 10MB
    sample_limit: 1000
    target_limit: 35
    label_limit: 35
    label_name_length_limit: 210
    label_value_length_limit: 210

    metrics_path: /my_path
    scheme: https

    dns_sd_configs:
      - refresh_interval: 15s
        names:
          - first.dns.address.domain.com
          - second.dns.address.domain.com
      - names:
          - first.dns.address.domain.com

    relabel_configs:
      - source_labels: [job]
        regex: (.*)some-[regex]
        action: drop
      - source_labels: [__address__]
        modulus: 8
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: 1
        action: keep
      - action: labelmap
        regex: 1
      - action: labeldrop
        regex: d
      - action: labelkeep
        regex: k

    metric_relabel_configs:
      - source_labels: [__name__]
        regex: expensive_metric.*
        action: drop

  - job_name: service-y

    consul_sd_configs:
      - server: "localhost:1234"
        token: mysecret
        path_prefix: /consul
        services: ["nginx", "cache", "mysql"]
        tags: ["canary", "v1"]
        node_meta:
          rack: "123"
        allow_stale: true
        scheme: https
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file
          insecure_skip_verify: false

    relabel_configs:
      - source_labels: [__meta_sd_consul_tags]
        separator: ","
        regex: label:([^=]+)=([^,]+)
        target_label: ${1}
        replacement: ${2}

  - job_name: service-z

    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file

    authorization:
      credentials: mysecret

  - job_name: service-kubernetes

    kubernetes_sd_configs:
      - role: endpoints
        api_server: "https://localhost:1234"
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file

        basic_auth:
          username: "myusername"
          password: "mysecret"

  - job_name: service-kubernetes-namespaces

    kubernetes_sd_configs:
      - role: endpoints
        api_server: "https://localhost:1234"
        namespaces:
          names:
            - default

    basic_auth:
      username: "myusername"
      password_file: valid_password_file

  - job_name: service-kuma

    kuma_sd_configs:
      - server: http://kuma-control-plane.kuma-system.svc:5676
        client_id: main-prometheus

  - job_name: service-marathon
    marathon_sd_configs:
      - servers:
          - "https://marathon.example.com:443"

        auth_token: "mysecret"
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: service-nomad
    nomad_sd_configs:
      - server: 'http://localhost:4646'

  - job_name: service-ec2
    ec2_sd_configs:
      - region: us-east-1
        access_key: access
        secret_key: mysecret
        profile: profile
        filters:
          - name: tag:environment
            values:
              - prod

          - name: tag:service
            values:
              - web
              - db

  - job_name: service-lightsail
    lightsail_sd_configs:
      - region: us-east-1
        access_key: access
        secret_key: mysecret
        profile: profile

  - job_name: service-azure
    azure_sd_configs:
      - environment: AzurePublicCloud
        authentication_method: OAuth
        subscription_id: 11AAAA11-A11A-111A-A111-1111A1111A11
        resource_group: my-resource-group
        tenant_id: BBBB222B-B2B2-2B22-B222-2BB2222BB2B2
        client_id: 333333CC-3C33-3333-CCC3-33C3CCCCC33C
        client_secret: mysecret
        port: 9100

  - job_name: service-nerve
    nerve_sd_configs:
      - servers:
          - localhost
        paths:
          - /monitoring

  - job_name: 0123service-xxx
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090

  - job_name: badfederation
    honor_timestamps: false
    metrics_path: /federate
    static_configs:
      - targets:
          - localhost:9090

  - job_name: 測試
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090

  - job_name: httpsd
    http_sd_configs:
      - url: "http://example.com/prometheus"

  - job_name: service-triton
    triton_sd_configs:
      - account: "testAccount"
        dns_suffix: "triton.example.com"
        endpoint: "triton.example.com"
        port: 9163
        refresh_interval: 1m
        version: 1
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: digitalocean-droplets
    digitalocean_sd_configs:
      - authorization:
          credentials: abcdef

  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock

  - job_name: dockerswarm
    dockerswarm_sd_configs:
      - host: http://127.0.0.1:2375
        role: nodes

  - job_name: service-openstack
    openstack_sd_configs:
      - role: instance
        region: RegionOne
        port: 80
        refresh_interval: 1m
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: service-puppetdb
    puppetdb_sd_configs:
      - url: https://puppetserver/
        query: 'resources { type = "Package" and title = "httpd" }'
        include_parameters: true
        port: 80
        refresh_interval: 1m
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: hetzner
    relabel_configs:
      - action: uppercase
        source_labels: [instance]
        target_label: instance
    hetzner_sd_configs:
      - role: hcloud
        authorization:
          credentials: abcdef
      - role: robot
        basic_auth:
          username: abcdef
          password: abcdef

  - job_name: service-eureka
    eureka_sd_configs:
      - server: "http://eureka.example.com:8761/eureka"

  - job_name: ovhcloud
    ovhcloud_sd_configs:
      - service: vps
        endpoint: ovh-eu
        application_key: testAppKey
        application_secret: testAppSecret
        consumer_key: testConsumerKey
        refresh_interval: 1m
      - service: dedicated_server
        endpoint: ovh-eu
        application_key: testAppKey
        application_secret: testAppSecret
        consumer_key: testConsumerKey
        refresh_interval: 1m

  - job_name: scaleway
    scaleway_sd_configs:
      - role: instance
        project_id: 11111111-1111-1111-1111-111111111112
        access_key: SCWXXXXXXXXXXXXXXXXX
        secret_key: 11111111-1111-1111-1111-111111111111
      - role: baremetal
        project_id: 11111111-1111-1111-1111-111111111112
        access_key: SCWXXXXXXXXXXXXXXXXX
        secret_key: 11111111-1111-1111-1111-111111111111

  - job_name: linode-instances
    linode_sd_configs:
      - authorization:
          credentials: abcdef

  - job_name: stackit-servers
    stackit_sd_configs:
      - project: 11111111-1111-1111-1111-111111111111
        authorization:
          credentials: abcdef

  - job_name: uyuni
    uyuni_sd_configs:
      - server: https://localhost:1234
        username: gopher
        password: hole

  - job_name: ionos
    ionos_sd_configs:
      - datacenter_id: 8feda53f-15f0-447f-badf-ebe32dad2fc0
        authorization:
          credentials: abcdef

  - job_name: vultr
    vultr_sd_configs:
      - authorization:
          credentials: abcdef

alerting:
  alertmanagers:
    - scheme: https
      static_configs:
        - targets:
            - "1.2.3.4:9093"
            - "1.2.3.5:9093"
            - "1.2.3.6:9093"

storage:
  tsdb:
    out_of_order_time_window: 30m

tracing:
  endpoint: "localhost:4317"
  client_type: "grpc"
  headers:
    foo: "bar"
  timeout: 5s
  compression: "gzip"
  tls_config:
    cert_file: valid_cert_file
    key_file: valid_key_file
    insecure_skip_verify: true