Troubleshooting "No Data" in a Custom Grafana Error-Rate Panel
Contents
- I. Symptoms
- II. Problem Analysis
- III. Solution
I. Symptoms
I recently built a Grafana panel that displays the HTTP request error rate, using the following PromQL query:
sum(rate(http_server_request_duration_seconds_count{exported_job="${job}", http_route="${http_route}",
http_request_method="${http_method}", http_response_status_code=~"4..|5.."}[$__rate_interval]))
by (exported_job, http_route, http_request_method)
/
sum(rate(http_server_request_duration_seconds_count{exported_job="${job}", http_route="${http_route}",
http_request_method="${http_method}"}[$__rate_interval]))
by (exported_job, http_route, http_request_method)
That is, grouped by exported_job, http_route and http_request_method, the query computes the proportion of requests whose HTTP status code is 4xx or 5xx (a ratio between 0 and 1).
Take the following three series as an example:
- http_server_request_duration_seconds_count{exported_job="app-aggr", http_route="/api/v1/sample/list", http_request_method="GET", http_response_status_code="200"}
- http_server_request_duration_seconds_count{exported_job="app-aggr", http_route="/api/v1/sample/list", http_request_method="GET", http_response_status_code="400"}
- http_server_request_duration_seconds_count{exported_job="app-atom", http_route="/api/v1/sample/{id}", http_request_method="GET", http_response_status_code="200"}
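The minimal Python sketch below models what the two sum(...) by (...) operands return for these three series. It is a model, not Prometheus code. The rate values are invented for illustration, and sum_by is a hypothetical helper. The status filter uses re.fullmatch because PromQL regex matchers are fully anchored.

```python
import re
from collections import defaultdict

# Hypothetical per-second rates for each raw series, i.e. what
# rate(http_server_request_duration_seconds_count[...]) might return.
# Label values mirror the example series; the numbers are made up.
SERIES = [
    ({"exported_job": "app-aggr", "http_route": "/api/v1/sample/list",
      "http_request_method": "GET", "http_response_status_code": "200"}, 9.0),
    ({"exported_job": "app-aggr", "http_route": "/api/v1/sample/list",
      "http_request_method": "GET", "http_response_status_code": "400"}, 1.0),
    ({"exported_job": "app-atom", "http_route": "/api/v1/sample/{id}",
      "http_request_method": "GET", "http_response_status_code": "200"}, 5.0),
]

GROUP_BY = ("exported_job", "http_route", "http_request_method")


def sum_by(series, group_by, status_regex=None):
    """Model of `sum(...) by (group_by)`, optionally filtered by an =~ matcher.

    Labels outside group_by (here the status code) are dropped and their
    values summed. PromQL regex matchers are fully anchored, hence fullmatch.
    """
    out = defaultdict(float)
    for labels, value in series:
        code = labels["http_response_status_code"]
        if status_regex and not re.fullmatch(status_regex, code):
            continue  # series filtered out by the =~ matcher
        key = tuple(labels[name] for name in group_by)
        out[key] += value
    return dict(out)


errors = sum_by(SERIES, GROUP_BY, status_regex=r"4..|5..")  # numerator
total = sum_by(SERIES, GROUP_BY)                            # denominator
print(errors)
print(total)
```

With these inputs, errors contains only the app-aggr group. total contains both the app-aggr and app-atom groups.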
For the /api/v1/sample/list route of the app-aggr application, the panel correctly shows the error percentage.
For the /api/v1/sample/{id} route of the app-atom application, however, the panel shows No data.
II. Problem Analysis
Querying the metric directly in Prometheus:
http_server_request_duration_seconds_count{exported_job="app-atom", http_route="/api/v1/sample/{id}", http_request_method="GET"}
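The result shows that the /api/v1/sample/{id} route of app-atom has no series with an http_response_status_code of 4xx or 5xx. It only has series with status code 200. The numerator therefore returns no series at all. A binary operator such as / between two instant vectors only produces output for label sets that exist on both sides, so the whole ratio is empty as well. When a Prometheus query returns no series, the Grafana panel shows No data rather than 0. What I needed was for the panel to show 0 whenever the numerator has no data.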
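The following Python sketch is a simplified model, not Prometheus itself. It illustrates the one-to-one matching rule using made-up rates, already aggregated by (exported_job, http_route, http_request_method). The divide helper is hypothetical.

```python
# Hypothetical results of the two sum(...) by (...) operands, keyed by
# (exported_job, http_route, http_request_method). Values are made up.
numerator = {  # 4xx/5xx request rate: app-atom has no error series at all
    ("app-aggr", "/api/v1/sample/list", "GET"): 1.0,
}
denominator = {  # total request rate
    ("app-aggr", "/api/v1/sample/list", "GET"): 10.0,
    ("app-atom", "/api/v1/sample/{id}", "GET"): 5.0,
}


def divide(lhs, rhs):
    """Model of PromQL one-to-one vector matching for `lhs / rhs`.

    A result is produced only for label sets present on BOTH sides.
    Unmatched series are dropped, not treated as 0.
    """
    return {key: lhs[key] / rhs[key] for key in lhs.keys() & rhs.keys()}


ratio = divide(numerator, denominator)
print(ratio)
```

Here ratio holds only the app-aggr key, with value 0.1. The app-atom key is missing entirely rather than being 0. When the panel's variables select app-atom, the query result is therefore empty, and Grafana renders that as No data.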
III. Solution
The final PromQL query is:
(sum(rate(http_server_request_duration_seconds_count{exported_job="${job}", http_route="${http_route}",
http_request_method="${http_method}", http_response_status_code=~"4..|5.."}[$__rate_interval]))
by (exported_job, http_route, http_request_method)
or
sum(rate(http_server_request_duration_seconds_count{exported_job="${job}", http_route="${http_route}",
http_request_method="${http_method}"}[$__rate_interval]))
by (exported_job, http_route, http_request_method) * 0)
/
sum(rate(http_server_request_duration_seconds_count{exported_job="${job}", http_route="${http_route}",
http_request_method="${http_method}"}[$__rate_interval]))
by (exported_job, http_route, http_request_method)
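That is, the numerator is combined via or with the denominator multiplied by 0. Because * binds more tightly than or, the * 0 applies only to the fallback expression.

A or B returns every series of A, plus those series of B whose label sets do not appear in A. Groups that do have 4xx/5xx requests therefore keep their real value, and groups with no error series fall back to 0. The fallback uses the same expression and by clause as the denominator, so its label sets line up with the denominator and the division matches. This meets the requirement of showing an error rate of 0 when the numerator has no data.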
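The Python sketch below models the (errors or total * 0) / total expression on hypothetical, already-aggregated rates. The helpers or_, scale and divide are illustrative stand-ins for PromQL operator semantics, not a real library API.

```python
# Hypothetical aggregated operands keyed by
# (exported_job, http_route, http_request_method). Values are made up.
numerator = {("app-aggr", "/api/v1/sample/list", "GET"): 1.0}
denominator = {
    ("app-aggr", "/api/v1/sample/list", "GET"): 10.0,
    ("app-atom", "/api/v1/sample/{id}", "GET"): 5.0,
}


def or_(lhs, rhs):
    """Model of PromQL `lhs or rhs`: every lhs series, plus the rhs series
    whose label sets do not already appear in lhs."""
    result = dict(lhs)
    for key, value in rhs.items():
        result.setdefault(key, value)  # keep lhs value when the key exists
    return result


def scale(vector, factor):
    """Model of `vector * scalar`: labels kept, every value multiplied."""
    return {key: value * factor for key, value in vector.items()}


def divide(lhs, rhs):
    """Model of one-to-one matching for `lhs / rhs`: shared label sets only."""
    return {key: lhs[key] / rhs[key] for key in lhs.keys() & rhs.keys()}


# (errors or total * 0) / total
fixed = divide(or_(numerator, scale(denominator, 0)), denominator)
for key, value in sorted(fixed.items()):
    print(key, value)
```

In this model, fixed maps the app-aggr group to 0.1 and the app-atom group to 0.0, instead of dropping app-atom.

The same reasoning applies to the real query. If a route received no requests at all in the rate window, the denominator is empty too, so the result would still be empty. The or fallback only covers the case where traffic exists but errors do not.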
After this change, querying the error rate of the /api/v1/sample/{id} route of app-atom again, the panel now shows an error rate of 0 throughout.