Troubleshooting a Loki startup error: a case record
level=info ts=2025-03-27T10:33:15.228302842Z caller=main.go:103 msg="Starting Loki" version="(version=2.7.1, branch=HEAD, revision=e0af1cc8a)"
level=info ts=2025-03-27T10:33:15.229214772Z caller=server.go:323 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2025-03-27T10:33:15.229458531Z caller=modules.go:862 msg="Ruler storage is not configured; ruler will not be started."
level=warn ts=2025-03-27T10:33:15.230467942Z caller=cache.go:114 msg="fifocache config is deprecated. use embedded-cache instead"
level=warn ts=2025-03-27T10:33:15.230482527Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache - chunksembedded-cache"
level=info ts=2025-03-27T10:33:15.231301645Z caller=table_manager.go:134 msg="uploading tables"
level=info ts=2025-03-27T10:33:15.23188939Z caller=table_manager.go:404 msg="loading local table index_20174"
ts=2025-03-27T10:33:15.232543073Z caller=spanlogger.go:80 level=info msg="building index list cache"
ts=2025-03-27T10:33:15.232998748Z caller=spanlogger.go:80 level=info msg="index list cache built" duration=445.118µs
level=error ts=2025-03-27T10:33:15.233955443Z caller=index_set.go:132 table-name=index_20174 msg="failed to open existing index file /data/loki/boltdb-shipper-cache/index_20174/loki-0-1743065840775108767-1743067649, removing the file and continuing without it to let the sync operation catch up" err="recovered from panic opening boltdb file: invalid freelist page: 0, page type is unknown<00>"
level=error ts=2025-03-27T10:33:15.236759776Z caller=index_set.go:132 table-name=index_20174 msg="failed to open existing index file /data/loki/boltdb-shipper-cache/index_20174/loki-0-1743065840775108767-1743067800, removing the file and continuing without it to let the sync operation catch up" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
level=error ts=2025-03-27T10:33:15.239314264Z caller=index_set.go:132 table-name=index_20174 msg="failed to open existing index file /data/loki/boltdb-shipper-cache/index_20174/loki-0-1743065840775108767-1743068700, removing the file and continuing without it to let the sync operation catch up" err="recovered from panic opening boltdb file: invalid freelist page: 0, page type is meta"
level=error ts=2025-03-27T10:33:15.241276984Z caller=index_set.go:132 table-name=index_20174 msg="failed to open existing index file /data/loki/boltdb-shipper-cache/index_20174/loki-0-1743065840775108767-1743069600, removing the file and continuing without it to let the sync operation catch up" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
level=info ts=2025-03-27T10:33:15.244251769Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743067800.gz msg="downloaded file" total_time=1.572121ms
level=info ts=2025-03-27T10:33:15.246136082Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743068700.gz msg="downloaded file" total_time=3.457476ms
level=info ts=2025-03-27T10:33:15.247081561Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743069600.gz msg="downloaded file" total_time=4.39162ms
level=info ts=2025-03-27T10:33:15.251176009Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743067649.gz msg="downloaded file" total_time=8.494152ms
level=error ts=2025-03-27T10:33:15.253878836Z caller=index_set.go:285 table-name=index_20174 msg="sync failed, retrying it" err="recovered from panic opening boltdb file: runtime error: invalid memory address or nil pointer dereference"
level=info ts=2025-03-27T10:33:15.257974072Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743067800.gz msg="downloaded file" total_time=3.977508ms
level=info ts=2025-03-27T10:33:15.258336915Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743068700.gz msg="downloaded file" total_time=4.310542ms
level=info ts=2025-03-27T10:33:15.258448963Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743067649.gz msg="downloaded file" total_time=4.485528ms
level=info ts=2025-03-27T10:33:15.258537814Z caller=util.go:85 table-name=index_20174 file-name=loki-0-1743065840775108767-1743069600.gz msg="downloaded file" total_time=4.231136ms
timeout
error creating index client
github.com/grafana/loki/pkg/storage.(*store).storeForPeriod
/src/loki/pkg/storage/store.go:270
github.com/grafana/loki/pkg/storage.(*store).init
/src/loki/pkg/storage/store.go:164
github.com/grafana/loki/pkg/storage.NewStore
/src/loki/pkg/storage/store.go:147
github.com/grafana/loki/pkg/loki.(*Loki).initStore
/src/loki/pkg/loki/modules.go:632
github.com/grafana/dskit/modules.(*Manager).initModule
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92
github.com/grafana/loki/pkg/loki.(*Loki).Run
/src/loki/pkg/loki/loki.go:363
main.main
/src/loki/cmd/loki/main.go:105
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1594
error initialising module: store
github.com/grafana/dskit/modules.(*Manager).initModule
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:122
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92
github.com/grafana/loki/pkg/loki.(*Loki).Run
/src/loki/pkg/loki/loki.go:363
main.main
/src/loki/cmd/loki/main.go:105
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1594
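Root cause analysis
While initializing storage (the store module), Loki failed to read the index files in the boltdb-shipper cache. The error messages include:
“invalid freelist page: 0, page type is unknown”
“runtime error: invalid memory address or nil pointer dereference”
These show that when Loki tried to open the index files under /data/loki/boltdb-shipper-cache/index_20174/, it found their internal data structures invalid or corrupted (for example, an invalid freelist page or an incomplete file). Loki removed the damaged files and downloaded them again, but the sync failed once more with the same panic and eventually timed out, so the index client could not be created and the store module failed to initialize.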
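To see which index files are affected, the error lines can be pulled out of the log. The following is a minimal sketch in Python, assuming the log lines have the same shape as the ones shown above; the function name `corrupt_index_files` and the regular expression are illustrative and not part of Loki.

```python
import re

# Matches the "failed to open existing index file" lines shown in the log above.
# The pattern is an assumption based on that log format, not a Loki API.
FAILED_OPEN = re.compile(
    r'msg="failed to open existing index file (?P<path>\S+?),.*?" '
    r'err="(?P<err>.*)"\s*$'
)


def corrupt_index_files(log_lines):
    """Return (path, error) pairs for every index file Loki failed to open."""
    found = []
    for line in log_lines:
        match = FAILED_OPEN.search(line)
        if match:
            found.append((match.group("path"), match.group("err")))
    return found
```

Feeding it the startup log (for example `corrupt_index_files(open("loki.log"))`, where `loki.log` is a hypothetical file name) gives one entry per damaged file, which makes it easy to confirm that all failures sit in the same table directory.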
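Possible causes and solutions:
1) File corruption
The index files in the boltdb-shipper cache may have been damaged by an unclean shutdown, a fault in the storage medium, or an incompatible operation.
Solution: delete or empty the /data/loki/boltdb-shipper-cache/ directory (preferably after backing it up) so that Loki rebuilds the index files on its next start.
2) Storage volume problems
Check that the volume mounted at /data/loki is healthy, that the disk has no faults, and that permissions are set correctly.
When using a persistent volume (PV/PVC), make sure its performance and I/O stability meet Loki's requirements.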
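3) Version or configuration issues
Solution: check whether the Loki version in use (2.7.1 in the log above) has known issues related to the boltdb-shipper cache, and consider upgrading to a newer release.
Also review the boltdb-shipper settings in the Loki configuration to confirm they are correct.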
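The storage volume checks in item 2 can be partly automated. The sketch below is only a basic probe, assuming Python is available on the host or in a debug container: it checks that the directory exists, is writable, has free space, and that a small file survives a write, fsync, and read-back. The function name `probe_storage` and the 1 GiB default threshold are illustrative assumptions, and the probe does not replace disk-level diagnostics.

```python
import os
import shutil
import tempfile


def probe_storage(path, min_free_bytes=1 << 30):
    """Run basic health checks on the directory backing Loki's data."""
    report = {
        "exists": os.path.isdir(path),
        "writable": False,
        "free_bytes": 0,
        "enough_space": False,
        "write_roundtrip_ok": False,
    }
    if not report["exists"]:
        return report

    # Permission check for the current user (create and traverse).
    report["writable"] = os.access(path, os.W_OK | os.X_OK)
    report["free_bytes"] = shutil.disk_usage(path).free
    report["enough_space"] = report["free_bytes"] >= min_free_bytes

    if report["writable"]:
        payload = os.urandom(4096)
        try:
            # Write a small file, force it to disk, then read it back.
            with tempfile.NamedTemporaryFile(dir=path, delete=False) as tmp:
                tmp.write(payload)
                tmp.flush()
                os.fsync(tmp.fileno())
                name = tmp.name
            try:
                with open(name, "rb") as check:
                    report["write_roundtrip_ok"] = check.read() == payload
            finally:
                os.unlink(name)
        except OSError:
            report["write_roundtrip_ok"] = False
    return report
```

Running `probe_storage("/data/loki")` as the same user that Loki runs as gives the most meaningful result for the permission check.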
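In summary: first back up and clear the /data/loki/boltdb-shipper-cache directory, then restart Loki and check whether it builds the index normally. Also verify the health and configuration of the underlying storage to rule out external problems that could affect index file generation.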
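The backup-and-clear step can be sketched as follows. This is a minimal illustration in Python, assuming Loki is stopped before it runs and that the backup location lies outside the cache directory; the function name `backup_and_clear_cache` and the backup path used in the usage note are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_and_clear_cache(cache_dir, backup_root):
    """Copy cache_dir to a timestamped folder under backup_root, then empty it.

    Run this only while Loki is stopped. Returns the backup path.
    """
    cache = Path(cache_dir)
    if not cache.is_dir():
        raise FileNotFoundError(f"cache directory not found: {cache}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = Path(backup_root) / f"{cache.name}-{stamp}"
    shutil.copytree(cache, backup)  # raises if the backup path already exists

    # Remove the contents but keep the directory itself, so ownership and
    # permissions on the mount point stay unchanged.
    for entry in cache.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return backup
```

For the setup in this note the call would look like `backup_and_clear_cache("/data/loki/boltdb-shipper-cache", "/data/loki-backup")`, where `/data/loki-backup` is an assumed location. Emptying the directory instead of deleting it avoids having to recreate it with the right owner before Loki restarts.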