GitLab Community Edition: log rotation failure
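Background:
One day I happened to notice that disk usage on the directory holding our company's GitLab logs had reached 93%, which was not normal.
I went into the container to look and found that the main cause was a single gitlab-rails log file, /var/log/gitlab/gitlab-rails/application_json.log, which had grown to about 110 GB.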

Checking the logrotate configuration showed nothing wrong: daily rotation, keeping 30 rotated files.
root@git:/var/log/gitlab/gitlab-rails# cat /var/opt/gitlab/logrotate/logrotate.d/gitlab-rails
# Generated by gitlab-ctl reconfigure
# Modifications will be overwritten!

/var/log/gitlab/gitlab-rails/*.log {
  su git git
  daily
  rotate 30
  compress
  copytruncate
  missingok
  notifempty
  postrotate
  endscript
}
Next, check logrotate.status to see whether rotation has been triggered normally:
root@exgit-uat:/usr/sbin# cat /var/opt/gitlab/logrotate/logrotate.status
logrotate state -- version 2
"/var/log/gitlab/nginx/gitlab_error.log" 2025-10-15-6:0:0
"/var/log/gitlab/puma/puma_stderr.log" 2025-10-18-0:33:40
"/var/log/gitlab/gitlab-rails/auth.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitaly/gitaly_hooks.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-rails/database_load_balancing.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-rails/service_measurement.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-rails/git_json.log" 2025-10-15-6:0:0
"/var/log/gitlab/nginx/error.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-rails/graphql_json.log" 2025-10-17-3:33:40
"/var/log/gitlab/gitlab-rails/production_json.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-2025-09-12-02-32-53.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-workhorse/*.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-rails/backup_json.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-rails/api_json.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-2025-10-15-06-22-25.log" 2025-10-16-0:33:27
"/var/log/gitlab/gitlab-rails/grpc.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-rails/exceptions_json.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-rails/sidekiq_client.log" 2025-10-17-3:33:40
"/var/log/gitlab/nginx/gitlab_access.log" 2025-10-21-0:33:43
"/var/log/gitlab/puma/puma_stdout.log" 2025-10-17-0:33:29
"/var/log/gitlab/gitlab-rails/application_json.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-pages/*.log" 2025-10-15-6:0:0
"/var/log/gitlab/mailroom/*.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-kas/*.log" 2025-10-15-6:0:0
"/var/log/gitlab/nginx/access.log" 2025-10-15-6:0:0
"/var/log/gitlab/gitlab-rails/audit_json.log" 2025-10-17-3:33:40
"/var/log/gitlab/gitlab-rails/production.log" 2025-10-21-0:33:43
"/var/log/gitlab/gitlab-shell/*.log" 2025-10-15-6:0:0
In the status file, the entry "/var/log/gitlab/gitlab-rails/production_json.log" 2025-10-21-0:33:43 shows that rotation has been triggered normally in recent days.
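Reading the timestamps by eye gets tedious when the status file is long. The following is a minimal sketch of a filter that lists entries whose last recorded rotation is older than a given date. The function name stale_entries is hypothetical, and the sketch assumes the "version 2" state format shown above: one header line, then a quoted path followed by a Y-M-D-H:M:S timestamp without zero padding.

```shell
#!/bin/sh
# Hypothetical helper: list logrotate.status entries last rotated before a cutoff date.
# Usage: stale_entries STATUS_FILE YYYY-MM-DD
stale_entries() {
    awk -v cutoff="$2" '
        BEGIN {
            split(cutoff, c, "-")
            limit = c[1] * 10000 + c[2] * 100 + c[3]
        }
        NR == 1 { next }                 # skip the "logrotate state -- version 2" header
        {
            n = split($NF, d, "-")       # last field: Y-M-D-H:M:S
            if (n < 3) next              # ignore lines without a usable date
            stamp = d[1] * 10000 + d[2] * 100 + d[3]
            if (stamp < limit) print $0
        }
    ' "$1"
}
```

For example, stale_entries /var/opt/gitlab/logrotate/logrotate.status 2025-10-20 would print only the entries not rotated on or after October 20. Note that a recent timestamp only means logrotate processed the file, not that the rotation succeeded.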
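So my guess is that rotation was triggered as scheduled but failed for some reason. Judging from the log directory screenshot taken at the beginning: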

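After October 9, logrotate still ran, but copytruncate failed.
GitLab kept appending to application_json.log, so the file kept growing (now 110 GB).
logrotate could not complete copytruncate (possibly because copying 110 GB was too slow, timed out, or filled the disk).
As a result, no new .1 file was generated.
The old .1 to .14 files were pushed to higher numbers in later rotations (because of rotate 30), but no new compressed files were produced, so .1 to .14 are missing.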
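The reasoning above can be checked with two small tests: which rotation indexes are missing, and whether the filesystem has room for the full copy that copytruncate makes before truncating. This is only a sketch; the function names missing_rotations and copy_space_ok are hypothetical, and having enough free space is a necessary condition, not proof that rotation would succeed.

```shell
#!/bin/sh
# Hypothetical helpers for checking the guess above; names are illustrative only.

# Print every index N in 1..COUNT for which neither LOG.N nor LOG.N.gz exists.
missing_rotations() {
    log=$1
    count=$2
    n=1
    while [ "$n" -le "$count" ]; do
        if [ ! -e "$log.$n" ] && [ ! -e "$log.$n.gz" ]; then
            echo "$n"
        fi
        n=$((n + 1))
    done
}

# Succeed only if the filesystem holding LOG has more free space than LOG's size.
# copytruncate copies the whole file first, so less free space than this cannot work.
copy_space_ok() {
    log=$1
    size_kb=$(( ($(wc -c < "$log") + 1023) / 1024 ))
    free_kb=$(df -Pk "$(dirname "$log")" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -gt "$size_kb" ]
}
```

With the configuration shown earlier, missing_rotations /var/log/gitlab/gitlab-rails/application_json.log 30 would list the gaps. On a partition that is 93% full, a 110 GB file is likely to fail the copy_space_ok check, which fits the "disk filled up" explanation.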
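Solution:
Because this is a production environment, the fix should affect normal GitLab usage as little as possible.
Use echo > application_json.log to overwrite the contents of the log file.
Everything stayed normal during the operation, and production was not affected.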
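A small note on the command: echo > file leaves a single newline byte in the file, while : > file leaves it completely empty. Both truncate in place and keep the same inode, so the process writing the log keeps its open file handle. Deleting the file instead would not free the space while the process still holds it open. The sketch below wraps the empty-truncate variant; the function name empty_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: empty a log file in place, keeping the same inode so that
# a process holding the file open keeps writing to the same file.
empty_log() {
    : > "$1"
}
```

If the writer opened the file in append mode, new lines land at the start of the now-empty file. If it did not, the file may become sparse, with its apparent size jumping back to the old offset while little disk space is actually used.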
The next day, log rotation was observed to be working normally again.

Summary: GitLab Community Edition is still not stable enough, so the status of its components and features should be checked regularly.
