
kafka records deletion policy

Time-based retention

Time-based retention keeps a partition's records for a fixed period and deletes them afterwards.
This is the default policy; it is controlled by cleanup.policy=delete and retention.ms.
retention.ms defaults to 604800000 ms (7 days).
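The time-based rule can be sketched as follows. This is a toy illustration, not Kafka's actual code; the function name and parameters are invented for the example. The key point is that a segment's age is measured from its newest record:

```python
import time

RETENTION_MS = 604_800_000  # 7 days, the default retention.ms

def is_segment_expired(segment_max_timestamp_ms: int,
                       now_ms: int,
                       retention_ms: int = RETENTION_MS) -> bool:
    """A segment is eligible for deletion only once its *newest*
    record is older than the retention window."""
    return now_ms - segment_max_timestamp_ms > retention_ms

now = int(time.time() * 1000)
print(is_segment_expired(now - 8 * 24 * 3600 * 1000, now))   # 8 days old -> True
print(is_segment_expired(now - 1 * 24 * 3600 * 1000, now))   # 1 day old  -> False
```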

Retention is based on message timestamps, and which timestamp is used depends on message.timestamp.type: with CreateTime (the default) the producer-supplied timestamp is used; with LogAppendTime the broker's append time is used.

CreateTime: The timestamp set by the producer when it creates the message. It is embedded in the message when sent to Kafka.

LogAppendTime: The timestamp is set by the Kafka broker at the moment the message is written to the log.

Size-based retention

Size-based retention keeps records based on the disk space a partition consumes; it is configured via retention.bytes.
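A minimal sketch of the size-based rule, with invented names (not Kafka's internals): the oldest segments are dropped until the partition fits under retention.bytes, and the active segment is never deleted.

```python
def apply_size_retention(segment_sizes, retention_bytes):
    """Return the indexes of the oldest segments to delete so that the
    remaining partition size fits under retention.bytes.
    segment_sizes is ordered oldest -> newest; the last (active)
    segment is never deleted."""
    total = sum(segment_sizes)
    to_delete = []
    for i, size in enumerate(segment_sizes[:-1]):  # skip the active segment
        if total <= retention_bytes:
            break
        to_delete.append(i)
        total -= size
    return to_delete

# 4 segments of 100 MB each, cap at 250 MB -> the two oldest go
print(apply_size_retention([100, 100, 100, 100], 250))  # [0, 1]
```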

Log compaction

Compaction is a key-based retention mechanism. The goal of compaction is to keep the most recent value for a given key. However, historical data will be lost, so it may not always be the best choice.

Compaction also provides a way to completely remove a key, by appending an event with that key and a null value. If this null value, also known as a tombstone, is the most recent value for that key, then it will be marked for deletion along with any older occurrences of that key. This could be necessary for things like GDPR compliance.
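The keep-latest-per-key semantics, including tombstone handling, can be shown with a toy model (None standing in for a Kafka null value; this is not the LogCleaner's real algorithm):

```python
def compact(records):
    """Toy log compaction: keep only the latest value per key and
    drop keys whose latest value is a tombstone (None)."""
    latest = {}
    for key, value in records:   # records in offset order
        latest[key] = value      # later offsets win
    return {k: v for k, v in latest.items() if v is not None}

log = [("user1", "a"), ("user2", "b"), ("user1", "c"), ("user2", None)]
print(compact(log))  # {'user1': 'c'}  -- user2 removed by its tombstone
```

Note that in real Kafka, tombstones themselves are retained for delete.retention.ms before being removed, so consumers have a chance to observe the deletion.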

To enable this feature, set cleanup.policy=compact, then configure the other parameters that control when compaction is triggered.
Compaction is triggered for a topic partition when either of the following conditions holds:

1. dirty bytes / total bytes > min.cleanable.dirty.ratio, and the dirty records are older than min.compaction.lag.ms
2. the oldest dirty record is older than max.compaction.lag.ms

Normally, compaction will be triggered when the ratio of dirty data to total data reaches the threshold set by the min.cleanable.dirty.ratio configuration, which defaults to 50 percent. However, there are a couple of other configurations that can affect this decision.

To prevent overly aggressive cleaning, we can set the value of min.compaction.lag.ms. With this set, compaction won’t start more frequently than its value even if the dirty/total ratio has been exceeded. This provides a minimum period of time for applications to see an event prior to its deletion.

On the other end of the spectrum, we can set max.compaction.lag.ms, which will trigger a compaction once the time limit has been reached, regardless of the amount of dirty data.
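The trigger logic described above can be condensed into one function. This is a sketch under the stated assumptions (the function and parameter names mirror the configs but are otherwise invented):

```python
def should_compact(dirty_bytes, total_bytes, oldest_dirty_age_ms,
                   min_cleanable_dirty_ratio=0.5,
                   min_compaction_lag_ms=0,
                   max_compaction_lag_ms=float("inf")):
    """Decide whether a partition is due for compaction, mirroring the
    two trigger conditions: dirty ratio + min lag, or max lag exceeded."""
    ratio_due = (total_bytes > 0
                 and dirty_bytes / total_bytes > min_cleanable_dirty_ratio
                 and oldest_dirty_age_ms >= min_compaction_lag_ms)
    overdue = oldest_dirty_age_ms >= max_compaction_lag_ms
    return ratio_due or overdue

print(should_compact(60, 100, oldest_dirty_age_ms=1000))   # ratio 0.6 > 0.5 -> True
print(should_compact(10, 100, oldest_dirty_age_ms=1000))   # ratio 0.1 -> False
print(should_compact(60, 100, oldest_dirty_age_ms=10,
                     min_compaction_lag_ms=100))            # min lag blocks -> False
print(should_compact(10, 100, oldest_dirty_age_ms=10**9,
                     max_compaction_lag_ms=10**6))          # max lag forces -> True
```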

To understand compaction, you need to know the three segment states within a Kafka partition: dirty segment, cleaned segment, and active segment. Understanding them is key to mastering Kafka's log management and compaction mechanism.

The three segment states in Kafka

Each Kafka partition consists of multiple log segment files, rolled over by time or size. Kafka classifies segments into different roles:


✅ 1. Active Segment
  • The segment file currently being written to.
  • Always the newest segment.
  • Never compacted or deleted (until it is rolled).
  • Example file name: 00000000000000000000.log

📌 Characteristics:

  • Write-only; never compacted.
  • As writes accumulate, once it exceeds segment.ms or segment.bytes it is closed and becomes an inactive segment.

✅ 2. Dirty Segment
  • A segment that has not yet been cleaned since the last compaction run.
  • These segments may contain duplicate keys or messages for deleted keys.
  • Kafka treats them as candidates for compaction.

📌 How it is determined:

  • Whether the segment falls between the lastCleanOffset (the cleaner checkpoint) and the logEndOffset.
  • Any segment whose data has not yet gone through log compaction is dirty.

✅ 3. Cleaned Segment
  • A segment that has already been processed by the LogCleaner thread.
  • Only the latest value is kept per key (or a tombstone, if the deletion has not yet expired).
  • After cleaning, Kafka produces new segments that replace the old dirty segments.

📌 Characteristics:

  • Considered "clean"; no need to compact again.
  • Compacted data is written to a new .cleaned file (an internal temporary file), which eventually replaces the original segment.

🧠 How the three relate (simplified)
[segment-0]  ← Cleaned Segment
[segment-1]  ← Cleaned Segment
[segment-2]  ← Dirty Segment
[segment-3]  ← Dirty Segment
[segment-4]  ← Active Segment
  • segment-0/1: already compacted; Kafka keeps only the latest value per key.
  • segment-2/3: waiting for compaction; the LogCleaner processes them based on the dirty ratio.
  • segment-4: currently being written, not yet sealed, and never compacted.
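The classification above can be expressed as a small function. This is a simplified model, with names (classify_segments, first_dirty_offset) invented for illustration; it assumes the cleaner checkpoint is a single offset separating cleaned data from dirty data:

```python
def classify_segments(segment_base_offsets, first_dirty_offset):
    """Label segments relative to the cleaner checkpoint: segments whose
    base offset is below first_dirty_offset are cleaned, the last
    segment is active, everything in between is dirty."""
    labels = []
    for i, base in enumerate(segment_base_offsets):
        if i == len(segment_base_offsets) - 1:
            labels.append("active")
        elif base < first_dirty_offset:
            labels.append("cleaned")
        else:
            labels.append("dirty")
    return labels

print(classify_segments([0, 100, 200, 300, 400], first_dirty_offset=200))
# ['cleaned', 'cleaned', 'dirty', 'dirty', 'active']
```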

🔍 How do you check the current segment states?

Kafka has no command that prints these states directly, but you can infer them from the log files and runtime metrics:

  • Look at the broker data directory (e.g. /kafka-logs/your-topic-0/); segment files are named by base offset.

  • Relevant metrics include:

    • the state of the log-cleaner-cleanerManager-cleaner-0 thread
    • kafka.log:type=LogCleaner,name=MaxBufferUtilization
    • kafka.log:type=Log,name=LogEndOffset

🧠 Summary table

Name            | Being written? | Compactable? | Notes
Active Segment  | ✅ Yes         | ❌ No        | The segment currently being written
Dirty Segment   | ❌ No          | ✅ Yes       | Older segments eligible for compaction
Cleaned Segment | ❌ No          | ❌ No        | Data that has already been compacted

kafka records deletion: things to watch out for

1. Data Is Deleted Even If Not Consumed

  • Kafka does not track consumer state when deleting messages.
  • If a consumer is slow or offline, it may miss messages if they are deleted before being read.

Solution:

  • Ensure consumers keep up.
  • Use longer retention.ms if data must remain available longer.
  • Consider log compaction if you need retention per key.

2. Retention Applies at the Partition Level

  • retention.ms and retention.bytes apply per partition, not per topic.
  • This can lead to imbalanced data storage if partition sizes differ.

Solution:

  • Use good partitioning logic to distribute data evenly.

3. Segments Are Only Deleted After Being Fully Aged

  • Kafka deletes entire log segments, not individual records.
  • If a segment contains even one unexpired message, the whole segment stays.

Solution:

  • Tune segment.ms or segment.bytes for faster cleanup granularity.

4. Timestamps Must Be Handled Properly

Kafka determines "message age" using:

  • Create time (the default), or
  • Log append time (if explicitly configured).

Misconfigured timestamp settings can lead to:

  • Messages being deleted too early or not at all.

Solution:

Set log.message.timestamp.type explicitly. Keep CreateTime only if producers set timestamps reliably; otherwise use LogAppendTime so retention is driven by broker time.


5. Disk Space Pressure Is Still Possible

  • If producers publish too much data within the retention window, Kafka may run out of disk space before cleanup triggers.

Solution:

  • Set retention.bytes in addition to retention.ms for dual control.
  • Monitor broker disk usage carefully.
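The dual-control idea can be sketched like this. It is a simplified model (invented names, and the two checks are combined in one pass, whereas Kafka evaluates the time and size policies as independent periodic checks):

```python
def segments_to_delete(segments, now_ms, retention_ms, retention_bytes):
    """Dual-control retention sketch: a segment is deleted when it is
    past retention.ms OR while total size still exceeds retention.bytes.
    segments: list of (max_timestamp_ms, size_bytes), oldest first;
    the last segment is the active one and is never deleted."""
    total = sum(size for _, size in segments)
    doomed = []
    for i, (ts, size) in enumerate(segments[:-1]):
        expired = now_ms - ts > retention_ms
        oversize = total > retention_bytes
        if expired or oversize:
            doomed.append(i)
            total -= size
    return doomed

segs = [(8_000_000, 100), (9_500_000, 100), (9_900_000, 100), (9_990_000, 100)]
# Only the oldest segment is past the 1,000,000 ms window; size is within cap.
print(segments_to_delete(segs, 10_000_000, 1_000_000, 300))  # [0]
```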

6. No Per-Consumer Retention

  • Unlike traditional queues, Kafka cannot retain data per consumer.

Solution:
Use tools like Kafka Connect, Kafka Streams, or external state stores if per-consumer retention is needed.


7. Retention Delay

  • Deletion is handled by a periodic background task on the broker (the retention check runs every log.retention.check.interval.ms, 5 minutes by default).
  • There may be a delay after retention.ms is reached before data is actually deleted.

Solution:
Expect a slight lag in cleanup; don’t rely on exact timing.

