当前位置: 首页 > news >正文

Flink 状态和 CheckPoint 的区别和联系(附源码)

Flink 状态和checkpoint的区别和联系(附源码

  • 1. 本质区别:运行时 vs 持久化
    • 1.1 State(状态):运行时的"工作内存"
    • 1.2 Checkpoint:状态的"快照存档"
  • 2. 形象类比
  • 3. 源码层面的关系
    • 3.1 CheckpointableKeyedStateBackend:连接两者的桥梁
    • 3.2 StateHandle:Checkpoint 的元数据
    • 3.3 HeapKeyedStateBackend:实际的实现
  • 4. 完整的生命周期
    • 4.1 正常运行时
    • 4.2 Checkpoint 触发时
    • 4.3 故障恢复时
  • 5 关键区别对比表
  • 6. 源码中的协作机制
    • 6.1 Checkpoint 选项配置
    • 6.2不同类型的 Checkpoint
  • 7. 实战示例:完整流程
  • 8. 核心联系总结
    • 8.1 依赖关系
    • 8.2 协作关系
    • 8.3 性能权衡
    • 8.4 统一抽象
  • 9. 关键要点

1. 本质区别:运行时 vs 持久化

1.1 State(状态):运行时的"工作内存"

package org.apache.flink.api.common.state;import org.apache.flink.annotation.PublicEvolving;/*** Interface that different types of partitioned state must implement.** <p>The state is only accessible by functions applied on a {@code KeyedStream}. The key is* automatically supplied by the system, so the function always sees the value mapped to the key of* the current element. That way, the system can handle stream and state partitioning consistently* together.*/
@PublicEvolving
public interface State {/** Removes the value mapped under the current key. */void clear();
}

State 的特征:

  • 位置:存储在 TaskManager 的内存或本地磁盘(RocksDB)
  • 目的:算子处理数据时的"工作记忆"
  • 访问:频繁、实时、微秒级
  • 生命周期:作业运行期间一直存在
  • 可变性:每处理一条数据可能就会更新

1.2 Checkpoint:状态的"快照存档"

@Internal
public interface Snapshotable<S extends StateObject> {/*** Operation that writes a snapshot into a stream that is provided by the given {@link* CheckpointStreamFactory} and returns a @{@link RunnableFuture} that gives a state handle to* the snapshot. It is up to the implementation if the operation is performed synchronous or* asynchronous. In the later case, the returned Runnable must be executed first before* obtaining the handle.** @param checkpointId The ID of the checkpoint.* @param timestamp The timestamp of the checkpoint.* @param streamFactory The factory that we can use for writing our state to streams.* @param checkpointOptions Options for how to perform this checkpoint.* @return A runnable future that will yield a {@link StateObject}.*/@NonnullRunnableFuture<S> snapshot(long checkpointId,long timestamp,@Nonnull CheckpointStreamFactory streamFactory,@Nonnull CheckpointOptions checkpointOptions)throws Exception;
}

Checkpoint 的特征:

  • 位置:持久化存储(HDFS、S3、OSS 等)
  • 目的:容错恢复的"存档点"
  • 访问:低频、定期(如每分钟)
  • 生命周期:独立于作业运行,故障恢复时使用
  • 不可变性:一旦完成就不再改变

2. 形象类比

State = 你正在编辑的 Word 文档(内存中)↓ 每隔一段时间
Checkpoint = 保存到磁盘的文档副本(硬盘上)↓ 如果程序崩溃
Recovery = 从最近的保存恢复(重新加载到内存)

3. 源码层面的关系

3.1 CheckpointableKeyedStateBackend:连接两者的桥梁

/*** Interface that combines both, the {@link KeyedStateBackend} interface, which encapsulates methods* responsible for keyed state management and the {@link Snapshotable} which tells the system how to* snapshot the underlying state.** <p><b>NOTE:</b> State backends that need to be notified of completed checkpoints can additionally* implement the {@link CheckpointListener} interface.** @param <K> Type of the key by which state is keyed.*/
public interface CheckpointableKeyedStateBackend<K>extends KeyedStateBackend<K>, Snapshotable<SnapshotResult<KeyedStateHandle>>, Closeable {/** Returns the key groups which this state backend is responsible for. */KeyGroupRange getKeyGroupRange();/*** Returns a {@link SavepointResources} that can be used by {@link SavepointSnapshotStrategy} to* write out a savepoint in the common/unified format.*/@NonnullSavepointResources<K> savepoint() throws Exception;
}

设计理念:

  • KeyedStateBackend:管理运行时状态的读写
  • Snapshotable:提供状态快照能力
  • CheckpointableKeyedStateBackend:同时具备两种能力

3.2 StateHandle:Checkpoint 的元数据

/*** Base for the handles of the checkpointed states in keyed streams. When recovering from failures,* the handle will be passed to all tasks whose key group ranges overlap with it.*/
public interface KeyedStateHandle extends CompositeStateHandle {/** Returns the range of the key groups contained in the state. */KeyGroupRange getKeyGroupRange();/*** Returns a state over a range that is the intersection between this handle's key-group range* and the provided key-group range.** @param keyGroupRange The key group range to intersect with, will return null if the*     intersection of this handle's key-group and the provided key-group is empty.*/@NullableKeyedStateHandle getIntersection(KeyGroupRange keyGroupRange);/*** Returns a unique state handle id to distinguish with other keyed state handles.
http://www.dtcms.com/a/481665.html

相关文章:

  • QML学习笔记(三十六)QML的ComboBox
  • 媒介宣发的技术革命:Infoseek如何用AI重构企业传播全链路
  • uniapp开发小程序
  • 浦江县建设局网站国家企业信息信用信息公示网址
  • 2025年燃气从业人员考试真题分享
  • SuperMap iServer 数据更新指南
  • C++基础:(十三)list类的模拟实现
  • 【网络编程】从数据链路层帧头到代理服务器:解析路由表、MTU/MSS、ARP、NAT 等网络核心技术
  • 北京网站seowyhseo网站模板但没有后台如何做网站
  • 对接世界职业院校技能大赛标准,唯众打造高质量云计算实训室
  • 利用人工智能、数字孪生、AR/VR 进行军用飞机维护
  • [特殊字符] Maven 编译报错「未与 -source 8 一起设置引导类路径」完美解决方案(以芋道项目为例)
  • 【CV】泊松图像融合
  • 云智融合:人工智能与云计算融合实践指南
  • Maven创建Java项目实战全流程
  • 泉州市住房与城乡建设网站wordpress弹出搜索
  • [创业之路-691]:历史与现实的镜鉴:从三国纷争到华为铁三角的系统性启示
  • 时序数据库选型革命:深入解析Apache IoTDB的架构智慧与实战指南
  • 南通网站制作建设手机网页设计软件下载
  • OpenAI推出即时支付功能,ChatGPT将整合电商能力|技术解析与行业影响
  • 小杰深度学习(seventeen)——视觉-经典神经网络——MObileNetV3
  • 线性代数 | 要义 / 本质 (下篇)
  • C# 预处理指令 (# 指令) 详解
  • 有趣的机器学习-利用神经网络来模拟“古龙”写作风格的输出器
  • AI破解数学界遗忘谜题:GPT-5重新发现尘封二十年的埃尔德什问题解法
  • ui网站推荐如何建网站不花钱
  • Java版自助共享空间系统,打造高效无人值守智慧实体门店
  • 《超越单链表的局限:双链表“哨兵位”设计模式,如何让边界处理代码既优雅又健壮?》
  • HENGSHI SENSE 6.0技术白皮书:基于HQL语义层的Agentic BI动态计算引擎架构解析
  • C#实现MySQL→Clickhouse建表语句转换工具