
Informer Parameters and Code

news 2025/8/23 12:56:06


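0. Code and tutorials
1. Hugging Face Informer parameters
- 📌 Table 1: `InformerConfig` configuration parameters
- 📌 Table 2: `InformerModel` / `InformerForPrediction` forward-pass inputs
- 1. `InformerConfig` class: required parameters
- 2. Input parameters: forward-pass arguments of `InformerModel` and `InformerForPrediction`
- Summary at a glance
- Informer model parameter summary tables
- I. InformerConfig configuration parameters
- II. InformerModel forward-pass parameters
  - Required inputs
  - Optional inputs
  - Return values (✅)
- III. Parameters specific to InformerForPrediction
  - Extra training-time parameters
  - Generation method for inference
- IV. Complete usage examples
  - 1. Initializing the configuration
  - 2. Using InformerModel
  - 3. Using InformerForPrediction
- V. Key points to note
2. Stock prediction (simple)
- 📘 Main Informer model parameters
- 📊 Stock prediction example using the Hugging Face Trainer
- 🧠 Code walkthrough
3. Informer stock example (corrected)
  - 5.1 Before the fix
  - 5.2 Corrected code (univariate)
  - 5.3 Multivariate input
4. Error notes
- 0. How logits and labels in eval_pred map to each other in the Trainer's compute_metrics()
  - 1. The mapping
  - 2. Testing whether evaluation takes effect
  - 3. Trainer metrics (labels format)
- 1. Length of past_values
  - Error analysis
  - Solution
- 2. distribution_output
  - Error analysis
    - Cause of the error
    - Nature of the error
  - Solution
3. loss and InformerForPrediction
  - Analysis: can using `InformerModel` solve the problem?
    - `InformerModel` vs `InformerForPrediction`
    - Potential advantages of using `InformerModel`
    - Challenges of using `InformerModel`
    - Does it solve the problem?
- 4. Required parameters 'past_time_features' and 'past_observed_mask'
- 5. Lag sequence lags_sequence
  - 1. What the three parameters do
  - 2. Input shapes
  - 3. Example (univariate close-price prediction)
  - 4. Feeding everything into the model
  - 5. Multivariate case (OHLCV)
  - 6. How the inputs enter the model and how the embedding is built
- 6. Output dimensions (✅)
5. Version without Trainer
  - Complete rewritten code (without `Trainer` and `TrainingArguments`):
  - 🔑 Key points of the rewrite: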

0. Code and tutorials

https://github.com/zhouhaoyi/Informer2020
https://hugging-face.cn/docs/transformers/model_doc/informer
model，https://huggingface.co/papers/2012.07436
https://nixtlaverse.nixtla.cn/neuralforecast/models.informer.html#informer
https://huggingface.co/blog/zh/informer

PPO-informer：
https://github.com/Hollywood3/PPO-informer-future-master

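Paper notes: https://editor.csdn.net/md?articleId=146533877

Goal: using the required parameters of the Hugging Face Informer, implement stock prediction on fabricated data, with complete Trainer-based training and evaluation code and explanatory comments. Reference: https://hugging-face.cn/docs/transformers/model_doc/informer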

1. Hugging Face Informer parameters

https://huggingface.co/blog/zh/informer
https://hugging-face.cn/docs/transformers/model_doc/informer

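Note: the Hugging Face Informer is built for multivariate probabilistic time-series forecasting; using it for point forecasts runs into many problems.

Question:
Explain the required parameters of the Hugging Face Informer, with reference to https://hugging-face.cn/docs/transformers/model_doc/informer
Output summary tables (InformerConfig, InformerModel / InformerForPrediction).

Sure 👍 Here are the key parameters of InformerConfig and InformerModel / InformerForPrediction, summarized in two tables.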

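📌 Table 1: `InformerConfig` configuration parameters

| Parameter | Type / default | Required | Description |
| --- | --- | --- | --- |
| `prediction_length` | int | ✅ | Number of time steps the decoder predicts (output length) |
| `context_length` | int / defaults to `prediction_length` | ❌ | Context length fed to the encoder |
| `input_size` | int / default 1 | ❌ | Dimension of the target variable (1 for univariate, >1 for multivariate) |
| `lags_sequence` | list[int] / default `[1,2,3,4,5,6,7]` | ❌ | Lag indices used to build lagged features |
| `num_time_features` | int / default 0 | ❌ | Number of time features (e.g. date, hour, holiday) |
| `num_dynamic_real_features` | int / default 0 | ❌ | Number of dynamic real-valued features (varying over time) |
| `num_static_categorical_features` | int / default 0 | ❌ | Number of static categorical features (e.g. series ID) |
| `num_static_real_features` | int / default 0 | ❌ | Number of static real-valued features (e.g. fixed indicators) |
| `cardinality` | list[int] | ⚠️ | Number of categories of each static categorical feature (required if `num_static_categorical_features > 0`) |
| `embedding_dimension` | list[int] | ⚠️ | Embedding dimension of each static categorical feature (one entry per `cardinality` entry) |
| `d_model` | int / default 64 | ❌ | Transformer hidden dimension |
| `encoder_layers` / `decoder_layers` | int / default 2 | ❌ | Number of encoder / decoder layers |
| `encoder_attention_heads` / `decoder_attention_heads` | int / default 2 | ❌ | Number of attention heads |
| `encoder_ffn_dim` / `decoder_ffn_dim` | int / default 32 | ❌ | Feed-forward network dimension |
| `activation_function` | str / default `gelu` | ❌ | Activation function |
| `attention_type` | str / default `"prob"` | ❌ | Attention type: `prob` (ProbSparse) or `full` |
| `sampling_factor` | int / default 5 | ❌ | Sampling factor of ProbSparse attention |
| `distil` | bool / default True | ❌ | Whether the encoder uses distilling between layers |
| Dropout parameters | float / default 0.05 to 0.1 | ❌ | Covers `dropout`, `attention_dropout`, `activation_dropout`, `encoder_layerdrop`, `decoder_layerdrop` |
| `use_cache` | bool / default True | ❌ | Whether to cache past key/values to speed up decoding |
| `num_parallel_samples` | int / default 100 | ❌ | Number of samples drawn in parallel per step at inference |
| `init_std` | float / default 0.02 | ❌ | Standard deviation for weight initialization |
| `scaling` | str / bool / None / default `"mean"` | ❌ | How the inputs are scaled |
| `distribution_output` | str / default `"student_t"` | ❌ | Output distribution (`student_t`, `normal`, `negative_binomial`) |
| `loss` | str / default `"nll"` | ❌ | Loss function (negative log-likelihood) |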

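📌 Table 2: `InformerModel` / `InformerForPrediction` forward-pass inputs

| Parameter | Shape | Required | Description |
| --- | --- | --- | --- |
| `past_values` | `(batch, seq_len[, input_size])` | ✅ | Historical target series; `seq_len = context_length + max(lags_sequence)`, as stated in the later tables of this post |
| `past_time_features` | `(batch, seq_len, num_time_features + num_dynamic_real_features)` | ⚠️ | Historical time features plus dynamic real-valued features (required when used) |
| `past_observed_mask` | `(batch, seq_len[, input_size])` (bool) | ⚠️ | Observation mask, 1 = observed, 0 = missing |
| `static_categorical_features` | `(batch, num_static_categorical_features)` | ⚠️ | Static categorical feature input (required if configured) |
| `static_real_features` | `(batch, num_static_real_features)` | ⚠️ | Static real-valued feature input |
| `future_values` | `(batch, prediction_length[, input_size])` | ✅ (training) | True target values over the prediction window (training labels) |
| `future_time_features` | `(batch, prediction_length, num_time_features + num_dynamic_real_features)` | ⚠️ | Time features over the prediction window |
| `future_observed_mask` | `(batch, prediction_length[, input_size])` (bool) | ⚠️ | Observation mask over the prediction window (used in the loss during training) |
| Other optional arguments | n/a | ❌ | `decoder_attention_mask`, `past_key_values`, `output_attentions`, `output_hidden_states`, `use_cache`, `return_dict`, etc. |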

Would you like these two tables exported to an Excel (.xlsx) file, to make parameter comparison and experiment logging easier?

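The following summarizes how the Hugging Face Chinese documentation describes the Informer model's parameters: the most important parameters of the InformerConfig class, and the inputs to the forward methods of the two model classes (InformerModel and InformerForPrediction).

1. `InformerConfig` class: required parameters

This configuration class defines the key parameters of the model architecture. The core ones, with defaults and descriptions, are:

- prediction_length (int, required): number of time steps the decoder predicts, i.e. the output length of the model. It is determined by the dataset and should be set accordingly. (hugging-face.cn)
- context_length (int, optional, defaults to prediction_length): context length of the encoder. If unset, it equals prediction_length. (hugging-face.cn)
- input_size (int, default 1): dimension of the target variable; 1 for univariate series, the number of variables for multivariate series. (hugging-face.cn)
- lags_sequence (list[int], default [1,2,3,4,5,6,7]): lag steps of the input series, used to build lagged features; adjust them to the data frequency. (hugging-face.cn)
- num_time_features (int, default 0): number of time features, such as time encodings, holidays, or an age feature. (hugging-face.cn)
- num_dynamic_real_features / num_static_real_features / num_static_categorical_features (int, default 0 each):
  - num_dynamic_real_features: number of real-valued covariates that can change at every time step.
  - num_static_real_features: number of static real-valued features (constant over time, e.g. promotions).
  - num_static_categorical_features: number of static categorical features (e.g. time-series ID). (hugging-face.cn)
- cardinality / embedding_dimension: when static categorical features are used (num_static_categorical_features > 0), both must be provided:
  - cardinality: number of categories of each categorical feature (length must match num_static_categorical_features)
  - embedding_dimension: embedding dimension of each categorical feature (same length) (hugging-face.cn)
- Model structure:
  - d_model: dimension of the Transformer layers, default 64
  - encoder_layers / decoder_layers: number of encoder and decoder layers, both default 2
  - encoder_attention_heads / decoder_attention_heads: number of attention heads, both default 2
  - encoder_ffn_dim / decoder_ffn_dim: feed-forward (intermediate) dimension, default 32 (hugging-face.cn)
- Other important parameters:
  - attention_type: attention type, default "prob" (the Informer-specific ProbSparse attention); "full" selects standard Transformer attention
  - sampling_factor: sampling factor of ProbSparse attention, used only when attention_type="prob", default 5
  - distil: whether to use distilling in the encoder, default True
  - Dropout parameters: dropout, encoder_layerdrop, decoder_layerdrop, attention_dropout, activation_dropout, with defaults between 0.05 and 0.1
  - use_cache: whether to cache past keys/values to speed up decoding, default True
  - num_parallel_samples: number of samples generated per step at inference, default 100 (hugging-face.cn)

Together these parameters determine the inputs, structure, and behavior of the Informer model.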

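2. Input parameters: forward-pass arguments of `InformerModel` and `InformerForPrediction`

The forward methods of both classes accept the inputs below; past_values and its companion features matter most:

- past_values (torch.FloatTensor): shape (batch_size, sequence_length) or (..., input_size). Holds the historical series values the model predicts from. Note: sequence_length should equal config.context_length + max(config.lags_sequence); with the default lags, whose maximum is 7, that is context_length + 7. (hugging-face.cn)
- past_time_features (torch.FloatTensor): time features of shape (batch_size, sequence_length, num_features). They contain time encodings (month, day, etc.) and dynamic real-valued features, and the model uses them as a form of positional encoding. num_features = num_time_features + num_dynamic_real_features. (hugging-face.cn)
- past_observed_mask (torch.BoolTensor): boolean mask marking which past_values were observed (1) and which are missing (0, replaced by zeros). Same shape as past_values. (hugging-face.cn)
- static_categorical_features / static_real_features: static covariates that add extra information (such as a series ID or constants), with shapes (batch_size, num_static_categorical_features) and (batch_size, num_static_real_features). (hugging-face.cn)
- future_values / future_time_features (mainly for training):
  - future_values: the target values to predict, shape (batch_size, prediction_length), optionally with a trailing variable dimension.
  - future_time_features: time features of the prediction window, like past_time_features but over prediction_length. (hugging-face.cn)
  During training these are required and act as the supervision signal for the model output.
- future_observed_mask (used when training InformerForPrediction): marks which future values were observed (1) and which are missing (0); used in the loss computation. (hugging-face.cn)
- Other optional arguments: decoder_attention_mask, head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, output_hidden_states, output_attentions, use_cache, return_dict, cache_position, and so on; see the documentation for details. (hugging-face.cn)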
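
The shape rules listed above can be collected in one place. The following is a minimal sketch in plain Python, not part of the library: `informer_input_shapes` is a hypothetical helper, and it assumes that a univariate target (`input_size == 1`) uses the 2-D form without a trailing variable dimension.

```python
def informer_input_shapes(batch_size, prediction_length, context_length=None,
                          lags_sequence=(1, 2, 3, 4, 5, 6, 7), input_size=1,
                          num_time_features=0, num_dynamic_real_features=0):
    """Expected tensor shapes, following the parameter descriptions above."""
    if context_length is None:
        context_length = prediction_length  # documented default
    # History must cover the context window plus the largest lag
    past_length = context_length + max(lags_sequence)
    num_features = num_time_features + num_dynamic_real_features
    # Assumption: univariate targets carry no trailing variable dimension
    target_dim = () if input_size == 1 else (input_size,)
    return {
        "past_values": (batch_size, past_length) + target_dim,
        "past_observed_mask": (batch_size, past_length) + target_dim,
        "past_time_features": (batch_size, past_length, num_features),
        "future_values": (batch_size, prediction_length) + target_dim,
        "future_time_features": (batch_size, prediction_length, num_features),
    }


shapes = informer_input_shapes(32, prediction_length=24, context_length=48,
                               input_size=3, num_time_features=4)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

Checking a batch against such a table before calling the model makes a length or feature-count mismatch visible before the forward pass fails.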

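Summary at a glance

| Configuration / input | Description |
| --- | --- |
| `prediction_length`, `context_length`, `input_size`, `lags_sequence` | Basic parameters for the time-step layout and lag design |
| Time features / covariates (`num_time_features`, `dynamic/static real/categorical`) | Let the model capture dependence on time and metadata |
| Attention type and model structure parameters | Control model structure and efficiency (ProbSparse attention, layer count, dimensions, etc.) |
| Forward inputs (`past_values`, `past_time_features`, `past_observed_mask`, …) | The inputs needed for training and inference |

If you want recommended values for a specific parameter, example code, or help adapting the configuration to your own data, feel free to ask and I can go into more depth.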

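Informer model parameter summary tables

I. InformerConfig configuration parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prediction_length | int | None | 🔴 Required. Prediction length of the decoder, i.e. the horizon the model forecasts |
| context_length | int | prediction_length | Context length of the encoder; if None, equals prediction_length |
| distribution_output | str | "student_t" | Distribution output head: "student_t", "normal" or "negative_binomial" |
| loss | str | "nll" | Loss function, negative log-likelihood |
| input_size | int | 1 | Size of the target variable; 1 for univariate, greater than 1 for multivariate |
| lags_sequence | list[int] | [1,2,3,4,5,6,7] | Lags of the input time series |
| scaling | str/bool | "mean" | Input scaling: "mean", "std" or None |
| num_time_features | int | 0 | Number of time features |
| num_dynamic_real_features | int | 0 | Number of dynamic real-valued features |
| num_static_categorical_features | int | 0 | Number of static categorical features |
| num_static_real_features | int | 0 | Number of static real-valued features |
| cardinality | list[int] | None | Cardinality of the static categorical features |
| embedding_dimension | list[int] | None | Embedding dimension of the static categorical features |
| d_model | int | 64 | Dimension of the Transformer layers |
| encoder_layers | int | 2 | Number of encoder layers |
| decoder_layers | int | 2 | Number of decoder layers |
| encoder_attention_heads | int | 2 | Number of encoder attention heads |
| decoder_attention_heads | int | 2 | Number of decoder attention heads |
| encoder_ffn_dim | int | 32 | Encoder feed-forward dimension |
| decoder_ffn_dim | int | 32 | Decoder feed-forward dimension |
| activation_function | str | "gelu" | Activation function: "gelu" or "relu" |
| dropout | float | 0.05 | Dropout of the fully connected layers |
| encoder_layerdrop | float | 0.1 | Encoder layer dropout |
| decoder_layerdrop | float | 0.1 | Decoder layer dropout |
| attention_dropout | float | 0.1 | Attention dropout |
| activation_dropout | float | 0.1 | Activation dropout |
| num_parallel_samples | int | 100 | Number of parallel samples at inference |
| init_std | float | 0.02 | Standard deviation for weight initialization |
| use_cache | bool | True | Whether to use the cache |
| attention_type | str | "prob" | 🔸 Informer-specific: "prob" or "full" |
| sampling_factor | int | 5 | 🔸 ProbSparse sampling factor |
| distil | bool | True | 🔸 Whether to use distilling |
| is_encoder_decoder | bool | True | Encoder-decoder architecture flag |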

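II. InformerModel forward-pass parameters

Required inputs

| Parameter | Shape | Description |
| --- | --- | --- |
| past_values | (batch_size, sequence_length) or (batch_size, sequence_length, input_size) | 🔴 Historical series values, sequence_length = context_length + max(lags_sequence) |
| past_time_features | (batch_size, sequence_length, num_features) | 🔴 Historical time features |
| past_observed_mask | (batch_size, sequence_length) or (batch_size, sequence_length, input_size) | 🔴 Observation mask, same shape as past_values |

Optional inputs

| Parameter | Shape | Description |
| --- | --- | --- |
| static_categorical_features | (batch_size, num_static_categorical) | Static categorical features |
| static_real_features | (batch_size, num_static_real) | Static real-valued features |
| future_values | (batch_size, prediction_length) | Target values during training |
| future_time_features | (batch_size, prediction_length, num_features) | Future time features |
| decoder_attention_mask | (batch_size, target_sequence_length) | Decoder attention mask |
| head_mask | (num_heads,) or (num_layers, num_heads) | Attention head mask |
| decoder_head_mask | (decoder_layers, decoder_attention_heads) | Decoder head mask |
| cross_attn_head_mask | (decoder_layers, decoder_attention_heads) | Cross-attention head mask |
| encoder_outputs | tuple | Cached encoder outputs |
| past_key_values | list[torch.FloatTensor] | Key/value cache |
| output_hidden_states | bool | Whether to return hidden states |
| output_attentions | bool | Whether to return attention weights |
| use_cache | bool | Whether to use the cache |
| return_dict | bool | Whether to return a dict-like output |
| cache_position | (sequence_length,) | Cache position indices |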

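Return values (✅)

https://hugging-face.cn/docs/transformers/model_doc/informer
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): the sequence of hidden states output by the last layer of the model's decoder.

| Return value | Shape | Description |
| --- | --- | --- |
| last_hidden_state | (batch_size, sequence_length, hidden_size) | Hidden states of the last layer |
| past_key_values | Cache | Key/value cache |
| decoder_hidden_states | tuple | Hidden states of each decoder layer |
| decoder_attentions | tuple | Decoder attention weights |
| cross_attentions | tuple | Cross-attention weights |
| encoder_last_hidden_state | (batch_size, sequence_length, hidden_size) | Last hidden state of the encoder |
| encoder_hidden_states | tuple | Hidden states of each encoder layer |
| encoder_attentions | tuple | Encoder attention weights |
| loc | (batch_size,) or (batch_size, input_size) | Shift value |
| scale | (batch_size,) or (batch_size, input_size) | Scale value |
| static_features | (batch_size, feature_size) | Static features |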

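III. Parameters specific to InformerForPrediction

Extra training-time parameters

| Parameter | Shape | Description |
| --- | --- | --- |
| future_values | (batch_size, prediction_length) | 🔴 Required for training, used as labels |
| future_time_features | (batch_size, prediction_length, num_features) | 🔴 Required for training, future time features |
| future_observed_mask | (batch_size, prediction_length) | Observation mask for future values, used in the loss |

Generation method for inference

# Arguments of generate()
model.generate(
    past_values=...,                  # required
    past_time_features=...,           # required
    future_time_features=...,         # required
    past_observed_mask=...,           # optional
    static_categorical_features=...,  # optional
    static_real_features=...,         # optional
)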

IV. Complete usage examples

1. Initializing the configuration

from transformers import InformerConfig

config = InformerConfig(
    # Core parameters
    prediction_length=24,  # required: predict 24 time steps
    context_length=48,     # context length
    input_size=3,          # multivariate series with 3 variables
    # Model architecture
    d_model=128,
    encoder_layers=3,
    decoder_layers=3,
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=512,
    decoder_ffn_dim=512,
    # Informer-specific
    attention_type="prob",  # use ProbSparse attention
    sampling_factor=5,      # sampling factor
    distil=True,            # enable distilling
    # Feature configuration
    num_time_features=4,                # 4 time features
    num_static_categorical_features=2,  # 2 static categorical features
    cardinality=[50, 100],              # cardinality of each categorical feature
    embedding_dimension=[8, 16],        # embedding dimensions
    # Training parameters
    distribution_output="student_t",
    scaling="mean",
    dropout=0.1,
)

2. Using InformerModel

from transformers import InformerModel

# Initialize the model
model = InformerModel(config)

# Forward pass
outputs = model(
    past_values=past_values,                    # shape: (batch, seq_len, 3)
    past_time_features=past_time_features,      # shape: (batch, seq_len, 4)
    past_observed_mask=past_observed_mask,      # shape: (batch, seq_len, 3), same as past_values
    static_categorical_features=static_cat,     # shape: (batch, 2)
    future_values=future_values,                # shape: (batch, 24, 3)
    future_time_features=future_time_features,  # shape: (batch, 24, 4)
)

# Read the output
hidden_states = outputs.last_hidden_state

3. Using InformerForPrediction

from transformers import InformerForPrediction

# Initialize the prediction model
model = InformerForPrediction(config)

# Training phase
outputs = model(
    # Historical data
    past_values=past_values,
    past_time_features=past_time_features,
    past_observed_mask=past_observed_mask,
    # Static features
    static_categorical_features=static_cat,
    # Future data (needed for training)
    future_values=future_values,  # used as labels
    future_time_features=future_time_features,
    future_observed_mask=future_observed_mask,
)
loss = outputs.loss

# Inference phase
predictions = model.generate(
    # Only historical data and future time features are needed
    past_values=past_values,
    past_time_features=past_time_features,
    past_observed_mask=past_observed_mask,
    static_categorical_features=static_cat,
    future_time_features=future_time_features,
    # The number of samples is taken from config.num_parallel_samples (default 100);
    # it is not among the generate() arguments listed above
)
# predictions.sequences shape: (batch, 100, 24), plus a trailing input_size dimension for multivariate targets
mean_prediction = predictions.sequences.mean(dim=1)
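
Because `generate()` returns samples rather than a single forecast, the sample dimension can be reduced in other ways than the mean. The sketch below is an illustration on random stand-in data, not on real model output: `samples` only imitates the `(batch, num_samples, prediction_length)` layout of `predictions.sequences`, and the 10% / 90% band is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for predictions.sequences: (batch, num_samples, prediction_length)
samples = rng.normal(loc=100.0, scale=2.0, size=(4, 100, 24))

# Reduce over the sample axis (axis=1) to get point and interval forecasts
median = np.median(samples, axis=1)                      # (batch, prediction_length)
lower, upper = np.quantile(samples, [0.1, 0.9], axis=1)  # each (batch, prediction_length)

print(median.shape, lower.shape, upper.shape)
```

With a real model the same reduction would be applied to `predictions.sequences.cpu().numpy()`; the median is often less sensitive to outlying samples than the mean.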

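V. Key points to note

| Point | Description |
| --- | --- |
| Sequence length | sequence_length of past_values = context_length + max(lags_sequence) |
| Feature dimension | num_features = num_time_features + num_dynamic_real_features |
| Static feature configuration | cardinality and embedding_dimension must be provided when static categorical features are used |
| ProbAttention | attention_type="prob" is Informer's core innovation and greatly reduces computational complexity |
| Distilling | distil=True can further improve efficiency |
| Multivariate support | input_size > 1 enables multivariate time series |
| Distribution output | Supports three probability distributions: student_t, normal, negative_binomial |
| Caching | use_cache=True can speed up autoregressive generation |

🔴 marks required parameters
🔸 marks Informer-specific parameters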
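
The sequence-length rule in the table follows from how lagged features are built: every step of the context window needs the value `lag` steps before it. The sketch below illustrates the idea in plain Python. It is a simplified illustration, not the library's internal implementation, and `lagged_features` is a hypothetical helper.

```python
def lagged_features(series, context_length, lags_sequence):
    """For each of the last `context_length` steps, collect the values `lag` steps earlier."""
    needed = context_length + max(lags_sequence)
    if len(series) < needed:
        raise ValueError(f"need at least {needed} values, got {len(series)}")
    start = len(series) - context_length
    return [[series[t - lag] for lag in lags_sequence]
            for t in range(start, len(series))]


series = list(range(10))  # toy series 0..9
rows = lagged_features(series, context_length=3, lags_sequence=[1, 2, 7])
for row in rows:
    print(row)  # one row of lagged values per context step
```

With a context of 3 and a largest lag of 7, all 10 values are consumed; a shorter history raises the error, which mirrors why past_values must span context_length + max(lags_sequence) steps.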

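2. Stock prediction (simple)

Question:
Explain the main Informer parameters and implement stock prediction with the Hugging Face Informer on fabricated data, with complete Trainer-based training and evaluation code and explanatory comments. Reference: https://hugging-face.cn/docs/transformers/model_doc/informer

Based on the official Hugging Face documentation and related material, here are the main parameters of the Informer model and a complete stock prediction example using the Hugging Face Trainer, with comments.

📘 Main Informer model parameters

The commonly used parameters of the InformerConfig class are listed below; defaults follow the tables in section 1:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `prediction_length` | `int` | required | Prediction length, i.e. the number of time steps the model outputs. |
| `context_length` | `int` | `prediction_length` | Length of the input context, i.e. the number of historical time steps the encoder sees. |
| `input_size` | `int` | 1 | Dimension of the target variable, e.g. 5 for OHLCV data. |
| `d_model` | `int` | 64 | Hidden dimension of the Transformer layers (the code in this section originally passed it under the name `hidden_size`). |
| `encoder_attention_heads` / `decoder_attention_heads` | `int` | 2 | Number of attention heads. |
| `encoder_layers` / `decoder_layers` | `int` | 2 | Number of layers in the encoder and decoder. |
| `attention_type` | `str` | `"prob"` | Attention type; `"prob"` selects ProbSparse attention. |
| `sampling_factor` | `int` | 5 | Sampling factor of ProbSparse attention, controlling the length of the reduced query matrix. |
| `distil` | `bool` | `True` | Whether to use distilling in the encoder to reduce computation. |
| `dropout` | `float` | 0.05 | Dropout probability, used against overfitting. |
| `activation_function` | `str` | `"gelu"` | Activation function, usually `"gelu"`. |
| `embedding_dimension` | `List[int]` | `None` | Embedding dimension of each static categorical feature. |
| `lags_sequence` | `List[int]` | `[1,2,3,4,5,6,7]` | Lag sequence, the indices of the historical steps used as lagged features. |
| `num_time_features` | `int` | 0 | Number of time features, e.g. `"day_of_week"`, `"hour_of_day"`. |

(Hugging Face Forums, Hugging Face)

For more details, see the official Hugging Face documentation: (Hugging Face).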

📊 Stock prediction example using the Hugging Face Trainer

The following is a complete stock prediction example using the Hugging Face Trainer, with comments:

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from transformers import InformerConfig, InformerModel, Trainer, TrainingArguments
import evaluate

# ========== 1. Build fake stock data ==========
num_days = 600
columns = ["open", "high", "low", "close", "volume"]
data = {col: np.random.rand(num_days) * 100 for col in columns[:4]}
data["volume"] = np.random.randint(1000, 5000, size=num_days)
df = pd.DataFrame(data)

seq_len = 30  # input sequence length
pred_len = 5  # prediction length
input_size = len(columns)  # 5 OHLCV features


def create_samples(df, seq_len, pred_len):
    X, y = [], []
    for i in range(len(df) - seq_len - pred_len):
        X.append(df.iloc[i : i + seq_len].values)
        y.append(df.iloc[i + seq_len : i + seq_len + pred_len].values)
    return np.array(X), np.array(y)


X, y = create_samples(df, seq_len, pred_len)


# ========== 2. Define the Dataset ==========
class StockDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)  # [num_samples, seq_len, input_size]
        self.y = torch.tensor(y, dtype=torch.float32)  # [num_samples, pred_len, input_size]

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        seq_len, input_size = self.X.shape[1], self.X.shape[2]
        past_time_features = torch.zeros(seq_len, 1, dtype=torch.float32)  # placeholder
        past_observed_mask = torch.ones(seq_len, input_size, dtype=torch.bool)  # everything observed
        return {
            "past_values": self.X[idx],
            "future_values": self.y[idx],
            "labels": self.y[idx],  # the Trainer reads the targets from this key
            "past_time_features": past_time_features,
            "past_observed_mask": past_observed_mask,
        }


train_size = int(len(X) * 0.8)
train_ds = StockDataset(X[:train_size], y[:train_size])
test_ds = StockDataset(X[train_size:], y[train_size:])


# ========== 3. Define the custom model ==========
class CustomInformer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.informer = InformerModel(config)
        # Prediction head: maps the decoder output to pred_len * input_size
        self.prediction_head = nn.Linear(config.d_model, config.prediction_length * config.input_size)
        self.config = config

    def forward(self, past_values, past_time_features, past_observed_mask, future_values, labels=None):
        """
        :param past_values: [batch, seq_len, input_size]
        :param past_time_features: [batch, seq_len, num_time_features]
        :param past_observed_mask: [batch, seq_len, input_size]
        :param future_values: [batch, pred_len, input_size], optional, the training targets
        :param labels: same content as future_values; this is what the loss is computed from
        """
        # InformerModel must be given past_time_features and past_observed_mask
        outputs = self.informer(
            past_values=past_values,
            past_time_features=past_time_features,
            past_observed_mask=past_observed_mask,
        )
        decoder_output = outputs.last_hidden_state  # [batch, 1, hidden_size]
        # Predict multiple steps and multiple variables
        preds = self.prediction_head(decoder_output)  # [batch, 1, pred_len*input_size]
        preds = preds.view(decoder_output.size(0), self.config.prediction_length, self.config.input_size)

        loss = None
        # Wrong: computing the loss from future_values
        # if future_values is not None:
        #     loss = nn.MSELoss()(preds, future_values)
        # Right: computing the loss from labels
        if labels is not None:
            loss = nn.MSELoss()(preds, labels)
        return {"loss": loss, "logits": preds} if loss is not None else {"logits": preds}


lags_sequence = [1, 2, 3, 4, 5, 6, 7]  # maximum lag is 7

# ========== 4. Define the model configuration ==========
config = InformerConfig(
    input_size=input_size,
    d_model=64,
    prediction_length=pred_len,
    lags_sequence=lags_sequence,
    # The past_values slice then needs length context_length + max(lags_sequence)
    context_length=seq_len - max(lags_sequence),
    encoder_attention_heads=4,
    decoder_attention_heads=4,
    encoder_layers=2,
    decoder_layers=2,
    attention_type="prob",
    num_time_features=1,  # placeholder
)
model = CustomInformer(config)

# ========== 5. Define the evaluation metric ==========
mse_metric = evaluate.load("mse")


def compute_metrics(eval_pred):
    print("Entering compute_metrics")
    logits = eval_pred.predictions
    labels = eval_pred.label_ids
    # compute() returns a dict such as {"mse": value}; keep only the scalar
    mse = mse_metric.compute(predictions=logits.flatten(), references=labels.flatten())["mse"]
    print("Computed MSE:", mse)
    return {"mse": mse}


# ========== 6. Set the training arguments ==========
training_args = TrainingArguments(
    output_dir="./informer-stock",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=6,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_dir="./logs",
    logging_strategy="steps",
    logging_steps=20,
    learning_rate=1e-3,
    # report_to="none",
)

# ========== 7. Initialize the Trainer ==========
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    compute_metrics=compute_metrics,
)

# # Check that evaluation works before training
# print('Checking that evaluation works:')
# metrics = trainer.evaluate()
# print(metrics)
# quit()

# ========== 8. Start training ==========
trainer.train()

# Evaluate the model
eval_results = trainer.evaluate()
print("Evaluation results:", eval_results)

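🧠 Code walkthrough

Data construction: builds a DataFrame with 600 days of stock data and the fields "open", "high", "low", "close" and "volume", then uses create_samples to build the input sequences and their prediction targets.
Dataset definition: the StockDataset class inherits from torch.utils.data.Dataset and serves the stock data. Each sample contains the history (past_values) and the matching future targets (future_values, repeated under labels).
Model configuration: InformerConfig defines the hyperparameters, including prediction length, context length, input dimension, hidden dimension, number of attention heads, number of layers, and attention type.
Model construction: the configuration initializes CustomInformer, which wraps InformerModel and adds a linear prediction head (InformerForPrediction is not used here).
Evaluation metric: evaluate.load("mse") loads the mean squared error (MSE) metric, and compute_metrics computes it.
Training arguments: TrainingArguments sets the output directory, batch sizes, number of epochs, evaluation strategy, save strategy, logging directory, logging strategy, logging steps, and learning rate.
Trainer initialization: the Trainer receives the model, training arguments, training dataset, evaluation dataset, and metric function.
Training: trainer.train() starts training.
Evaluation: trainer.evaluate() evaluates the model and the metrics are printed.
Single-sample prediction: the code above stops at evaluation; predicting on the last sample and printing true versus predicted values would be a natural next step.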
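
The windowing done by `create_samples` can be checked on its own. The sketch below is a minimal pure-Python illustration, not the code used above: `window_bounds` is a hypothetical helper that lists the (start, split, end) indices of every full window. It yields `n - seq_len - pred_len + 1` windows, one more than the `range(len(df) - seq_len - pred_len)` loop in `create_samples`, which stops one window early.

```python
def window_bounds(n, seq_len, pred_len):
    """Return (start, split, end) index triples for every full sliding window.

    History is rows [start, split), targets are rows [split, end).
    """
    last_start = n - seq_len - pred_len
    return [(s, s + seq_len, s + seq_len + pred_len)
            for s in range(last_start + 1)]


bounds = window_bounds(600, 30, 5)
print(len(bounds))  # number of windows
print(bounds[0])    # first window
print(bounds[-1])   # last window, ending exactly at n
```

Losing one window out of several hundred hardly matters for training, but writing the bounds out explicitly makes the history/target split easy to audit.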

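If you would like to extend this further, for example:

Adding time features: such as "day_of_week" or "hour_of_day", to give the model more temporal context.
Rolling forecasts: implementing a multi-step rolling forecast that predicts future steps one window at a time.
Saving and loading the model: saving the trained model and loading it later for prediction.

just let me know and I will provide the corresponding code examples.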

3. Informer stock example (corrected)

5.1 Before the fix

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from transformers import InformerConfig, InformerModel, Trainer, TrainingArguments
from sklearn.metrics import mean_squared_error
import pandas as pd
from datetime import datetime


# Custom dataset class for time-series data
class TimeSeriesDataset(Dataset):
    def __init__(self, data, dates, context_length, prediction_length, lags_sequence):
        """
        Initialize the dataset.
        :param data: numpy array of shape (total_length, 1), the simulated stock price series
        :param dates: pandas DatetimeIndex, used to build time features
        :param context_length: length of the historical context
        :param prediction_length: prediction length
        :param lags_sequence: lag sequence
        """
        self.data = data.astype(np.float32)  # make sure the dtype is float32
        self.dates = dates
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.lags_sequence = lags_sequence
        # Make sure past_values contains enough extra history
        self.total_context_length = context_length + max(lags_sequence)
        self.num_samples = len(data) - self.total_context_length - prediction_length + 1
        # Time features: year, month, day, weekday, normalized
        self.num_time_features = 4
        time_features = np.zeros((len(data), self.num_time_features))
        time_features[:, 0] = (dates.year - 2000) / 100.0  # year normalized
        time_features[:, 1] = (dates.month - 1) / 11.0  # month normalized
        time_features[:, 2] = (dates.day - 1) / 30.0  # day normalized
        time_features[:, 3] = dates.weekday / 6.0  # weekday normalized
        self.time_features = time_features.astype(np.float32)

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        """
        Fetch one sample.
        :param index: sample index
        :return: dict with past_values, future_values, past_observed_mask, past_time_features, future_time_features
        """
        past_start = index
        past_end = past_start + self.total_context_length
        future_end = past_end + self.prediction_length
        past_values = torch.tensor(self.data[past_start:past_end])  # shape: (total_context_length, 1)
        future_values = torch.tensor(self.data[past_end:future_end])  # shape: (prediction_length, 1)
        past_observed_mask = torch.ones_like(past_values, dtype=torch.bool)  # assume all history was observed
        past_time_features = torch.tensor(self.time_features[past_start:past_end])  # shape: (total_context_length, num_time_features)
        future_time_features = torch.tensor(self.time_features[past_end:future_end])  # shape: (prediction_length, num_time_features)
        return {
            "past_values": past_values,
            "future_values": future_values,
            "past_observed_mask": past_observed_mask,
            "past_time_features": past_time_features,
            "future_time_features": future_time_features,
        }


# Build fake stock data: a random walk as the price series, plus fake dates
def generate_fake_stock_data(length=10000):
    """
    Generate fake stock prices and a date sequence.
    :param length: total number of data points
    :return: numpy array of prices, pandas DatetimeIndex of dates
    """
    returns = np.random.normal(0, 0.01, size=length)  # daily returns
    prices = np.cumsum(returns) + 100  # accumulate starting from 100 to simulate prices
    dates = pd.date_range(start='2000-01-01', periods=length, freq='D')  # fake daily dates from 2000-01-01
    return prices.reshape(-1, 1), dates


# Custom model: InformerModel plus a prediction head
class CustomInformerForPrediction(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.informer = InformerModel(config)
        # Linear layer mapping the decoder output to [batch_size, prediction_length, input_size]
        # self.prediction_head = nn.Linear(config.d_model, config.input_size)
        self.prediction_head = nn.Linear(config.d_model, config.prediction_length)
        self.config = config

    def forward(self, past_values, past_time_features, past_observed_mask, future_time_features, future_values=None):
        """
        Forward pass.
        :param past_values: shape [batch_size, total_context_length, input_size]
        :param past_time_features: shape [batch_size, total_context_length, num_time_features]
        :param past_observed_mask: shape [batch_size, total_context_length, input_size]
        :param future_time_features: shape [batch_size, prediction_length, num_time_features]
        :param future_values: shape [batch_size, prediction_length, input_size] (given during training)
        :return: dict with loss and logits
        """
        # Run InformerModel
        outputs = self.informer(
            past_values=past_values,
            past_time_features=past_time_features,
            past_observed_mask=past_observed_mask,
            future_time_features=future_time_features,
        )
        # Decoder output has shape [batch_size, prediction_length, d_model]
        decoder_output = outputs.last_hidden_state
        print("Decoder output shape:", decoder_output.shape)
        # Map through the prediction head to [batch_size, prediction_length, input_size]
        # predictions = self.prediction_head(decoder_output)
        # print("Predictions shape after the head:", predictions.shape)
        # Map through the prediction head
        predictions = self.prediction_head(decoder_output)  # [batch, 1, 24]
        # Rearrange to [batch, 24, 1]
        predictions = predictions.permute(0, 2, 1)
        print("Predictions shape after the head:", predictions.shape)
        # Compute the MSE loss (if future_values is given)
        loss = None
        if future_values is not None:
            loss = nn.MSELoss()(predictions, future_values)
        return {"loss": loss, "logits": predictions} if loss is not None else {"logits": predictions}


# Main program
if __name__ == "__main__":
    # Generate fake data and dates
    fake_data, fake_dates = generate_fake_stock_data(length=10000)
    # Split into training and evaluation data
    train_data = fake_data[:8000]
    train_dates = fake_dates[:8000]
    eval_data = fake_data[8000:]
    eval_dates = fake_dates[8000:]

    # Model configuration, based on the required Hugging Face Informer parameters
    config = InformerConfig(
        input_size=1,  # input feature dimension (univariate stock price)
        prediction_length=24,  # predict the next 24 time steps
        context_length=168,  # historical context length (e.g. the 168 hours of the past week)
        lags_sequence=[1, 2, 3, 4, 5, 6, 7, 24],  # lag sequence, to capture different periodicities
        scaling="mean",  # data scaling: mean scaling
        num_time_features=4,  # number of time features (year, month, day, weekday)
        num_dynamic_real_features=0,  # no dynamic real-valued features
        num_static_real_features=0,  # no static real-valued features
        num_static_categorical_features=0,  # no static categorical features
        d_model=64,  # hidden dimension of the model
        encoder_layers=2,  # number of encoder layers
        decoder_layers=2,  # number of decoder layers
        encoder_attention_heads=8,  # encoder attention heads
        decoder_attention_heads=8,  # decoder attention heads
        encoder_ffn_dim=2048,  # encoder FFN dimension
        decoder_ffn_dim=2048,  # decoder FFN dimension
        dropout=0.1,  # dropout rate
        attention_dropout=0.1,  # attention dropout
        activation_dropout=0.1,  # activation dropout
        attention_type="prob",  # attention type: ProbSparse attention
        sampling_factor=5,  # sampling factor
        distil=True,  # whether to use distilling
    )

    # Initialize the custom model
    model = CustomInformerForPrediction(config)

    # Create the training and evaluation datasets
    train_dataset = TimeSeriesDataset(train_data, train_dates, config.context_length, config.prediction_length, config.lags_sequence)
    eval_dataset = TimeSeriesDataset(eval_data, eval_dates, config.context_length, config.prediction_length, config.lags_sequence)

    # Training arguments
    training_args = TrainingArguments(
        output_dir="./informer_results",  # output directory
        num_train_epochs=3,  # number of epochs
        per_device_train_batch_size=32,  # training batch size
        per_device_eval_batch_size=32,  # evaluation batch size
        warmup_steps=500,  # learning-rate warmup steps
        weight_decay=0.01,  # weight decay
        logging_dir='./logs',  # logging directory
        logging_steps=10,  # log every 10 steps
        eval_strategy="epoch",  # evaluate at the end of every epoch
        save_strategy="epoch",  # save at the end of every epoch
        load_best_model_at_end=True,  # load the best model at the end
        metric_for_best_model="mse",  # metric for selecting the best model
        greater_is_better=False,  # lower loss is better
    )

    # Metric function
    def compute_metrics(eval_pred):
        """
        Compute the evaluation metric.
        :param eval_pred: evaluation prediction output
        :return: dict containing the MSE
        """
        predictions = eval_pred.predictions  # point predictions
        labels = eval_pred.label_ids
        mse = mean_squared_error(labels.flatten(), predictions.flatten())
        print("Computed MSE:", mse)
        return {"mse": mse}

    # Initialize the Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )

    # Train the model
    trainer.train()

    # Evaluate the model
    eval_results = trainer.evaluate()
    print("Evaluation results:", eval_results)

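5.2 Corrected code (univariate)

Here is a complete version, covering:

The custom Dataset returns labels
CustomInformerForPrediction.forward accepts labels
The prediction head is meant to output [batch, prediction_length, input_size]
compute_metrics computes the MSE automatically
The TrainingArguments are set so that the Trainer evaluates and keeps the best model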

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from transformers import InformerConfig, InformerModel, Trainer, TrainingArguments
from sklearn.metrics import mean_squared_error
import pandas as pd


# -------------------------------
# Generate fake stock data
# -------------------------------
def generate_fake_stock_data(length=10000):
    returns = np.random.normal(0, 0.01, size=length)
    prices = np.cumsum(returns) + 100
    dates = pd.date_range(start='2000-01-01', periods=length, freq='D')
    return prices.reshape(-1, 1), dates


# -------------------------------
# Custom time-series dataset
# -------------------------------
class TimeSeriesDataset(Dataset):
    def __init__(self, data, dates, context_length, prediction_length, lags_sequence):
        self.data = data.astype(np.float32)
        self.dates = dates
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.lags_sequence = lags_sequence
        self.total_context_length = context_length + max(lags_sequence)
        self.num_samples = len(data) - self.total_context_length - prediction_length + 1
        # Time features: year, month, day, weekday
        self.num_time_features = 4
        time_features = np.zeros((len(data), self.num_time_features))
        time_features[:, 0] = (dates.year - 2000) / 100.0
        time_features[:, 1] = (dates.month - 1) / 11.0
        time_features[:, 2] = (dates.day - 1) / 30.0
        time_features[:, 3] = dates.weekday / 6.0
        self.time_features = time_features.astype(np.float32)

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        past_start = index
        past_end = past_start + self.total_context_length
        future_end = past_end + self.prediction_length
        past_values = torch.tensor(self.data[past_start:past_end])
        future_values = torch.tensor(self.data[past_end:future_end])
        past_observed_mask = torch.ones_like(past_values, dtype=torch.bool)
        past_time_features = torch.tensor(self.time_features[past_start:past_end])
        future_time_features = torch.tensor(self.time_features[past_end:future_end])
        return {
            "past_values": past_values,
            "future_values": future_values,
            "labels": future_values,  # needed by the Trainer
            "past_observed_mask": past_observed_mask,
            "past_time_features": past_time_features,
            "future_time_features": future_time_features,
        }


# -------------------------------
# Custom model
# -------------------------------
class CustomInformerForPrediction(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.informer = InformerModel(config)
        self.prediction_head = nn.Linear(config.d_model, config.prediction_length)
        self.config = config

    def forward(self, past_values, past_time_features, past_observed_mask, future_time_features, labels=None):
        outputs = self.informer(
            past_values=past_values,
            past_time_features=past_time_features,
            past_observed_mask=past_observed_mask,
            future_time_features=future_time_features,
        )
        # Decoder output: [batch, prediction_length, d_model]
        decoder_output = outputs.last_hidden_state
        # Map through the prediction head and rearrange to [batch, prediction_length, 1]
        predictions = self.prediction_head(decoder_output).permute(0, 2, 1)
        loss = None
        if labels is not None:
            loss = nn.MSELoss()(predictions, labels)
        return {"loss": loss, "logits": predictions} if loss is not None else {"logits": predictions}


# -------------------------------
# Main program
# -------------------------------
if __name__ == "__main__":
    # Generate fake data
    fake_data, fake_dates = generate_fake_stock_data(length=10000)
    train_data, train_dates = fake_data[:8000], fake_dates[:8000]
    eval_data, eval_dates = fake_data[8000:], fake_dates[8000:]

    # Model configuration
    config = InformerConfig(
        input_size=1,
        prediction_length=24,
        context_length=168,
        lags_sequence=[1, 2, 3, 4, 5, 6, 7, 24],
        scaling="mean",
        num_time_features=4,
        num_dynamic_real_features=0,
        num_static_real_features=0,
        num_static_categorical_features=0,
        d_model=64,
        encoder_layers=2,
        decoder_layers=2,
        encoder_attention_heads=8,
        decoder_attention_heads=8,
        encoder_ffn_dim=2048,
        decoder_ffn_dim=2048,
        dropout=0.1,
        attention_dropout=0.1,
        activation_dropout=0.1,
        attention_type="prob",
        sampling_factor=5,
        distil=True,
    )
    model = CustomInformerForPrediction(config)

    # Datasets
    train_dataset = TimeSeriesDataset(train_data, train_dates, config.context_length, config.prediction_length, config.lags_sequence)
    eval_dataset = TimeSeriesDataset(eval_data, eval_dates, config.context_length, config.prediction_length, config.lags_sequence)

    # TrainingArguments
    training_args = TrainingArguments(
        output_dir="./informer_results",
        num_train_epochs=3,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir='./logs',
        logging_steps=10,
        evaluation_strategy="epoch",  # ✅ fixed; newer transformers releases name this argument eval_strategy
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="eval_mse",
        greater_is_better=False,
    )

    # Metric function
    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        mse = mean_squared_error(labels.reshape(-1), predictions.reshape(-1))
        return {"mse": mse}

    # Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )

    # Train
    trainer.train()

    # Evaluate
    eval_results = trainer.evaluate()
    print("Evaluation results:", eval_results)

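This version ensures that:

Predictions and labels have exactly the same shape, [batch, 24, 1]
The Trainer can compute eval_mse correctly and use it to pick the best model
compute_metrics is easy to extend later with MAE, RMSE or MAPE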
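As a hedged illustration of how compute_metrics could be extended, the sketch below computes MSE, MAE, RMSE and MAPE in plain Python on flattened predictions and labels. The helper names `flatten` and `regression_metrics` are hypothetical and not part of the original code, and skipping zero labels in the MAPE term is an assumption of this sketch.

```python
import math


def flatten(values):
    """Recursively flatten nested lists/tuples of numbers into one flat list."""
    if isinstance(values, (list, tuple)):
        flat = []
        for item in values:
            flat.extend(flatten(item))
        return flat
    return [float(values)]


def regression_metrics(predictions, labels):
    """Return MSE, MAE, RMSE and MAPE for two equally sized nested lists."""
    preds, refs = flatten(predictions), flatten(labels)
    if len(preds) != len(refs):
        raise ValueError("predictions and labels must contain the same number of values")
    errors = [p - r for p, r in zip(preds, refs)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    # MAPE is undefined for zero labels, so those positions are skipped (assumption)
    pct = [abs(e / r) for e, r in zip(errors, refs) if r != 0]
    mape = sum(pct) / len(pct) if pct else float("nan")
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "mape": mape}


# Shapes mimic [batch=1, prediction_length=2, 1]
metrics = regression_metrics([[[110.0], [90.0]]], [[[100.0], [100.0]]])
print(metrics)
```

Inside a Trainer compute_metrics function, the NumPy arrays in eval_pred could be converted with `.tolist()` before calling such a helper, or the same formulas could be written directly with NumPy.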

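I can also add a small function for visualising multi-step predictions, so that a predicted vs. actual curve is drawn after every training epoch.

Would you like me to add it?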

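5.3 Multivariate input

Change to OHLCV multivariate input
Understood. You want to change the original univariate stock price input (input_size=1) to OHLCV multivariate input (open, high, low, close, volume), that is, input_size=5. The prediction target can be either the closing price alone or all of OHLCV; this example predicts the closing price while using OHLCV as input. The complete modified code follows: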

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from transformers import InformerConfig, InformerModel, Trainer, TrainingArguments
from sklearn.metrics import mean_squared_error
import pandas as pd

# -------------------------------
# Generate fake OHLCV data
# -------------------------------
# Earlier version, kept for reference. np.cumsum flattens `returns` to 1-D,
# so adding noise of shape (length, 1) broadcasts to (length, length).
# def generate_fake_ohlcv_data(length=10000):
#     """
#     Generate OHLCV random-walk data
#     """
#     returns = np.random.normal(0, 0.01, size=(length, 1))
#     close = np.cumsum(returns) + 100
#     open = close + np.random.normal(0, 0.5, size=(length,1))
#     high = np.maximum(open, close) + np.random.uniform(0, 1, size=(length,1))
#     low = np.minimum(open, close) - np.random.uniform(0, 1, size=(length,1))
#     volume = np.random.randint(100, 1000, size=(length,1))
#     ohlcv = np.concatenate([open, high, low, close, volume], axis=1)
#     dates = pd.date_range(start='2000-01-01', periods=length, freq='D')
#     return ohlcv.astype(np.float32), dates

def generate_fake_ohlcv_data(length=10000):
    """Generate OHLCV random-walk data; every column is a 2-D array of shape (length, 1)."""
    returns = np.random.normal(0, 0.01, size=(length, 1))
    close = (np.cumsum(returns) + 100).reshape(-1, 1)  # shape: (length, 1)
    open_ = (close + np.random.normal(0, 0.5, size=(length, 1))).reshape(-1, 1)  # open_ avoids shadowing the builtin open
    high = (np.maximum(open_, close) + np.random.uniform(0, 1, size=(length, 1))).reshape(-1, 1)
    low = (np.minimum(open_, close) - np.random.uniform(0, 1, size=(length, 1))).reshape(-1, 1)
    volume = np.random.randint(100, 1000, size=(length, 1)).astype(np.float32)
    # Concatenate into (length, 5)
    ohlcv = np.concatenate([open_, high, low, close, volume], axis=1)
    dates = pd.date_range(start='2000-01-01', periods=length, freq='D')
    return ohlcv.astype(np.float32), dates

# -------------------------------
# Custom OHLCV time series dataset
# -------------------------------
class TimeSeriesOHLCVDataset(Dataset):
    def __init__(self, data, dates, context_length, prediction_length, lags_sequence, target_index=3):
        """target_index=3 predicts the closing price"""
        self.data = data.astype(np.float32)
        self.dates = dates
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.lags_sequence = lags_sequence
        # Extra history is reserved for the largest lag
        self.total_context_length = context_length + max(lags_sequence)
        self.num_samples = len(data) - self.total_context_length - prediction_length + 1
        self.num_time_features = 4
        # Time features, each roughly scaled to [0, 1]
        time_features = np.zeros((len(data), self.num_time_features))
        time_features[:, 0] = (dates.year - 2000) / 100.0
        time_features[:, 1] = (dates.month - 1) / 11.0
        time_features[:, 2] = (dates.day - 1) / 30.0
        time_features[:, 3] = dates.weekday / 6.0
        self.time_features = time_features.astype(np.float32)
        self.target_index = target_index

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        past_start = index
        past_end = past_start + self.total_context_length
        future_end = past_end + self.prediction_length
        past_values = torch.tensor(self.data[past_start:past_end])  # shape: [total_context_length, 5]
        # Closing price only, shape: [prediction_length, 1]
        future_values = torch.tensor(self.data[past_end:future_end, self.target_index:self.target_index + 1])
        past_observed_mask = torch.ones_like(past_values, dtype=torch.bool)
        past_time_features = torch.tensor(self.time_features[past_start:past_end])
        future_time_features = torch.tensor(self.time_features[past_end:future_end])
        return {
            "past_values": past_values,
            "future_values": future_values,
            "labels": future_values,
            "past_observed_mask": past_observed_mask,
            "past_time_features": past_time_features,
            "future_time_features": future_time_features,
        }

# -------------------------------
# Custom Informer model that predicts the closing price
# -------------------------------
class CustomInformerOHLCV(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.informer = InformerModel(config)
        self.prediction_head = nn.Linear(config.d_model, config.prediction_length)  # outputs prediction_length values
        self.config = config

    def forward(self, past_values, past_time_features, past_observed_mask, future_time_features, labels=None):
        outputs = self.informer(
            past_values=past_values,
            past_time_features=past_time_features,
            past_observed_mask=past_observed_mask,
            future_time_features=future_time_features,
        )
        # preds = outputs.decoder_hidden_states()   # only returned when config.output_hidden_states=True
        # print(preds.shape)  # (batch_size, prediction_length, target_dim)
        decoder_output = outputs.last_hidden_state  # [batch, 1, d_model]
        print("Decoder output shape:", decoder_output.shape)
        predictions = self.prediction_head(decoder_output).permute(0, 2, 1)  # [batch, prediction_length, 1]
        print("Predictions shape after the prediction head:", predictions.shape)
        loss = None
        if labels is not None:
            loss = nn.MSELoss()(predictions, labels)
        return {"loss": loss, "logits": predictions} if loss is not None else {"logits": predictions}

# -------------------------------
# Main program
# -------------------------------
if __name__ == "__main__":
    fake_data, fake_dates = generate_fake_ohlcv_data(length=10000)
    train_data, train_dates = fake_data[:8000], fake_dates[:8000]
    eval_data, eval_dates = fake_data[8000:], fake_dates[8000:]

    config = InformerConfig(
        input_size=5,  # OHLCV input
        prediction_length=24,
        context_length=168,
        lags_sequence=[1, 2, 3, 4, 5, 6, 7, 24],
        scaling="mean",
        num_time_features=4,
        num_dynamic_real_features=0,
        num_static_real_features=0,
        num_static_categorical_features=0,
        d_model=64,
        encoder_layers=2,
        decoder_layers=2,
        encoder_attention_heads=8,
        decoder_attention_heads=8,
        encoder_ffn_dim=2048,
        decoder_ffn_dim=2048,
        dropout=0.1,
        attention_dropout=0.1,
        activation_dropout=0.1,
        attention_type="prob",
        sampling_factor=5,
        distil=True,
    )
    model = CustomInformerOHLCV(config)

    train_dataset = TimeSeriesOHLCVDataset(train_data, train_dates, config.context_length, config.prediction_length, config.lags_sequence)
    eval_dataset = TimeSeriesOHLCVDataset(eval_data, eval_dates, config.context_length, config.prediction_length, config.lags_sequence)

    training_args = TrainingArguments(
        output_dir="./informer_results",
        num_train_epochs=3,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir='./logs',
        logging_steps=10,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="eval_mse",
        greater_is_better=False,
    )

    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        mse = mean_squared_error(labels.reshape(-1), predictions.reshape(-1))
        return {"eval_mse": mse}

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )
    trainer.train()
    eval_results = trainer.evaluate()
    print("Evaluation results:", eval_results)

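✅ Changes:

input_size=5: the input is the five OHLCV columns
The dataset returns labels, which the Trainer uses to compute the loss
future_values contains only the closing price (target_index=3)
The prediction head outputs [batch, prediction_length, 1], aligned with the labels
compute_metrics returns eval_mse, so the Trainer can select the best model correctly

If you want to predict all five OHLCV columns rather than only the closing price, I can change the output to shape=[batch, prediction_length, 5] so that the prediction head covers every variable.

Would you like me to switch to multivariate prediction?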

4. Error notes

0. How logits and labels in eval_pred map inside Trainer compute_metrics()

1. Mapping

Good question 👍

def compute_metrics(eval_pred):
    print("entering compute_metrics")
    predictions, labels = eval_pred  # tuple-style unpacking
    logits = eval_pred.predictions   # equivalent attribute access
    labels = eval_pred.label_ids

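In the Hugging Face Trainer framework, the two fields seen inside compute_metrics(eval_pred) are:

eval_pred.predictions (the logits)
This is the "logits" entry of the output returned by the model's forward().
👉 So if CustomInformer.forward returns {"logits": preds}, then logits = preds here.
eval_pred.label_ids (the labels)
These do not come from the return value of forward(); they are the "labels" argument passed into it,
that is, the tensor stored under the key "labels" in the dict returned by Dataset.__getitem__().
👉 So if your Dataset returns
```
{
    "past_values": ...,
    "past_time_features": ...,
    "past_observed_mask": ...,
    "labels": y[idx],   # must be named labels
}
```
then eval_pred.label_ids holds the y[idx] values gathered over the evaluation set.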

The Trainer works as follows:

The Dataset provides labels → the Trainer automatically passes them to model.forward(labels=...).
Model.forward returns a dict whose "logits" entry is stored as the prediction.
During evaluation, inside compute_metrics(eval_pred):
- eval_pred.predictions = logits
- eval_pred.label_ids = labels

🔧 Applied to your code:

Your Dataset.__getitem__ currently returns:

return {
    "past_values": self.X[idx],
    "future_values": self.y[idx],   # note: this is not named labels
    "past_time_features": past_time_features,
    "past_observed_mask": past_observed_mask,
}

So the Trainer never receives labels.

As a result eval_pred.label_ids is empty.
The forward signature of your model uses future_values, which the Trainer does not recognise as labels either.

✅ Suggested change:

def __getitem__(self, idx):
    past_time_features = torch.zeros(self.X.shape[1], 1, dtype=torch.float32)
    past_observed_mask = torch.ones(self.X.shape[1], self.X.shape[2], dtype=torch.bool)
    return {
        "past_values": self.X[idx],
        "labels": self.y[idx],  # must be named labels
        "past_time_features": past_time_features,
        "past_observed_mask": past_observed_mask,
    }

Then change the model to:

def forward(self, past_values, past_time_features, past_observed_mask, labels=None):
    outputs = self.informer(
        past_values=past_values,
        past_time_features=past_time_features,
        past_observed_mask=past_observed_mask,
    )
    decoder_output = outputs.last_hidden_state
    preds = self.prediction_head(decoder_output)
    # The head must output prediction_length * input_size values for this reshape to work
    preds = preds.view(decoder_output.size(0), self.config.prediction_length, self.config.input_size)
    loss = None
    if labels is not None:
        loss = nn.MSELoss()(preds, labels)
    return {"loss": loss, "logits": preds}

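With this change:

Inside compute_metrics(eval_pred),
- logits is the predicted future price sequence preds
- labels is the true future price sequence y

and the MSE comparison works as intended.

Would you like me to rewrite the Dataset, Model and compute_metrics together so that all three match the standard Hugging Face Trainer interface?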

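2. Checking that evaluation takes effect

# Check that evaluation takes effect (run before training, then exit)
print('Checking that evaluation takes effect:')
metrics = trainer.evaluate()
print(metrics)
quit()

# ========== 8. Start training ==========
trainer.train()

# ========== 5. Define the evaluation metric ==========
import evaluate

mse_metric = evaluate.load("mse")

def compute_metrics(eval_pred):
    print("entering compute_metrics")
    logits = eval_pred.predictions
    labels = eval_pred.label_ids
    # compute() returns a dict such as {"mse": value}, so the scalar is extracted
    mse = mse_metric.compute(predictions=logits.flatten(), references=labels.flatten())["mse"]
    print("Computed MSE:", mse)
    # logging.info(f"Computed MSE: {mse}")
    return {"mse": mse}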

3. Trainer metric (labels format)

KeyError: "The metric_for_best_model training argument is set to 'eval_mse', which is not found in the evaluation metrics. The available evaluation metrics are: ['eval_runtime', 'eval_samples_per_second', 'eval_steps_per_second', 'epoch']. Consider changing the metric_for_best_model via the TrainingArguments."

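Good 👍 The code is essentially complete. A few key points need fixing, otherwise errors will continue or the metric will not take effect:

TrainingArguments parameter name

evaluation_strategy="epoch"

should be:

eval_strategy="epoch"

In recent transformers releases the argument is named eval_strategy (evaluation_strategy is the older, deprecated name). If evaluation is not scheduled, evaluate() is never triggered during training and eval_mse is never produced.

metric_for_best_model name

Your compute_metrics returns:

return {"mse": mse}

The Trainer automatically adds the eval_ prefix in its logs, so the actual name is:

eval_mse

It therefore has to be written as:

metric_for_best_model="eval_mse",
greater_is_better=False,

so that KeyError: 'eval_mse' is no longer raised.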
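The prefixing rule can be pictured with a small stand-alone sketch. This is not the Trainer's own code; `add_metric_prefix` is a hypothetical helper that only mimics the behaviour described above, where keys that already start with `eval_` are left alone and all others get the prefix.

```python
def add_metric_prefix(metrics, prefix="eval"):
    """Return a copy of `metrics` in which every key starts with '<prefix>_'."""
    prefixed = {}
    for key, value in metrics.items():
        if key.startswith(f"{prefix}_"):
            prefixed[key] = value  # already prefixed, keep unchanged
        else:
            prefixed[f"{prefix}_{key}"] = value
    return prefixed


# What compute_metrics returns vs. the names available for metric_for_best_model
logged = add_metric_prefix({"mse": 0.25, "eval_mae": 0.4})
print(logged)
print("eval_mse" in logged)
```

This is why both `return {"mse": mse}` and `return {"eval_mse": mse}` end up under the same name, while `metric_for_best_model` has to refer to the prefixed form.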

compute_metrics input handling

In the Hugging Face Trainer, eval_pred is an EvalPrediction object (it can also be unpacked like a tuple), where:

eval_pred.predictions = the logits output by the model
eval_pred.label_ids = the labels provided by the dataset

Your Dataset, however, does not return a labels field; the field is called future_values.
The Trainer looks for labels by default, so the targets may not be passed through at the moment.

There are two ways to fix this:

Method A: add labels in Dataset.__getitem__

return {
    "past_values": past_values,
    "future_values": future_values,
    "labels": future_values,   # ⭐ add this line so the Trainer recognises the targets
    "past_observed_mask": past_observed_mask,
    "past_time_features": past_time_features,
    "future_time_features": future_time_features,
}

Method B: change CustomInformerForPrediction.forward to

def forward(self, past_values, past_time_features, past_observed_mask, future_time_features, labels=None):
    outputs = self.informer(
        past_values=past_values,
        past_time_features=past_time_features,
        past_observed_mask=past_observed_mask,
        future_time_features=future_time_features,
    )
    decoder_output = outputs.last_hidden_state
    predictions = self.prediction_head(decoder_output).permute(0, 2, 1)
    loss = None
    if labels is not None:
        loss = nn.MSELoss()(predictions, labels)
    return {"loss": loss, "logits": predictions}

With a labels parameter in forward, the Trainer passes the labels supplied by the Dataset in automatically.

compute_metrics flattening

You currently have:

mse = mean_squared_error(labels.flatten(), predictions.flatten())

A more robust form (which keeps the shapes consistent) is:

mse = mean_squared_error(labels.reshape(-1), predictions.reshape(-1))

✅ Key parts after the change (recommended approach):

# Dataset
def __getitem__(self, index):
    ...
    return {
        "past_values": past_values,
        "future_values": future_values,
        "labels": future_values,  # tells the Trainer that this is the supervised target
        "past_observed_mask": past_observed_mask,
        "past_time_features": past_time_features,
        "future_time_features": future_time_features,
    }

# TrainingArguments
training_args = TrainingArguments(
    output_dir="./informer_results",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    eval_strategy="epoch",              # ✅ fixed (evaluation_strategy in older releases)
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_mse",   # ✅ use eval_mse
    greater_is_better=False,            # ✅ lower MSE is better
)

# compute_metrics
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    mse = mean_squared_error(labels.reshape(-1), predictions.reshape(-1))
    return {"mse": mse}

After this change the Trainer logs will contain:

{'eval_loss': ..., 'eval_mse': ..., 'eval_runtime': ...}

and the best model can be saved and loaded normally.

Would you like me to rewrite your CustomInformerForPrediction and Dataset together into a version that is fully compatible with the Hugging Face Trainer (forward accepts labels, Dataset returns labels), so that metrics such as MAE, RMSE and MAPE can be added without friction?

1. past_values length

Root cause: past_values is shorter than the lags require. Its length must be at least context_length + max(lags_sequence).

lags_sequence (list[int], default [1,2,3,4,5,6,7])
The lag steps of the input time series, used to build lagged features; adjust them to the data frequency. (hugging-face.cn)
past_values (torch.FloatTensor)
Shape (batch_size, sequence_length) or (..., input_size); contains the historical time series values the model uses for prediction. Note: sequence_length must be ≥ config.context_length + max(config.lags_sequence) (with a maximum lag of 7 this is context_length + 7). (hugging-face.cn)

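Error analysis

According to the error message:

ValueError: lags cannot go further than history length, found lag 1 while history length is only 192

The error is raised in the get_lagged_subsequences function of InformerForPrediction: the model tries to access lagged values (the entries of lags_sequence), but the past_values sequence is too short to support them. As your analysis points out, the root cause is:

length of past_values < context_length + max(lags_sequence)

Specifically:

context_length=168 and prediction_length=24, and the dataset currently supplies past_values of length context_length=168.
lags_sequence=[1, 2, 3, 4, 5, 6, 7, 24], so the maximum lag is 24.
When building lagged subsequences, Informer requires the input history (past_values) to be at least context_length + max(lags_sequence) long, that is, 168 + 24 = 192.
The message "history length is only 192" says that the sequence available to the lag extraction is too short, either because of the actual length of past_values or because of how the model computes the history internally.

Looking further at the code and the Hugging Face Informer implementation (reference: https://hugging-face.cn/docs/transformers/model_doc/informer) reveals the following issues:

Lag sequence logic:
- In create_network_inputs the Informer model calls get_lagged_subsequences, which expands past_values with lagged features and therefore needs extra history.
- The current past_values has length context_length=168, which is less than context_length + max(lags_sequence)=192, because the dataset does not account for the lag expansion.
Dataset issue:
- past_values in TimeSeriesDataset provides only context_length steps of history and reserves no extra history for the lag features.
Possible model implementation issue:
- The message "found lag 1 while history length is only 192" looks odd, because a lag of 1 should not normally cause a problem. This may mean that the lag handling in the Hugging Face Informer implementation differs from what was expected, or that the dataset slicing is wrong. One plausible reading (an assumption, not checked against the library source here) is that 192 is the 168 past steps concatenated with the 24 future steps, which is still too short once the lags are added.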

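Solution

To fix the insufficient past_values length, past_values must be at least context_length + max(lags_sequence) long. The options are:

Increase context_length:
- Setting context_length larger (for example 168 + 24 = 192) only helps if the dataset window grows with it, because the required length is always context_length + max(lags_sequence).
Adjust lags_sequence:
- Keeping max(lags_sequence) small reduces the amount of extra history needed beyond context_length.
Modify the dataset:
- In TimeSeriesDataset, make sure past_values includes enough extra history to support extracting the lag features.

The fix used here adjusts the dataset logic so that past_values has length context_length + max(lags_sequence), which is the total_context_length used by the dataset classes above.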
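A minimal sketch of the window arithmetic behind that fix, written in plain Python so that it can be run without the model. The function names `window_bounds` and `num_windows` are hypothetical; the index logic mirrors the `total_context_length` slicing in the dataset classes of this article.

```python
def window_bounds(index, context_length, prediction_length, lags_sequence):
    """Return (past_start, past_end, future_end) for the sample at `index`."""
    total_context_length = context_length + max(lags_sequence)
    past_start = index
    past_end = past_start + total_context_length
    future_end = past_end + prediction_length
    return past_start, past_end, future_end


def num_windows(series_length, context_length, prediction_length, lags_sequence):
    """Number of complete (past, future) windows that fit into the series."""
    total_context_length = context_length + max(lags_sequence)
    return max(0, series_length - total_context_length - prediction_length + 1)


lags = [1, 2, 3, 4, 5, 6, 7, 24]
series = list(range(8000))  # stand-in for 8000 training prices

start, end, future_end = window_bounds(0, 168, 24, lags)
past_values = series[start:end]            # 168 + 24 = 192 steps of history
future_values = series[end:future_end]     # 24 steps to predict
print(len(past_values), len(future_values), num_windows(len(series), 168, 24, lags))
```

The last valid sample index is `num_windows(...) - 1`, whose future window ends exactly at the end of the series.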

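2. distribution_output

You configured distribution_output="student_t" and loss="nll", which enables probabilistic forecasting: the model outputs a distribution (for example the mean and scale parameters of a Student-t distribution).

Error analysis

According to the error message:

ValueError: Value is not broadcastable with batch_shape+event_shape: torch.Size([32, 24, 1]) vs torch.Size([32, 32, 24]).

The error is raised in the forward method of InformerForPrediction, when the loss (nll, negative log-likelihood) is computed: the shape of future_values does not match the shape of the distribution the model outputs. The message shows:

Target shape: future_values has shape torch.Size([32, 24, 1]), that is, (batch_size, prediction_length, input_size), which is the correct data shape.
Distribution shape: torch.Size([32, 32, 24]), meaning the output distribution has an extra dimension (the 32 in the middle) that is incompatible with future_values.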
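To see why these two shapes clash, the broadcasting rule can be checked by hand. The sketch below is a plain-Python illustration of the usual NumPy/PyTorch rule (dimensions are compared from the right and must be equal or 1); `is_broadcastable` is a hypothetical helper, not a library function.

```python
from itertools import zip_longest


def is_broadcastable(shape_a, shape_b):
    """True if two shapes are compatible under NumPy/PyTorch broadcasting rules."""
    # Compare trailing dimensions first; missing leading dimensions count as 1
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return False
    return True


target_shape = (32, 24, 1)         # future_values
distribution_shape = (32, 32, 24)  # batch_shape + event_shape from the error

print(is_broadcastable(target_shape, distribution_shape))
print(is_broadcastable((32, 24, 1), (32, 24, 1)))
```

Reading from the right, 1 vs 24 is fine, but 24 vs 32 is neither equal nor 1, which is the mismatch the error reports.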

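Cause of the error

Shape of the probabilistic output:
- You configured distribution_output="student_t" and loss="nll", which enables probabilistic forecasting: the model outputs a distribution (for example the mean and scale parameters of a Student-t distribution).
- In probabilistic mode the Informer forward method builds a distribution object whose combined batch_shape and event_shape does not match the shape of the target future_values.
- The torch.Size([32, 32, 24]) in the error suggests that the output distribution has an extra dimension after batch_size (possibly because the d_model or attention output is not handled correctly).
Possible model configuration issue:
- Some InformerConfig parameters (for example d_model, encoder_attention_heads or decoder_attention_heads) may lead to an unexpected output shape.
- Specifically, d_model=64 with encoder_attention_heads=2 and decoder_attention_heads=2 may cause internal tensor dimensions to be computed incorrectly, especially when input_size=1.
Dataset input correctness:
- The future_values shape [32, 24, 1] is correct and matches (batch_size, prediction_length, input_size).
- However, the distribution logic inside the model may expect a different event_shape or batch_shape, which causes the broadcasting error.
Hugging Face Informer implementation details:
- According to the Hugging Face documentation (https://hugging-face.cn/docs/transformers/model_doc/informer), InformerForPrediction outputs a distribution object in probabilistic mode (for example a Student-t distribution) whose parameter shapes must match future_values.
- The extra dimension in [32, 32, 24] may come from the decoder output or from the distribution construction, and may be related to the attention_heads or d_model settings.

Essence of the error

Core problem: the output distribution shape [32, 32, 24] is incompatible with the target future_values shape [32, 24, 1], possibly because:
- The decoder output is not mapped correctly to the expected (batch_size, prediction_length, input_size).
- The construction of the probability distribution (student_t) introduces an extra dimension next to the batch dimension (possibly an effect of d_model or attention_heads).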

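Solution

To resolve the shape mismatch we need to:

Check the model output logic:
- Make sure the output distribution of InformerForPrediction matches the shape of future_values.
- It may be necessary to adjust d_model or attention_heads, or to disable probabilistic forecasting (switch to point prediction) to simplify the problem.
Switch to point prediction (temporary workaround):
- Set distribution_output to "normal", or disable probabilistic forecasting (loss="mse"), to avoid the distribution shape problem.
Adjust the dataset or model configuration:
- Make sure past_values and future_values have the correct shapes.
- Check the InformerConfig parameters so that d_model and attention_heads do not produce an unexpected output dimension.
Validate input shapes:
- Print the shapes of a dataset sample to confirm that the input meets the model's requirements.

The attempted fix switches temporarily to point prediction (loss="mse") to sidestep the distribution shape problem, and keeps the earlier past_values length fix (context_length + max(lags_sequence)). As the next section notes, InformerForPrediction does not accept this setting. If probabilistic forecasting has to be kept, the configuration can be adjusted in later debugging.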

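3. loss and InformerForPrediction

The cause is that Hugging Face's InformerForPrediction does not support distribution_output=None.
In the current Transformers implementation, distribution_output must name a known probability distribution type, for example "normal" or "student_t".
Likewise, InformerForPrediction does not support "mse" or "mae" as the loss.

Analysis: can using `InformerModel` solve the problem?

The current error is:

ValueError: Value is not broadcastable with batch_shape+event_shape: torch.Size([32, 24, 1]) vs torch.Size([32, 32, 24]).

This error is raised in the forward method of InformerForPrediction, in probabilistic mode (loss="nll", distribution_output="normal"): the distribution parameter shape [32, 32, 24] output by the model does not match the target future_values shape [32, 24, 1]. The problem may come from the decoder output not being mapped correctly to input_size=1, which leaves an extra dimension (32).

Your question is: can using InformerModel instead of InformerForPrediction solve the problem? Let us look at whether this is feasible and how to implement it.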

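`InformerModel` vs `InformerForPrediction`

According to the Hugging Face documentation (https://hugging-face.cn/docs/transformers/model_doc/informer):

InformerModel:
- The base Informer model. It outputs only the encoder and decoder representations (usually hidden states) and has no output layer for prediction (such as distribution parameters or point predictions).
- The output shape depends on d_model and the sequence length, typically [batch_size, prediction_length, d_model].
- It does not compute a loss directly (for example nll or mse); the output layer and loss function have to be defined by the user.
- It suits scenarios that need custom prediction logic, such as hand-written point prediction or distribution prediction.
InformerForPrediction:
- Designed specifically for time series forecasting, with a built-in prediction head and loss computation (it supports the nll loss and probability distribution outputs such as normal or student_t).
- Its output contains distribution parameters (for example mean and variance) or point predictions, with expected shape [batch_size, prediction_length, input_size].
- The current error shows that its probabilistic output shape [32, 32, 24] is wrong, possibly related to the d_model or attention_heads configuration.

Potential advantages of using `InformerModel`

Avoids the distribution problem: InformerModel does not output distribution parameters directly, so the shape mismatch in the nll loss ([32, 32, 24] vs [32, 24, 1]) is bypassed.
Custom output layer: a simple linear layer can map the decoder output from [batch_size, prediction_length, d_model] to [batch_size, prediction_length, input_size], giving point predictions trained with an MSE loss.
Flexibility: the user has full control over the prediction logic and loss function, which suits scenarios that need an MSE loss.

Challenges of using `InformerModel`

Custom model required: InformerModel only provides encoder and decoder outputs, so a prediction head (for example a linear layer) has to be added by hand to produce predictions of shape [batch_size, prediction_length, input_size].
Loss implementation: either Trainer.compute_loss has to be customised or a new model class has to be written to compute the MSE loss.
Configuration compatibility: the InformerConfig parameters (such as d_model and attention_heads) have to be compatible with the custom prediction head.

Does it solve the problem?

Yes, in part: using InformerModel avoids the current shape mismatch, because it does not depend on the built-in probability distribution output (nll loss). Adding a linear layer and using an MSE loss directly produces predictions whose shape matches future_values ([32, 24, 1]).
Limitations: extra code is needed for the prediction head and the loss, which adds development effort. In addition, if the goal is to keep probabilistic forecasting (for example outputting distribution parameters), InformerModel still needs custom distribution logic, which may reintroduce the shape problem.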

4. Required arguments 'past_time_features' and 'past_observed_mask'

TypeError: InformerModel.forward() missing 2 required positional arguments: 'past_time_features' and 'past_observed_mask'

def __getitem__(self, idx):
    batch_size, seq_len, input_size = 1, self.X.shape[1], self.X.shape[2]
    past_time_features = torch.zeros(seq_len, 1, dtype=torch.float32)  # placeholder
    past_observed_mask = torch.ones(seq_len, input_size, dtype=torch.bool)  # everything observed
    return {
        "past_values": self.X[idx],
        "future_values": self.y[idx],
        "past_time_features": past_time_features,
        "past_observed_mask": past_observed_mask,
    }

# InformerModel must be given past_time_features and past_observed_mask
outputs = self.informer(
    past_values=past_values,
    past_time_features=past_time_features,
    past_observed_mask=past_observed_mask,
)

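5. Lag sequence lags_sequence

ValueError: lags cannot go further than history length, found lag 7 while history length is only 30

lags_sequence (list[int], default [1,2,3,4,5,6,7])
The lag steps of the input time series, used to build lagged features; adjust them to the data frequency. (hugging-face.cn)
past_values (torch.FloatTensor)
Shape (batch_size, sequence_length) or (..., input_size); contains the historical time series values the model uses for prediction. Note: sequence_length must be ≥ config.context_length + max(config.lags_sequence) (with a maximum lag of 7 this is context_length + 7). (hugging-face.cn)

Understood. The documentation you quote makes two points explicit:

Lag sequence lags_sequence
- Used to build lagged features; for example [1,2,3,4,5,6,7] means the values 1, 2, ..., 7 time steps back are used as features.
- Adjust it to the data frequency, for example 1 to 7 for daily data and possibly 1 to 24 for hourly data.
Length requirement for past_values (data slicing)
This applies to the dataset slicing, where StockDataset or TimeSeriesDataset fetches the history:
- The past_values passed to the model must contain at least context_length + max(lags_sequence) steps.
- Example: context_length=30 and a maximum lag of 7 → sequence_length ≥ 37.
- If the maximum lag is 24, then sequence_length ≥ context_length + 24.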
💡 Corresponding fix:

Make sure the past_values slice has this length:

total_context_length = context_length + max(lags_sequence)
past_values = data[i : i + total_context_length]  # must be >= context_length + max(lags)

Or adjust context_length to fit your lags:

context_length = 30
lags_sequence = [1,2,3,4,5,6,7]  # maximum lag 7
# past_values needs length >= 30 + 7 = 37

If you like, I can rewrite the OHLCV multivariate training example above so that lags_sequence, context_length and past_values match exactly and multi-step prediction is supported. Would you like me to do that?

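👌 OK, let me explain the three parameters past_values, lags_sequence and context_length thoroughly, in the context of Hugging Face time series models such as TimeSeriesTransformer.

1. What the three parameters do

Parameter	Meaning	What it controls	Relationship
past_values	The historical sequence values fed in (FloatTensor)	The raw time series you actually give the model	Must be long enough to cover `context_length` plus the largest lag
context_length	The length of contiguous history the model actually uses	The size of the model's input window	Taken from the last part of `past_values`
lags_sequence	The lag steps	Which additional historical points the model needs (non-contiguous features)	These indices must be reachable within `past_values`

2. Input shape

past_values:
(batch_size, sequence_length) or (batch_size, sequence_length, input_size)
- With a single variable: input_size=1
- With OHLCV multivariate input: input_size=5

Requirement:

sequence_length ≥ context_length + max(lags_sequence)

Otherwise the model goes out of range when it fetches the lagged points.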
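The requirement can be checked before a batch ever reaches the model. The following is a small sketch under the assumption that only the length rule stated above matters; `check_history_length` is a hypothetical helper and its error text is illustrative, not the library's message.

```python
def check_history_length(sequence_length, context_length, lags_sequence):
    """Raise ValueError if past_values is too short; return the required length."""
    required = context_length + max(lags_sequence)
    if sequence_length < required:
        raise ValueError(
            f"past_values has {sequence_length} steps but needs at least {required} "
            f"(context_length={context_length} + max lag={max(lags_sequence)})"
        )
    return required


# 37 steps are enough for context_length=30 with lags 1..7
required = check_history_length(37, 30, [1, 2, 3, 4, 5, 6, 7])
print(required)

# 30 steps are not enough, which matches the situation behind the error in this section
try:
    check_history_length(30, 30, [1, 2, 3, 4, 5, 6, 7])
except ValueError as err:
    print("too short:", err)
```

Calling such a check once in the dataset constructor gives an earlier and more readable failure than the error raised inside the model.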

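3. Example (univariate closing price prediction)

Assume the data:

Time:   t-9   t-8   t-7   t-6   t-5   t-4   t-3   t-2   t-1   t
Value:  95    96    97    98    99    100   101   102   103   ?

Goal: predict the value at time t.

(1) Settings

context_length = 4
lags_sequence = [1, 3, 7]

(2) past_values

The required minimum length is:

context_length + max(lags_sequence) = 4 + 7 = 11

So the most recent 11 points have to be supplied:

[95, 96, 97, 98, 99, 100, 101, 102, 103]

(only 9 points are shown here; two earlier points would have to be added to reach 11)

(3) context_length

The model actually uses the last 4 points as its contiguous context:

[100, 101, 102, 103]

(4) lags_sequence

The model additionally picks the lagged points from past_values:

lag=1 → t-1 → 103
lag=3 → t-3 → 101
lag=7 → t-7 → 97

which gives:

[103, 101, 97]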

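4. Combined input to the model

In the end the model receives two pieces of information:

Contiguous context (the last context_length points)
```
past_context = [100, 101, 102, 103]
```
Lag features (the points selected by lags_sequence)
```
lag_features = [103, 101, 97]
```

Inside Hugging Face these are sliced out of past_values automatically.
You only have to make sure that past_values is long enough.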
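The worked example can be reproduced with a few lines of plain Python. This sketch follows the simplified picture used in this section (one value per lag), not the library's internal implementation; `split_context_and_lags` is a hypothetical helper, and the two leading values 93 and 94 are added only so that the history reaches the required 11 points.

```python
def split_context_and_lags(past_values, context_length, lags_sequence):
    """Slice the contiguous context and the single lagged points from a history list."""
    required = context_length + max(lags_sequence)
    if len(past_values) < required:
        raise ValueError(f"need at least {required} values, got {len(past_values)}")
    past_context = past_values[-context_length:]               # last context_length points
    lag_features = [past_values[-lag] for lag in lags_sequence]  # value `lag` steps before t
    return past_context, lag_features


# 11 points: t-11 ... t-1 (93 and 94 are made-up padding for illustration)
past_values = [93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103]
past_context, lag_features = split_context_and_lags(past_values, 4, [1, 3, 7])
print(past_context)  # [100, 101, 102, 103]
print(lag_features)  # [103, 101, 97]
```

Passing only the 9 values shown in the example would raise the ValueError, which is the same length rule the model enforces.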

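5. Multivariate case (OHLCV)

With input_size=5, for example open, high, low, close, volume:

past_values.shape = (batch_size, sequence_length, 5)
context_length=30 → takes the 5-dimensional features of the last 30 days
lags_sequence=[7, 14, 21] → takes the 5-dimensional features from 7/14/21 days back

✅ Summary:

past_values: the complete history you feed in
context_length: the contiguous window the model cuts out (the final segment)
lags_sequence: the specific lagged points the model additionally extracts
Condition: len(past_values) ≥ context_length + max(lags_sequence)

Would you like me to write a small PyTorch demo that takes a past_values sequence and slices out the context and lags automatically, so that you can run it directly?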

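6. How the input reaches the model and how the embedding is built

Good question 👍 Let me explain clearly how past_values is fed into a Hugging Face time series model and how the embedding is constructed, with a small PyTorch demo.

Input flow

In Hugging Face TimeSeriesTransformer (or DeepAR, Informer), the input tensors are:

model_inputs = {
    "past_values": torch.FloatTensor,         # historical time series
    "past_time_features": torch.FloatTensor,  # time features (optional, e.g. day-of-week)
    "past_observed_mask": torch.BoolTensor,   # missing-value mask
}

past_values.shape = (batch_size, sequence_length, input_size)
- sequence_length must be ≥ context_length + max(lags_sequence)
- input_size = 1 (univariate) or >1 (multivariate, OHLCV)

Inside forward the model automatically extracts:

the last context_length points
the points indexed by lags_sequence

How the embedding is built

In TimeSeriesTransformer the embedding generally has three parts:

Value Embedding (required)
- past_values is first projected to the Transformer dimension through an nn.Linear(input_size, d_model).
- This is similar to the token embedding in NLP.
Time Feature Embedding (optional)
- Features such as day-of-week or month-of-year are mapped to d_model dimensions through an nn.Linear.
Positional Embedding (required)
- The standard Transformer positional encoding (sinusoidal or learnable).
- It lets the model distinguish the order of the sequence.

The three are added together to form the Transformer input:

embedding = value_emb + time_emb + positional_emb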

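6. Output dimensions (✅)

Question: when the input values are provided and only future_time_features=future_time_features is given for the future (no future_values), why is the returned output a single step, [B, 1, hidden_size]?

The reason is that the decoder is not given the full target sequence (future_values), so the model can only "start" with one step, and with the cache enabled by default (use_cache=True) it returns only the hidden state of that step. The output shape you see is therefore

(batch_size, 1, hidden_size)

The mechanism is as follows:

When InformerModel.forward() is called without future_values, the decoder cannot apply step-by-step teacher forcing and can only run one autoregressive step from the start position (comparable to an "empty" token).
use_cache defaults to True in the configuration, which means that even without explicitly passing past_key_values the model computes and returns only the hidden state of the latest step, not all steps of the sequence.

To obtain hidden states for all prediction_length steps, there are two options:

Pass future_values (the true values, of length prediction_length) to forward() as well; the model then outputs (batch_size, prediction_length, hidden_size);
or use InformerForPrediction.generate(), which generates the complete future sequence autoregressively; each step can then be taken from outputs.sequences or from repeated forward() calls that carry past_key_values.

If you would like example code for this, or a further explanation of how the cache works with autoregressive inference, let me know!

How to get multi-step output
Supply placeholder future_values. Even if you do not want to use real labels, you can pass a zero tensor of shape (batch_size, prediction_length):

dummy_future = torch.zeros(batch_size, config.prediction_length)
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    future_time_features=batch["future_time_features"],
    future_values=dummy_future,
)
# outputs.last_hidden_state now has shape (B, prediction_length, hidden_size)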

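5. Version without Trainer

OK ✅ Let me rewrite this script as a pure PyTorch training and evaluation version without the Hugging Face Trainer.
This is more transparent and makes it easier to add your own features.

Complete rewritten code (without `Trainer` and `TrainingArguments`):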

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import InformerConfig, InformerModel
import torch.nn as nn
import torch.optim as optim
import evaluate

# ========== 1. Build fake stock data ==========
num_days = 600
columns = ["open", "high", "low", "close", "volume"]
data = {col: np.random.rand(num_days) * 100 for col in columns[:4]}
data["volume"] = np.random.randint(1000, 5000, size=num_days)
df = pd.DataFrame(data)

seq_len = 30  # input sequence length
pred_len = 5  # prediction length
input_size = len(columns)  # 5 OHLCV features

def create_samples(df, seq_len, pred_len):
    X, y = [], []
    for i in range(len(df) - seq_len - pred_len):
        X.append(df.iloc[i : i + seq_len].values)
        y.append(df.iloc[i + seq_len : i + seq_len + pred_len].values)
    return np.array(X), np.array(y)

X, y = create_samples(df, seq_len, pred_len)

# ========== 2. Define the Dataset ==========
class StockDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)  # [num_samples, seq_len, input_size]
        self.y = torch.tensor(y, dtype=torch.float32)  # [num_samples, pred_len, input_size]

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        seq_len, input_size = self.X.shape[1], self.X.shape[2]
        past_time_features = torch.zeros(seq_len, 1, dtype=torch.float32)  # placeholder
        past_observed_mask = torch.ones(seq_len, input_size, dtype=torch.bool)  # everything observed
        return {
            "past_values": self.X[idx],
            "labels": self.y[idx],
            "past_time_features": past_time_features,
            "past_observed_mask": past_observed_mask,
        }

train_size = int(len(X) * 0.8)
train_ds = StockDataset(X[:train_size], y[:train_size])
test_ds = StockDataset(X[train_size:], y[train_size:])

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=32)

# ========== 3. Define the custom model ==========
class CustomInformer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.informer = InformerModel(config)
        # Prediction head: maps the decoder output to pred_len * input_size values
        self.prediction_head = nn.Linear(config.hidden_size, config.prediction_length * config.input_size)
        self.config = config

    def forward(self, past_values, past_time_features, past_observed_mask, labels=None):
        outputs = self.informer(
            past_values=past_values,
            past_time_features=past_time_features,
            past_observed_mask=past_observed_mask,
        )
        decoder_output = outputs.last_hidden_state  # [batch, 1, hidden_size]
        preds = self.prediction_head(decoder_output)  # [batch, 1, pred_len*input_size]
        preds = preds.view(decoder_output.size(0), self.config.prediction_length, self.config.input_size)
        loss = None
        if labels is not None:
            loss = nn.MSELoss()(preds, labels)
        return {"loss": loss, "logits": preds} if loss is not None else {"logits": preds}

# ========== 4. Define the model configuration ==========
lags_sequence = [1, 2, 3, 4, 5, 6, 7]  # maximum lag 7
config = InformerConfig(
    input_size=input_size,
    hidden_size=64,
    prediction_length=pred_len,
    lags_sequence=lags_sequence,
    context_length=seq_len - max(lags_sequence),  # so that seq_len = context_length + max lag
    num_attention_heads=4,
    num_hidden_layers=2,
    attention_type="prob",
    num_time_features=1,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CustomInformer(config).to(device)

# ========== 5. Optimizer ==========
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# ========== 6. Training function ==========
def train_one_epoch(model, loader, optimizer):
    model.train()
    total_loss = 0
    for batch in loader:
        past_values = batch["past_values"].to(device)
        labels = batch["labels"].to(device)
        past_time_features = batch["past_time_features"].to(device)
        past_observed_mask = batch["past_observed_mask"].to(device)
        optimizer.zero_grad()
        output = model(
            past_values=past_values,
            past_time_features=past_time_features,
            past_observed_mask=past_observed_mask,
            labels=labels,
        )
        loss = output["loss"]
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# ========== 7. Evaluation function ==========
mse_metric = evaluate.load("mse")

def evaluate_model(model, loader):
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for batch in loader:
            past_values = batch["past_values"].to(device)
            labels = batch["labels"].to(device)
            past_time_features = batch["past_time_features"].to(device)
            past_observed_mask = batch["past_observed_mask"].to(device)
            output = model(
                past_values=past_values,
                past_time_features=past_time_features,
                past_observed_mask=past_observed_mask,
            )
            preds = output["logits"]
            all_preds.append(preds.cpu())
            all_labels.append(labels.cpu())
    all_preds = torch.cat(all_preds, dim=0).numpy()
    all_labels = torch.cat(all_labels, dim=0).numpy()
    # compute() returns a dict such as {"mse": value}, so the scalar is extracted
    mse = mse_metric.compute(predictions=all_preds.flatten(), references=all_labels.flatten())["mse"]
    return mse

# ========== 8. Main training loop ==========
num_epochs = 6
for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_loader, optimizer)
    eval_mse = evaluate_model(model, test_loader)
    print(f"Epoch {epoch+1}/{num_epochs} | Train Loss: {train_loss:.4f} | Eval MSE: {eval_mse:.4f}")

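🔑 Key points of the rewrite:

Trainer and TrainingArguments are removed in favour of DataLoader plus a hand-written loop.
train_one_epoch() and evaluate_model() make the forward inputs and the loss computation explicit.
evaluate_model accumulates predictions and labels, then computes the MSE with evaluate.
The training loss and the validation MSE are printed for every epoch.

Would you like me to add multi-metric evaluation (MSE, MAE, RMSE, R²) as well, so that you can see everything at once?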
