
Deep Learning with Python, Chapter 5: Generalization as the Core Problem of Machine Learning, with Examples of How to Improve It

news 2025/10/14 8:11:46


Overview

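Chapter 5 takes an in-depth look at the central problem of machine learning: generalization. Combining theory with practice, it explains how to evaluate a model's ability to generalize and how to improve it with a range of techniques. After this chapter, readers should understand the tension between optimization and generalization and know the best practices for improving both model fit and generalization.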

Main Content

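Generalization: the goal of machine learning
- Generalization refers to how well a trained model performs on data it has never seen before.
- Optimization refers to adjusting a model to get the best possible performance on the training data.
- Overfitting is a universal problem in machine learning: it occurs when the model adapts so closely to the training data that it fails to generalize to new data.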
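Underfitting and overfitting
- Underfitting: the model performs poorly even on the training data, usually because it is too simple to capture the relevant patterns.
- Overfitting: performance on the training data keeps improving while performance on the validation data stalls and then starts to degrade.
- Overfitting is especially likely when the data is noisy, when features are rare, or when features exhibit spurious correlations.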
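The nature of generalization in deep learning
- Deep learning models generalize by learning the latent manifold of the data.
- The manifold hypothesis: natural data typically lies on a low-dimensional manifold embedded in a high-dimensional space.
- The key to generalization is that the model can interpolate along the manifold to make predictions for inputs that fall between training samples.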
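Model evaluation methods
- Splitting the data into training, validation, and test sets.
- Simple holdout validation, K-fold cross-validation, and iterated K-fold cross-validation with shuffling.
- Using a common-sense baseline to judge model performance.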
Improving model fit
- Tune the gradient descent parameters (learning rate, batch size).
- Use better architecture priors.
- Increase model capacity.
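Improving generalization
- Dataset curation: ensure data quality, and apply feature selection and feature engineering.
- Early stopping.
- Model regularization: reducing the network size, weight regularization (L1 and L2), and dropout.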
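
The common-sense baseline listed under model evaluation can be made concrete with a small sketch. The labels below are hypothetical toy data and `majority_class_baseline` is an illustrative helper, not something taken from the chapter. The idea is simply that a classifier should beat the accuracy obtained by always predicting the most frequent class.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)


# Hypothetical, imbalanced toy labels (not from the chapter).
train_labels = [0] * 90 + [1] * 10
test_labels = [0] * 18 + [1] * 2

baseline_label, baseline_accuracy = majority_class_baseline(train_labels, test_labels)
print(baseline_label, baseline_accuracy)
```

On a dataset this imbalanced, a model reporting 90% accuracy has learned nothing beyond the class frequencies, which is why the baseline should be computed before interpreting any validation score.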

Key Code and Algorithms

5.2.1 Simple holdout validation

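import numpy as np

# `data` and `get_model()` are placeholders for your dataset and model factory.
num_validation_samples = 10000
np.random.shuffle(data)  # shuffle in place before splitting
validation_data = data[:num_validation_samples]   # held-out validation set
training_data = data[num_validation_samples:]     # remaining samples for training
model = get_model()
model.fit(training_data, ...)
validation_score = model.evaluate(validation_data, ...)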
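
The listing above shuffles a single `data` array. When inputs and targets are stored in separate arrays, they have to be shuffled with the same permutation, otherwise the pairs fall out of alignment. The following is a minimal sketch of that case; `holdout_split` and the toy arrays are assumptions made for illustration, not code from the chapter.

```python
import numpy as np


def holdout_split(inputs, targets, num_validation_samples, seed=0):
    """Shuffle inputs and targets with one shared permutation, then split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    inputs, targets = inputs[order], targets[order]
    val = (inputs[:num_validation_samples], targets[:num_validation_samples])
    train = (inputs[num_validation_samples:], targets[num_validation_samples:])
    return train, val


# Toy data: each target equals the sum of its input row, so alignment is checkable.
inputs = np.arange(40, dtype="float32").reshape(20, 2)
targets = inputs.sum(axis=1)

(train_x, train_y), (val_x, val_y) = holdout_split(
    inputs, targets, num_validation_samples=5)
print(train_x.shape, val_x.shape)
```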

5.2.2 K-fold cross-validation

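import numpy as np

# `data` and `get_model()` are placeholders for your dataset and model factory.
k = 3
num_validation_samples = len(data) // k
np.random.shuffle(data)
validation_scores = []
for fold in range(k):
    # Select the validation partition for this fold.
    validation_data = data[num_validation_samples * fold: num_validation_samples * (fold + 1)]
    # Train on everything outside that partition.
    training_data = np.concatenate([data[:num_validation_samples * fold], data[num_validation_samples * (fold + 1):]])
    model = get_model()  # fresh, untrained model for each fold
    model.fit(training_data, ...)
    validation_score = model.evaluate(validation_data, ...)
    validation_scores.append(validation_score)
# Final score: the average of the k per-fold validation scores.
validation_score = np.average(validation_scores)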
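
Iterated K-fold validation with shuffling, mentioned in the outline but not shown in code, repeats the K-fold procedure several times and reshuffles the data before each pass. One possible way to generate the index splits is sketched below. The function name and parameters are illustrative assumptions, and model training is left out so the splitting logic can be read on its own.

```python
import numpy as np


def iterated_k_fold_indices(num_samples, k, iterations, seed=0):
    """Yield (iteration, fold, train_idx, val_idx), reshuffling before each pass."""
    rng = np.random.default_rng(seed)
    for iteration in range(iterations):
        order = rng.permutation(num_samples)
        folds = np.array_split(order, k)  # also handles sizes not divisible by k
        for fold in range(k):
            val_idx = folds[fold]
            train_idx = np.concatenate(folds[:fold] + folds[fold + 1:])
            yield iteration, fold, train_idx, val_idx


splits = list(iterated_k_fold_indices(num_samples=10, k=3, iterations=2))
print(len(splits))  # iterations * k splits in total
```

Each split would be used to train and evaluate one fresh model, so the total cost is `iterations * k` training runs. The final score is the average over all of them.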

5.3.1 Tuning the learning rate

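from tensorflow import keras
from tensorflow.keras import layers

# `train_images` and `train_labels` are assumed to be loaded and preprocessed beforehand.
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
# The learning rate (1e-2) is passed explicitly to RMSprop instead of relying on its default.
model.compile(optimizer=keras.optimizers.RMSprop(1e-2),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels,
          epochs=10,
          batch_size=128,
          validation_split=0.2)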
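
Why the learning rate matters so much can be seen on a one-dimensional toy problem. The sketch below is not the chapter's example. It minimizes f(w) = w**2 with plain gradient descent, where each update multiplies w by (1 - 2 * learning_rate). A moderate rate shrinks w toward the minimum, while a rate that is too large makes every step overshoot by more than the previous one.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # standard gradient descent update
    return w


w_small_lr = gradient_descent(learning_rate=0.1)  # each step multiplies w by 0.8
w_large_lr = gradient_descent(learning_rate=1.1)  # each step multiplies w by -1.2
print(abs(w_small_lr), abs(w_large_lr))
```

With the smaller rate the magnitude of w decays toward zero, and with the larger one it grows without bound. This is the same failure mode that shows up as a training loss that never decreases.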

5.4.3 Early stopping

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

# `train_data` and `train_labels` are assumed to be prepared beforehand.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# Stop training once val_loss has failed to improve for 2 consecutive epochs.
early_stopping = EarlyStopping(monitor="val_loss", patience=2)
history = model.fit(train_data, train_labels,
                    epochs=20,
                    batch_size=512,
                    validation_split=0.4,
                    callbacks=[early_stopping])
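
The effect of `patience` is easier to see without a framework. The sketch below is a simplified illustration of the stopping rule, not Keras's actual implementation. The validation losses are made up, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), using 0-based epoch indices.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs. If that never happens, stop_epoch
    is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, made up for illustration.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.70, 0.80]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

Training stops two epochs after the best one, so the model's final weights are not the best weights seen. Keras's `EarlyStopping` accepts a `restore_best_weights=True` argument for that situation.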

5.4.4 Weight regularization

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import regularizers

# `train_data` and `train_labels` are assumed to be prepared beforehand.
model = keras.Sequential([
    # l2(0.002): each kernel coefficient adds 0.002 * value ** 2 to the training loss.
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_l2_reg = model.fit(train_data, train_labels,
                           epochs=20,
                           batch_size=512,
                           validation_split=0.4)
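
To make the penalty concrete, the sketch below computes by hand the extra loss term that an L2 or an L1 regularizer contributes for a small weight matrix. The 2x2 kernel and the L1 factor of 0.001 are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
def l2_penalty(weights, factor):
    """factor * sum of squared weight coefficients."""
    return factor * sum(w * w for row in weights for w in row)


def l1_penalty(weights, factor):
    """factor * sum of absolute weight coefficients."""
    return factor * sum(abs(w) for row in weights for w in row)


# Hypothetical 2x2 kernel, chosen only to make the arithmetic easy to follow.
kernel = [[0.5, -1.0],
          [2.0, 0.0]]

l2_term = l2_penalty(kernel, factor=0.002)  # 0.002 * (0.25 + 1.0 + 4.0 + 0.0)
l1_term = l1_penalty(kernel, factor=0.001)  # 0.001 * (0.5 + 1.0 + 2.0 + 0.0)
print(l2_term, l1_term)
```

Because the squared term grows quickly, L2 pushes hardest on the largest weights, while L1 applies the same pressure to every nonzero weight. The penalty is added to the loss only during training, so a regularized model's training loss will typically look higher than its loss at test time.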

5.4.4 Dropout

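from tensorflow import keras
from tensorflow.keras import layers

# `train_data` and `train_labels` are assumed to be prepared beforehand.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),  # zero out 50% of the previous layer's outputs during training
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_dropout = model.fit(train_data, train_labels,
                            epochs=20,
                            batch_size=512,
                            validation_split=0.4)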
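
What a dropout layer does to its input can be sketched in a few lines of NumPy. This is an illustrative version of the commonly used "inverted dropout" scheme, not the Keras source. During training each entry is zeroed with probability `rate` and the survivors are scaled up by 1 / (1 - rate). At inference the input passes through unchanged. The function and the all-ones input are assumptions made for the example.

```python
import numpy as np


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    if not training:
        return activations  # inference: pass through unchanged
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
layer_output = np.ones((4, 8), dtype="float32")

train_output = dropout(layer_output, rate=0.5, training=True, rng=rng)
test_output = dropout(layer_output, rate=0.5, training=False, rng=rng)
print(np.unique(train_output))
```

Scaling at training time keeps the expected magnitude of the layer's output the same in both modes, so nothing has to be adjusted when the model is used for prediction.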

Notable Quotes

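Paraphrase: The goal of a machine learning model is to generalize, that is, to perform well on data it has never seen.
Original: The purpose of a machine learning model is to generalize: to perform accurately on never-before-seen inputs.
Commentary: This states the ultimate goal of machine learning: how the model performs on new data.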
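Paraphrase: Deep learning models generalize by learning the latent manifold of the data.
Original: A deep neural network achieves generalization by learning a parametric model that can successfully interpolate between training samples.
Commentary: This explains the nature of generalization in deep learning: the model makes predictions by interpolating along the manifold.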
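Paraphrase: The core problem of machine learning is the tension between optimization and generalization.
Original: The fundamental problem in machine learning is the tension between optimization and generalization.
Commentary: This sums up the central challenge of machine learning: finding the balance between optimization and generalization.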
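Paraphrase: Feature engineering is the key to making a problem simpler.
Original: The essence of feature engineering is making a problem easier by expressing it in a simpler way.
Commentary: This underlines the importance of feature engineering: a better feature representation simplifies the problem.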
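Paraphrase: Dropout is an effective regularization technique that reduces overfitting by randomly dropping units during training.
Original: Dropout is one of the most effective and most commonly used regularization techniques for neural networks.
Commentary: This highlights dropout's standing among regularization techniques. Its core idea is to inject noise during training so that the model cannot overfit.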