
Hands-On Machine Learning, Part II (Neural Networks and Deep Learning), Chapter 11: Training Deep Neural Networks

news 2025/10/31 20:42:54


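Overview

Chapter 11 takes a close look at the techniques and challenges involved in training deep neural networks (DNNs). DNNs are widely applied to complex problems thanks to their strong representation-learning ability, but training them is far from easy. The chapter covers the vanishing and exploding gradients problems, weight initialization, activation function selection, batch normalization, reusing pretrained layers, optimizer selection, and regularization techniques. Together, these topics show how to train deep neural networks effectively.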

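Main Topics

Vanishing and exploding gradients
- Problem: during backpropagation, gradients can shrink or grow from layer to layer, so the lower layers either barely update or the updates diverge, which makes training difficult.
- Solutions: use a weight initialization strategy such as Glorot or He initialization, and choose a suitable activation function (ReLU or one of its variants).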
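
The shrinking effect described above can be illustrated with a rough sketch. It assumes sigmoid activations and ignores the weight matrices entirely (a deliberate simplification), so it only shows an upper bound on the factor contributed by the activation derivatives. The function names are illustrative, not from the book.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; it peaks at z = 0 with value 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def max_gradient_factor(n_layers):
    """Upper bound on the product of sigmoid derivatives across n_layers.

    Simplifying assumption: weights are ignored, so this is only the
    contribution of the activation function to the backpropagated gradient.
    """
    return sigmoid_grad(0.0) ** n_layers


for n in (1, 5, 10, 20):
    print(n, max_gradient_factor(n))
```

By contrast, the derivative of ReLU is exactly 1 for positive inputs, which is one reason it suffers less from this effect.
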
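Weight initialization
- Glorot initialization: sets the variance of the weights (1/fan_avg, where fan_avg is the average of the layer's input and output counts) so that the signal keeps a similar scale in both the forward and backward passes.
- He initialization: designed for ReLU and its variants; sets the variance to 2/fan_in to help keep gradients from vanishing.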
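
As a minimal sketch of the two strategies above, the following computes the standard deviations they prescribe and draws a weight matrix from a plain Gaussian. The helper names (`glorot_std`, `he_std`, `init_weights`) are hypothetical, and the plain Gaussian is an assumption made for simplicity; library initializers may sample from a different (for example truncated) distribution.

```python
import math
import random


def glorot_std(fan_in, fan_out):
    """Standard deviation for Glorot normal init: variance = 1 / fan_avg."""
    fan_avg = (fan_in + fan_out) / 2
    return math.sqrt(1.0 / fan_avg)


def he_std(fan_in):
    """Standard deviation for He normal init: variance = 2 / fan_in."""
    return math.sqrt(2.0 / fan_in)


def init_weights(fan_in, fan_out, std, seed=42):
    """Draw a fan_in x fan_out weight matrix from N(0, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Example: a 300 -> 100 dense layer followed by ReLU, so He init is used.
W = init_weights(300, 100, he_std(300))
print(glorot_std(300, 100), he_std(300))
```
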
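Activation functions
- ReLU and its variants: Leaky ReLU, ELU, SELU, and others introduce nonlinearity while mitigating the vanishing gradients problem.
- GELU, Swish, Mish: newer activation functions that often perform better on complex tasks.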
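
For reference, the activation functions listed above can be written out as scalar functions. This is a sketch using only the standard library; the default `alpha` and `beta` values are common choices assumed here, not values taken from this summary.

```python
import math

# SELU constants from the self-normalizing networks paper.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs."""
    return z if z > 0 else alpha * z


def elu(z, alpha=1.0):
    """Exponential linear unit: smooth saturation toward -alpha for z < 0."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def selu(z):
    """Scaled ELU with fixed alpha and scale."""
    return SELU_SCALE * elu(z, SELU_ALPHA)


def gelu(z):
    """Exact GELU: z * Phi(z), where Phi is the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def swish(z, beta=1.0):
    """Swish: z * sigmoid(beta * z). May overflow for very negative z."""
    return z / (1.0 + math.exp(-beta * z))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def mish(z):
    """Mish: z * tanh(softplus(z))."""
    return z * math.tanh(softplus(z))


for f in (leaky_relu, elu, selu, gelu, swish, mish):
    print(f.__name__, f(-1.0), f(1.0))
```
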
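Batch normalization
- Principle: standardizes each layer's inputs, then rescales and shifts them with learned parameters, which stabilizes training.
- Implementation: use the BatchNormalization layer in Keras.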
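
The standardize-then-rescale-and-shift step can be sketched in plain Python. This covers only the training-time computation on a single batch; at inference time a real layer uses running averages of the mean and variance instead. The function name and the `eps` value are assumptions made for illustration.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-3):
    """Batch-normalize a list of equal-length feature vectors.

    For each feature j: subtract the batch mean, divide by the batch
    standard deviation (with eps for numerical stability), then scale
    by gamma[j] and shift by beta[j].
    """
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(n_features)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n
                 for j in range(n_features)]
    return [
        [gamma[j] * (x[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
         for j in range(n_features)]
        for x in batch
    ]


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
shifted = batch_norm(batch, gamma=[2.0, 2.0], beta=[5.0, 5.0])
print(normalized)
print(shifted)
```

With `gamma` set to 1 and `beta` set to 0, each output column has a mean of zero; `gamma` and `beta` are the two parameter vectors the layer learns during training.
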
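Reusing pretrained layers
- Transfer learning: reuse the lower layers of a pretrained model as a feature extractor, reducing the amount of training data and computation required.
- Unsupervised pretraining: pretrain with an autoencoder or a generative adversarial network (GAN), then reuse the result for the target task.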
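Optimizers
- Momentum: accelerates convergence by accumulating past gradients.
- Nesterov accelerated gradient: a variant of momentum optimization that measures the gradient slightly ahead, in the direction of the momentum.
- AdaGrad, RMSProp, Adam: adaptive learning rate optimizers, each suited to different scenarios.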
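
To make the update rules concrete, here is a minimal sketch of momentum (with an optional Nesterov look-ahead) and Adam applied to a toy one-dimensional loss J(theta) = theta^2. The toy loss, the function names, and all hyperparameter values are assumptions chosen for illustration.

```python
import math


def grad(theta):
    """Gradient of the toy loss J(theta) = theta**2."""
    return 2.0 * theta


def momentum_step(theta, m, lr=0.1, beta=0.9, nesterov=False):
    """One momentum update; with nesterov=True the gradient is measured
    at the look-ahead point theta + beta * m."""
    lookahead = theta + beta * m if nesterov else theta
    m = beta * m - lr * grad(lookahead)
    return theta + m, m


def adam_step(theta, m, s, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update at step t (t starts at 1)."""
    g = grad(theta)
    m = beta1 * m + (1.0 - beta1) * g        # decaying average of gradients
    s = beta2 * s + (1.0 - beta2) * g * g    # decaying average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)           # bias correction
    s_hat = s / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(s_hat) + eps)
    return theta, m, s


theta_mom, m = 5.0, 0.0
for _ in range(200):
    theta_mom, m = momentum_step(theta_mom, m, nesterov=True)

theta_adam, m, s = 5.0, 0.0, 0.0
for t in range(1, 501):
    theta_adam, m, s = adam_step(theta_adam, m, s, t)

print(theta_mom, theta_adam)
```

Both runs should end close to the minimum at 0. Note that Adam's very first step has a magnitude of roughly `lr` regardless of the gradient's scale, which is what "adaptive" means here.
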
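Regularization techniques
- L1 and L2 regularization: reduce overfitting by constraining the size of the weights.
- Dropout: randomly drops neurons at each training step, making the model more robust.
- Batch normalization: also acts as a regularizer, reducing overfitting.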
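
A small sketch of two of the techniques above: the L1/L2 penalty terms added to the loss, and dropout applied to a vector of activations. The dropout here is the "inverted" form, which rescales the surviving activations during training so nothing needs to change at inference; the function names and the regularization factor are illustrative assumptions.

```python
import random


def l1_penalty(weights, factor):
    """L1 term added to the loss: factor * sum(|w|)."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """L2 term added to the loss: factor * sum(w^2)."""
    return factor * sum(w * w for w in weights)


def dropout(activations, rate, rng, training=True):
    """Inverted dropout.

    During training, each activation is zeroed with probability `rate`
    and the survivors are divided by (1 - rate) so the expected sum is
    unchanged. At inference the input is returned as is.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
dropped = dropout([1.0] * 10000, rate=0.2, rng=rng)
print(sum(dropped) / len(dropped))  # close to 1.0 on average
print(l1_penalty([1.0, -2.0], 0.01), l2_penalty([1.0, -2.0], 0.01))
```
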
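Learning rate scheduling
- Power decay, exponential decay, piecewise constant decay: adjust the learning rate during training to speed up convergence.
- 1cycle scheduling: varies both the learning rate and the momentum over the course of training to improve training efficiency.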
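
The schedules above can be sketched as plain functions of the step or epoch. All default values are arbitrary assumptions, and `one_cycle` is a simplified version that only covers the learning rate (linear ramp up for the first half, linear ramp down for the second), leaving out the momentum schedule and the final low-rate phase.

```python
def power_decay(step, lr0=0.01, decay_steps=1000, power=1.0):
    """Power scheduling: lr0 / (1 + step / decay_steps) ** power."""
    return lr0 / (1.0 + step / decay_steps) ** power


def exponential_decay(step, lr0=0.01, decay_steps=1000):
    """Exponential scheduling: the rate drops by 10x every decay_steps."""
    return lr0 * 0.1 ** (step / decay_steps)


def piecewise_constant(epoch, boundaries=(5, 15), values=(0.01, 0.005, 0.001)):
    """Piecewise constant scheduling: one fixed rate per epoch interval."""
    for boundary, value in zip(boundaries, values):
        if epoch < boundary:
            return value
    return values[-1]


def one_cycle(step, total_steps, lr_start=0.001, lr_max=0.01):
    """Simplified 1cycle: ramp up to lr_max at the midpoint, then back down."""
    half = total_steps / 2
    if step <= half:
        return lr_start + (lr_max - lr_start) * step / half
    return lr_max - (lr_max - lr_start) * (step - half) / half


for step in (0, 25, 50, 75, 100):
    print(step, one_cycle(step, total_steps=100))
```
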

Key Code and Algorithms

11.1 Implementing Batch Normalization with Keras

# X_train, y_train, X_valid, and y_valid are assumed to be loaded already
# (28x28 images with integer class labels).
from tensorflow.keras import layers, Sequential

# Build the model: a BatchNormalization layer after the input
# and after each hidden layer
model = Sequential([
    layers.Flatten(input_shape=[28, 28]),
    layers.BatchNormalization(),
    layers.Dense(300, activation="relu", kernel_initializer="he_normal"),
    layers.BatchNormalization(),
    layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    layers.BatchNormalization(),
    layers.Dense(10, activation="softmax")
])

# Compile the model
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

# Train the model
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))

11.2 Implementing Transfer Learning with Keras

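# X_train_B, y_train_B, X_valid_B, and y_valid_B are assumed to be loaded already.
import tensorflow as tf

# Load the pretrained model
model_A = tf.keras.models.load_model("my_model_A")

# Create a new model that reuses every layer of model A except its output layer.
# Note: these layers are shared with model_A, so training model_B_on_A
# also updates model_A.
model_B_on_A = tf.keras.Sequential(model_A.layers[:-1])
model_B_on_A.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# Freeze the pretrained layers
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# Compile the model (required after changing trainable)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
model_B_on_A.compile(loss="binary_crossentropy",
                     optimizer=optimizer,
                     metrics=["accuracy"])

# Train only the new output layer for a few epochs
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4,
                           validation_data=(X_valid_B, y_valid_B))

# Unfreeze the pretrained layers and continue training
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
model_B_on_A.compile(loss="binary_crossentropy",
                     optimizer=optimizer,
                     metrics=["accuracy"])
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=16,
                           validation_data=(X_valid_B, y_valid_B))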

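Notable Quotes

Summary: Vanishing and exploding gradients are among the main challenges in training deep neural networks.
Original text: Training a deep neural network isn’t a walk in the park. Here are some of the problems you could run into: gradients often get smaller and smaller as the algorithm progresses down to the lower layers.
Comment: Points out the main challenges of training deep networks.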
Summary: Batch normalization stabilizes training by standardizing each layer's inputs, then rescaling and shifting them.
Original text: Batch normalization consists of adding an operation in the model just before or after the activation function of each hidden layer. This operation simply zero-centers and normalizes each input, then scales and shifts the result using two new parameter vectors per layer.
Comment: Describes how batch normalization works and what it does.
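Summary: Transfer learning reuses the lower layers of a pretrained model as a feature extractor, reducing the training data and computation required.
Original text: It is generally not a good idea to train a very large DNN from scratch without first trying to find an existing neural network that accomplishes a similar task.
Comment: Highlights the advantages of transfer learning.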
Summary: The Adam optimizer combines the strengths of momentum optimization and RMSProp, and suits most deep learning tasks.
Original text: Adam combines the ideas of momentum optimization and RMSProp: just like momentum optimization, it keeps track of an exponentially decaying average of past gradients; and just like RMSProp, it keeps track of an exponentially decaying average of past squared gradients.
Comment: Presents the core idea behind the Adam optimizer.
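Summary: Dropout improves a model's robustness and generalization by randomly dropping neurons.
Original text: Dropout is one of the most popular regularization techniques for deep neural networks. It was proposed in a paper by Geoffrey Hinton et al. in 2012 and further detailed in a 2014 paper by Nitish Srivastava et al.
Comment: Introduces Dropout as a popular regularization technique and notes where it was proposed.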