
Deep Learning Algorithms

news 2025/10/7 13:31:20

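Glossary

Neural network (NN): A model loosely inspired by the neurons of the brain. It consists of an input layer, hidden layers, and an output layer. Each layer contains multiple neurons connected by weights, and activation functions introduce the nonlinearity needed to represent complex mappings.
Artificial neuron: The basic unit of a neural network. It receives input signals (features or outputs of the previous layer), computes a weighted sum plus a bias, and passes the result through an activation function (e.g. y = σ(w1x1 + w2x2 + b), where σ is the activation function).
Input layer: The first layer of the network. It receives the raw feature data directly, and its number of neurons equals the feature dimension (e.g. with the 3 features "age, income, purchase frequency", the input layer has 3 neurons).
Hidden layer: Any layer between the input and output layers, used to extract abstract features from the data. More layers (greater "depth") give the model more capacity to learn complex patterns, although deeper networks are also harder to train (e.g. the convolutional layers of a CNN and the attention layers of a Transformer are hidden layers).
Output layer: The last layer of the network, which produces the prediction. Multi-class classification uses softmax (class probabilities), regression uses a linear activation (continuous values), and binary classification uses sigmoid (probability of the positive class).
Activation function: A function that introduces nonlinearity so the model can fit nonlinear data. Common choices: sigmoid (binary classification output), tanh (hidden layers), ReLU (hidden layers, mitigates vanishing gradients), softmax (multi-class output).
ReLU (Rectified Linear Unit): One of the most widely used activation functions, f(x) = max(0, x). It is cheap to compute and mitigates vanishing gradients (the gradient is 1 for x > 0). Its drawback is that some neurons can "die": for x ≤ 0 the gradient is 0, so a neuron whose input stays negative stops updating.
Softmax: The output activation for multi-class classification. It maps the model's logits (unnormalized scores) to probabilities in [0, 1] that sum to 1 (e.g. an output of [0.1, 0.8, 0.1] means the sample belongs to class 2 with probability 80%).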
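Backpropagation (BP): The core algorithm for training neural networks. It applies the chain rule backward from the output layer to compute the gradient of the loss with respect to each layer's parameters; gradient descent then uses these gradients to update the parameters and minimize the loss.
Vanishing gradient: A problem in training deep networks. The backpropagated gradient is a product of per-layer factors (weights and activation derivatives); when these factors are smaller than 1, the product approaches 0 and the early layers barely update (sigmoid is prone to this because its derivative is at most 0.25). ReLU mitigates the problem.
Exploding gradient: The opposite problem. When the per-layer factors are larger than 1, the gradient grows rapidly during backpropagation, the parameter updates become too large, and training becomes unstable (gradient clipping mitigates this).
Transformer: A sequence model built on the self-attention mechanism. It captures dependencies between arbitrary positions in a sequence without recurrence and parallelizes well, and it reshaped the NLP field (BERT and GPT are both based on the Transformer).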
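Self-attention mechanism: The core mechanism of the Transformer. It computes the similarity between queries (Q) and keys (K), turns the similarities into attention weights, and uses them to form a weighted sum of the values (V), so each element of the sequence can emphasize the elements most relevant to it (e.g. in the sentence "the cat chases the mouse", "chases" attends more strongly to "cat" and "mouse").
Multi-head attention: An extension of self-attention. Q, K, and V are projected into several subspaces, multiple attention heads are computed in parallel, and their results are concatenated, which lets the model capture different kinds of dependencies in the sequence (e.g. semantic and syntactic).
Encoder-decoder architecture: A model framework for "input sequence → output sequence" tasks. The encoder encodes the input sequence into a context representation, and the decoder generates the output sequence from it. A typical application is machine translation (English input → encoder → decoder → Chinese output).
Transfer learning: Transferring knowledge learned on a source task (e.g. ImageNet image classification) to a target task (e.g. cat breed classification). It reduces the amount of labeled data the target task needs and speeds up training (e.g. fine-tuning a pre-trained ResNet into a cat classifier).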
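Pre-trained model: A model trained in advance on large-scale data that serves as a base model to be fine-tuned on downstream tasks (BERT, GPT, and ResNet are commonly distributed as pre-trained models). It avoids training from scratch and usually improves both efficiency and performance.
Fine-tuning: A common transfer learning method. The pre-trained parameters are used as initial values and training continues on a small amount of target-task data, adjusting some or all of the parameters (e.g. when adapting BERT to sentiment analysis, one option is to train only the final fully connected layer).
Variational autoencoder (VAE): A generative model based on the autoencoder. The encoder maps the data to a probability distribution over the latent space; the decoder samples from that distribution and reconstructs the data, so the model can generate new, diverse samples (e.g. generating anime avatars with a VAE).
Autoencoder (AE): An unsupervised model made of an encoder (compresses the input into a low-dimensional latent vector) and a decoder (reconstructs the original input from the latent vector). Training minimizes the reconstruction error. Common uses are dimensionality reduction, feature extraction, and anomaly detection.

Rules of thumb for feature scaling:
Neural networks (especially LSTM and RNN): use MinMaxScaler and scale both the features and the target.
Linear models / PCA / clustering: use StandardScaler on the features only; do not standardize the target (regression problems).
Never call fit on the test set: it causes data leakage!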
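The scaling rules above can be sketched as follows. This is a minimal example, assuming the sklearn diabetes dataset as stand-in data; the variable names are illustrative. The scalers learn their ranges from the training split only, and the test split is transformed with those same ranges.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split only, then reuse the fitted scaler on the test split
scaler_X = MinMaxScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)  # transform only, never fit

# The target is scaled the same way (scalers expect a 2-D array)
scaler_y = MinMaxScaler()
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).ravel()

# Training features land exactly in [0, 1]; test features may fall slightly outside
print("train range:", X_train_scaled.min(), X_train_scaled.max())
print("test range: ", X_test_scaled.min(), X_test_scaled.max())
```

Test values outside [0, 1] are expected and harmless: they only show that the test set was not used to choose the scaling range.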

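Fully Connected Neural Network (FCNN)

Also known as the multi-layer perceptron (MLP). It consists of an input layer, hidden layers, and an output layer, and adjacent layers are fully connected (every neuron is connected to all neurons in the next layer);
Each connection has a weight, and each neuron's output passes through an activation function (such as sigmoid or ReLU) that introduces nonlinearity, so the model can fit complex nonlinear relationships;
Training uses backpropagation to compute the gradient of the loss with respect to the weights and gradient descent to update the parameters, minimizing the prediction error.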
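To make the forward pass and backpropagation concrete, here is a minimal NumPy sketch of a one-hidden-layer MLP trained with plain gradient descent. The toy data, layer sizes, learning rate, and function names are all assumptions chosen for illustration, not part of the sklearn example in this article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass: ReLU hidden layer followed by a sigmoid output."""
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1
    a1 = np.maximum(0.0, z1)          # ReLU
    p = sigmoid(a1 @ W2 + b2)         # predicted probability of the positive class
    return z1, a1, p

def bce_loss(p, y):
    """Binary cross-entropy averaged over the batch."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def backward(params, X, y):
    """Backpropagation: apply the chain rule from the output back to the input."""
    W1, b1, W2, b2 = params
    z1, a1, p = forward(params, X)
    dz2 = (p - y) / X.shape[0]        # gradient at the output pre-activation
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)     # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = (X[:, :1] + X[:, 1:2] > 0).astype(float)   # toy binary labels, shape (64, 1)
params = [rng.normal(scale=0.5, size=(3, 8)), np.zeros(8),
          rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)]

loss_before = bce_loss(forward(params, X)[2], y)
for _ in range(500):                            # full-batch gradient descent
    grads = backward(params, X, y)
    params = [p - 0.1 * g for p, g in zip(params, grads)]
loss_after = bce_loss(forward(params, X)[2], y)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Libraries such as sklearn's MLPClassifier perform the same two steps internally, with better optimizers and stopping criteria.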

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report


def train_mlp_classifier(X, y, hidden_layer_sizes=(100,), max_iter=300, random_state=42):
    """Train a fully connected neural network (multi-layer perceptron) classifier.

    Parameters:
    - X: feature data
    - y: label data
    - hidden_layer_sizes: hidden layer structure; the default (100,) means one layer of 100 neurons
    - max_iter: maximum number of iterations
    - random_state: random seed

    Returns:
    - model: the trained model
    - accuracy: accuracy on the test set
    - report: classification report
    """
    # 1. Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # 2. Standardize the features (fit on the training set only)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # 3. Define and train the MLP model
    model = MLPClassifier(
        hidden_layer_sizes=hidden_layer_sizes,
        max_iter=max_iter,
        random_state=random_state,
        verbose=False,
    )
    model.fit(X_train_scaled, y_train)

    # 4. Predict and evaluate
    y_pred = model.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)
    return model, accuracy, report


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the handwritten digits dataset bundled with sklearn
    digits = load_digits()
    X, y = digits.data, digits.target

    # Train the model
    mlp_model, acc, report = train_mlp_classifier(
        X, y,
        hidden_layer_sizes=(128, 64),  # two hidden layers: 128 -> 64
        max_iter=500,
        random_state=42,
    )

    # Print the results
    print(f"Test accuracy: {acc:.4f}")
    print("Classification report:")
    print(report)

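Recurrent Neural Network (RNN)

Recurrent neural network (RNN): A neural network for sequence data (such as text or time series). It keeps historical information in a hidden state that is updated at every step, which lets it model the context of a sequence (e.g. when an RNN processes a sentence, the prediction for each word depends on the preceding words). It struggles with long-term dependencies, because gradients tend to vanish or explode over long sequences.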
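The hidden-state idea can be written in a few lines of NumPy. This is a sketch of the standard simple-RNN recurrence h_t = tanh(x_t·Wx + h_{t-1}·Wh + b); the shapes, the random weights, and the function name `rnn_forward` are assumptions for illustration.

```python
import numpy as np

def rnn_forward(x_seq, Wx, Wh, b):
    """Run a simple RNN over one sequence and return every hidden state.

    x_seq: (time_steps, input_dim), Wx: (input_dim, units),
    Wh: (units, units), b: (units,)
    """
    h = np.zeros(Wh.shape[0])                    # initial hidden state
    states = []
    for x_t in x_seq:
        # The new state mixes the current input with the previous state
        h = np.tanh(x_t @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
time_steps, input_dim, units = 5, 3, 4
x_seq = rng.normal(size=(time_steps, input_dim))
Wx = rng.normal(scale=0.5, size=(input_dim, units))
Wh = rng.normal(scale=0.5, size=(units, units))
b = np.zeros(units)

states = rnn_forward(x_seq, Wx, Wh, b)
print(states.shape)   # (5, 4): one hidden state per time step
```

Because the same Wh is applied at every step, gradients flowing back through many steps are repeatedly multiplied by related factors, which is where the long-term dependency problem comes from.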

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
import math


def create_sequences(data, target, time_steps=3):
    """Convert data into sequence input of shape [samples, time steps, features]."""
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps, :])  # features of the previous time_steps steps
        y.append(target[i + time_steps])     # target at the next time step
    return np.array(X), np.array(y)


def train_rnn_model(X, y, time_steps=3, units=50, epochs=50, test_size=0.2, random_state=42):
    """Train a recurrent neural network (SimpleRNN) model.

    Parameters:
    - X: feature data (samples, features)
    - y: target data (samples,)
    - time_steps: length of each input window
    - units: number of units in the RNN hidden layer
    - epochs: number of training epochs
    - test_size: fraction of data used for testing
    - random_state: random seed

    Returns:
    - model: the trained RNN model
    - metrics: dictionary of evaluation metrics
    """
    # Build the sequences and split chronologically (no shuffling)
    X_seq, y_seq = create_sequences(X, y, time_steps)
    X_train, X_test, y_train, y_test = train_test_split(
        X_seq, y_seq, test_size=test_size, random_state=random_state, shuffle=False
    )

    # Min-max scaling, fitted on the training split only to avoid data leakage
    n_features = X.shape[1]
    scaler_X = MinMaxScaler().fit(X_train.reshape(-1, n_features))
    scaler_y = MinMaxScaler().fit(y_train.reshape(-1, 1))
    X_train = scaler_X.transform(X_train.reshape(-1, n_features)).reshape(X_train.shape)
    X_test = scaler_X.transform(X_test.reshape(-1, n_features)).reshape(X_test.shape)
    y_train = scaler_y.transform(y_train.reshape(-1, 1)).flatten()
    y_test = scaler_y.transform(y_test.reshape(-1, 1)).flatten()

    # Build the RNN model
    model = Sequential()
    model.add(SimpleRNN(units, input_shape=(time_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=epochs, batch_size=32, verbose=0)

    # Predict
    y_pred = model.predict(X_test)

    # Undo the scaling
    y_test_inv = scaler_y.inverse_transform(y_test.reshape(-1, 1)).flatten()
    y_pred_inv = scaler_y.inverse_transform(y_pred).flatten()

    # Compute the evaluation metrics
    mse = mean_squared_error(y_test_inv, y_pred_inv)
    rmse = math.sqrt(mse)
    mae = mean_absolute_error(y_test_inv, y_pred_inv)
    r2 = r2_score(y_test_inv, y_pred_inv)
    metrics = {'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'R2': r2}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the diabetes dataset
    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target

    # Train the RNN model
    rnn_model, metrics = train_rnn_model(X, y, time_steps=5, units=64, epochs=100)

    # Print all evaluation metrics
    print("Model evaluation metrics:")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")

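Long Short-Term Memory (LSTM)

Long short-term memory network (LSTM): An improved recurrent model designed to address the RNN's long-term dependency problem. Its gating mechanism (input gate, forget gate, output gate) controls what information is written, forgotten, and emitted, so useful information can be kept over long spans (e.g. using an LSTM for long text generation).
Gated recurrent unit (GRU): A simplified variant of the LSTM. It merges the forget and input gates into a single update gate (alongside a reset gate), has fewer parameters, trains faster, and performs close to the LSTM on many sequence tasks (e.g. speech recognition, text classification).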
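The gating mechanism described above can be sketched as a single LSTM step in NumPy. The weight layout (four gates stacked in the order input, forget, candidate, output) and the name `lstm_step` are assumptions of this sketch; libraries may order and store the gates differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate values
    c = f * c_prev + i * g                         # forget old memory, write new
    h = o * np.tanh(c)                             # expose part of the memory
    return h, c

# Demonstration: with the forget gate pushed open and the input gate pushed
# shut (via the biases), the cell state survives 100 steps almost unchanged.
units, input_dim = 3, 2
W = np.zeros((input_dim, 4 * units))
U = np.zeros((units, 4 * units))
b = np.concatenate([np.full(units, -20.0),   # input gate  ~ 0
                    np.full(units, 20.0),    # forget gate ~ 1
                    np.zeros(units),         # candidate
                    np.zeros(units)])        # output gate = 0.5
c0 = np.array([1.0, -2.0, 0.5])
h, c = np.zeros(units), c0.copy()
for _ in range(100):
    h, c = lstm_step(np.ones(input_dim), h, c, W, U, b)
print(c)   # stays very close to c0
```

The additive update of the cell state is what lets information, and gradients, travel across many time steps without being squashed at each one.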

import numpy as np
import math
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense


def create_sequences(data, target, time_steps=3):
    """Convert data into sequence input of shape [samples, time steps, features]."""
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps, :])  # features of the previous time_steps steps
        y.append(target[i + time_steps])     # target at the next time step
    return np.array(X), np.array(y)


def train_lstm_model(X, y, time_steps=3, units=50, epochs=150, test_size=0.2, random_state=42):
    """Train an LSTM model (regression task).

    Parameters:
    - X: feature data (samples, features)
    - y: target data (samples,)
    - time_steps: length of each input window
    - units: number of units in the LSTM hidden layer
    - epochs: number of training epochs
    - test_size: fraction of data used for testing
    - random_state: random seed

    Returns:
    - model: the trained LSTM model
    - metrics: dictionary of evaluation metrics (MSE, RMSE, MAE, R2)
    """
    # Build the sequences and split chronologically (no shuffling)
    X_seq, y_seq = create_sequences(X, y, time_steps)
    X_train, X_test, y_train, y_test = train_test_split(
        X_seq, y_seq, test_size=test_size, random_state=random_state, shuffle=False
    )

    # Min-max scaling, fitted on the training split only to avoid data leakage
    n_features = X.shape[1]
    scaler_X = MinMaxScaler().fit(X_train.reshape(-1, n_features))
    scaler_y = MinMaxScaler().fit(y_train.reshape(-1, 1))
    X_train = scaler_X.transform(X_train.reshape(-1, n_features)).reshape(X_train.shape)
    X_test = scaler_X.transform(X_test.reshape(-1, n_features)).reshape(X_test.shape)
    y_train = scaler_y.transform(y_train.reshape(-1, 1)).flatten()
    y_test = scaler_y.transform(y_test.reshape(-1, 1)).flatten()

    # Build the LSTM model
    model = Sequential()
    model.add(LSTM(units, input_shape=(time_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=epochs, batch_size=32, verbose=0)

    # Predict
    y_pred = model.predict(X_test)

    # Undo the scaling
    y_test_inv = scaler_y.inverse_transform(y_test.reshape(-1, 1)).flatten()
    y_pred_inv = scaler_y.inverse_transform(y_pred).flatten()

    # Compute the evaluation metrics
    mse = mean_squared_error(y_test_inv, y_pred_inv)
    rmse = math.sqrt(mse)
    mae = mean_absolute_error(y_test_inv, y_pred_inv)
    r2 = r2_score(y_test_inv, y_pred_inv)
    metrics = {'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'R2': r2}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the diabetes dataset bundled with sklearn
    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target

    # Train the LSTM model
    lstm_model, metrics = train_lstm_model(X, y, time_steps=5, units=64, epochs=100)

    # Print the evaluation metrics
    print("Model evaluation metrics:")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")

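Convolutional Neural Network (CNN)

Convolutional layer: The core layer of a CNN. A kernel (e.g. a 3×3 matrix) slides over the input feature map and extracts local features by element-wise multiplication and summation (e.g. an edge-detection kernel extracts the edges in an image). Multiple kernels extract different features.
Pooling layer: The downsampling layer of a CNN. A sliding window aggregates each local region (max pooling takes the maximum in the window, average pooling takes the mean), which shrinks the feature maps, reduces computation, and makes the model more robust to small shifts.
Fully connected layer: A layer in which every neuron is connected to all neurons of the previous layer. It maps the extracted features to the final prediction (e.g. the last few layers of a CNN are often fully connected and output class probabilities).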
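The sliding-window operations of the convolutional and pooling layers can be sketched in plain NumPy. The helper names and the toy 6×6 image are assumptions for illustration; like most deep learning frameworks, the "convolution" here does not flip the kernel (it is technically cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise multiply the window with the kernel, then sum
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy image: dark left half, bright right half (a vertical edge in the middle)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A simple vertical-edge kernel: responds where brightness rises left to right
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
pooled = max_pool2d(feature_map, size=2)
print(feature_map)   # every row is [0, 3, 3, 0]: strong response at the edge
print(pooled)        # [[3, 3], [3, 3]]
```

The 6×6 image becomes a 4×4 feature map after the 3×3 convolution and a 2×2 map after pooling, which is the size reduction the pooling layer is used for.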

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


def train_cnn_model(X, y, img_rows=8, img_cols=8, channels=1,
                    filters=32, kernel_size=(3, 3), pool_size=(2, 2),
                    dense_units=128, dropout_rate=0.5,
                    epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train a convolutional neural network (CNN) for image classification.

    Parameters:
    - X: feature data (samples, features)
    - y: label data (samples,)
    - img_rows, img_cols: image size
    - channels: number of image channels (grayscale = 1, color = 3)
    - filters: number of convolution kernels
    - kernel_size: kernel size
    - pool_size: pooling window size
    - dense_units: number of units in the fully connected layer
    - dropout_rate: dropout rate
    - epochs: number of training epochs
    - batch_size: batch size
    - test_size: fraction of data used for testing
    - random_state: random seed

    Returns:
    - model: the trained CNN model
    - metrics: dictionary of evaluation metrics
    """
    # Split first so the scaler never sees the test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # Min-max scaling, fitted on the training set only
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Reshape to the CNN input format [samples, rows, cols, channels]
    X_train = X_train.reshape(-1, img_rows, img_cols, channels)
    X_test = X_test.reshape(-1, img_rows, img_cols, channels)

    # Build the CNN model
    model = Sequential([
        Conv2D(filters, kernel_size, activation='relu',
               input_shape=(img_rows, img_cols, channels)),
        MaxPooling2D(pool_size=pool_size),
        Flatten(),
        Dense(dense_units, activation='relu'),
        Dropout(dropout_rate),
        Dense(10, activation='softmax'),  # the digits dataset has 10 classes
    ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the handwritten digits dataset bundled with sklearn
    digits = load_digits()
    X, y = digits.data, digits.target

    # Train the CNN
    cnn_model, metrics = train_cnn_model(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        filters=32, kernel_size=(3, 3),
        dense_units=128, dropout_rate=0.3,
        epochs=15, batch_size=32,
    )

    # Print the evaluation metrics
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")

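LeNet-5

The first mature CNN architecture (1998): two convolutional layers and two pooling layers, followed by fully connected layers.
Use cases: handwritten digit recognition (e.g. the MNIST dataset).

LeNet-5 structure
C1: 6 kernels of size 5×5
S2: 2×2 average pooling
C3: 16 kernels of size 5×5
S4: 2×2 average pooling
C5: 120 kernels (5×5 in the original design, where it acts as a fully connected layer; 1×1 in the code that follows, to fit small images)
F6: 84 neurons
Output: softmax over 10 classes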
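The layer list above fixes the feature-map size at every stage. The sketch below traces those sizes with the usual formula out = floor((n + 2p − k) / s) + 1 and shows why the 5×5 kernels cannot be applied unchanged to an 8×8 image. The helper names are hypothetical, and the classic 32×32 input size is an assumption taken from the original LeNet-5 design rather than from this article.

```python
def conv_out(n, k, s=1, p=0):
    """Output width of a convolution or pooling window along one dimension."""
    return (n + 2 * p - k) // s + 1

# (name, kernel size, stride) for the spatial layers of the classic LeNet-5
LENET5_LAYERS = [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2), ("C5", 5, 1)]

def trace_sizes(input_size, layers=LENET5_LAYERS):
    """Return the feature-map width after each layer, stopping if it collapses."""
    sizes, n = [], input_size
    for name, k, s in layers:
        n = conv_out(n, k, s)
        if n < 1:
            print(f"{name}: the {k}x{k} window no longer fits")
            break
        sizes.append(n)
    return sizes

print(trace_sizes(32))  # [28, 14, 10, 5, 1]
print(trace_sizes(8))   # [4, 2], then C3 no longer fits
```

This is why an adaptation to 8×8 digits has to shrink the kernels and drop a pooling stage.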

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
import warnings
warnings.filterwarnings('ignore')


def train_lenet5(X, y, img_rows=8, img_cols=8, channels=1,
                 epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """LeNet-5 wrapper (adapted to the sklearn digits dataset).

    No extra preprocessing is applied: only reshaping and one-hot encoding.
    """
    # Reshape to (samples, height, width, channels)
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # One-hot encode the labels
    y_onehot = to_categorical(y, num_classes=10)

    # Split into training and test sets (stratified by class label)
    X_train, X_test, y_train, y_test = train_test_split(
        X_reshaped, y_onehot, test_size=test_size, random_state=random_state, stratify=y
    )

    # Build the LeNet-5 model (adapted to 8x8 images)
    model = Sequential([
        # C1: convolution, 6 kernels of size 3x3 (5x5 in the original)
        Conv2D(filters=6, kernel_size=(3, 3), activation='sigmoid',
               input_shape=(img_rows, img_cols, channels)),
        # S2: 2x2 average pooling
        AveragePooling2D(pool_size=(2, 2), strides=2),
        # C3: convolution, 16 kernels of size 3x3 (5x5 in the original)
        Conv2D(filters=16, kernel_size=(3, 3), activation='sigmoid'),
        # Note: 8x8 images are too small, so the second pooling layer (S4)
        # is removed to avoid a non-positive output size
        # AveragePooling2D(pool_size=(2, 2), strides=2),
        # C5: convolution, 120 kernels of size 1x1
        Conv2D(filters=120, kernel_size=(1, 1), activation='sigmoid'),
        # Flatten
        Flatten(),
        # F6: fully connected layer with 84 neurons
        Dense(units=84, activation='sigmoid'),
        # Output: 10 classes
        Dense(units=10, activation='softmax'),
    ])

    # Compile the model
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)
    y_test_label = np.argmax(y_test, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the sklearn digits dataset (8x8 grayscale images)
    digits = load_digits()
    X, y = digits.data, digits.target

    # Train LeNet-5
    lenet_model, metrics = train_lenet5(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=20, batch_size=32,
    )

    # Print the results
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")

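AlexNet

Winner of the 2012 ImageNet competition. It brought ReLU activations, Dropout against overfitting, and GPU-accelerated training into mainstream use for large-scale CNNs.
Use cases: ImageNet image classification (1000 classes).

Main features of AlexNet
8-layer structure: 5 convolutional layers + 3 fully connected layers
ReLU activation: mitigates the vanishing gradient problem
Dropout: guards against overfitting
Overlapping max pooling: reduces the feature map size
Data augmentation: improves generalization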

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense, Dropout,BatchNormalization, Activation
)
from tensorflow.keras.utils import to_categorical
from PIL import Image
import warnings
warnings.filterwarnings('ignore')


def build_alexnet(input_shape=(227, 227, 3), num_classes=10):
    """Build an AlexNet-style model.

    BatchNormalization is used here in place of the local response
    normalization of the original network.
    """
    model = Sequential([
        # 1st conv block
        Conv2D(96, (11, 11), strides=(4, 4), padding='valid', input_shape=input_shape),
        BatchNormalization(),
        Activation('relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2, padding='valid'),
        # 2nd conv block
        Conv2D(256, (5, 5), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2, padding='valid'),
        # 3rd conv block
        Conv2D(384, (3, 3), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        # 4th conv block
        Conv2D(384, (3, 3), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        # 5th conv block
        Conv2D(256, (3, 3), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2, padding='valid'),
        # Classifier
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    return model


def train_alexnet(X, y, img_rows=8, img_cols=8, channels=1,
                  epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train the AlexNet model (adapted to the sklearn digits dataset)."""
    # Reshape to (samples, height, width, channels)
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # Upscale the 8x8 grayscale images to the AlexNet input size 227x227x3
    # (float32 halves the memory of this large array)
    X_resized = np.zeros((X.shape[0], 227, 227, 3), dtype=np.float32)
    for i in range(X.shape[0]):
        img = X_reshaped[i]
        # Grayscale to RGB by repeating the single channel
        img_rgb = np.repeat(img, 3, axis=-1) if channels == 1 else img
        # digits pixels are in 0-16; map them to 0-255
        # (multiplying by 16 would overflow uint8 at 16 * 16 = 256)
        img_pil = Image.fromarray((img_rgb * (255.0 / 16)).astype(np.uint8))
        # Simple upscaling (nearest-neighbor interpolation)
        img_resized = img_pil.resize((227, 227), Image.NEAREST)
        X_resized[i] = np.array(img_resized) / 255.0

    # One-hot encode the labels
    y_onehot = to_categorical(y, num_classes=10)

    # Split into training and test sets (stratified by class label)
    X_train, X_test, y_train, y_test = train_test_split(
        X_resized, y_onehot, test_size=test_size, random_state=random_state, stratify=y
    )

    # Build the AlexNet model
    model = build_alexnet(input_shape=(227, 227, 3), num_classes=10)

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)
    y_test_label = np.argmax(y_test, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the sklearn digits dataset (8x8 grayscale images)
    digits = load_digits()
    X, y = digits.data, digits.target

    # Train AlexNet
    alexnet_model, metrics = train_alexnet(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=5, batch_size=32,
    )

    # Print the results
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")

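ResNet (Residual Network)

Introduces the residual connection, which addresses the vanishing gradient problem of deep networks by letting gradients flow directly back to the earlier layers.
Use cases: very deep networks (e.g. 152 layers); image classification, detection, and segmentation.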
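The effect of the skip connection on gradient flow can be illustrated with a deliberately simplified linear toy model (an assumption of this sketch, not how a real ResNet is built). Each "layer" is a small random matrix W; a plain stack multiplies the gradient by W at every layer, while a residual stack multiplies it by (I + W), because the derivative of x + F(x) contains the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 16, 50

plain_jacobian = np.eye(dim)      # gradient factor through a plain stack
residual_jacobian = np.eye(dim)   # gradient factor through a residual stack

for _ in range(depth):
    # A layer with a small Jacobian, as with saturated or weakly scaled layers
    W = 0.01 * rng.standard_normal((dim, dim)) / np.sqrt(dim)
    plain_jacobian = W @ plain_jacobian                        # y = F(x)
    residual_jacobian = (np.eye(dim) + W) @ residual_jacobian  # y = x + F(x)

plain_norm = np.linalg.norm(plain_jacobian, 2)
residual_norm = np.linalg.norm(residual_jacobian, 2)
print(f"plain stack:    {plain_norm:.3e}")    # vanishes
print(f"residual stack: {residual_norm:.3e}") # stays on the order of 1
```

In the plain stack the gradient reaching the first layer is numerically zero after 50 layers, while the identity path keeps it at a usable magnitude.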

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Add, Activation, BatchNormalization
from tensorflow.keras.utils import to_categorical
import warnings
warnings.filterwarnings('ignore')


def residual_block(x, filters, kernel_size=3, stride=1):
    """Residual block (basic block)."""
    y = Conv2D(filters, kernel_size=kernel_size, strides=stride, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
    y = BatchNormalization()(y)
    # If the stride is not 1 or the channel counts differ, adjust the shortcut with a 1x1 convolution
    if stride != 1 or x.shape[-1] != filters:
        x = Conv2D(filters, kernel_size=1, strides=stride, padding='same')(x)
    y = Add()([y, x])
    y = Activation('relu')(y)
    return y


def build_resnet(input_shape=(8, 8, 1), num_classes=10, depth=2):
    """Build a simplified ResNet."""
    inputs = Input(shape=input_shape)
    # Initial convolution
    x = Conv2D(32, kernel_size=3, strides=1, padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    # Stack the residual blocks
    for _ in range(depth):
        x = residual_block(x, filters=32)
    # Classification head
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    outputs = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    return model


def train_resnet(X, y, img_rows=8, img_cols=8, channels=1,
                 epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train the ResNet model (adapted to the sklearn digits dataset)."""
    # Reshape the data
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # One-hot encode the labels
    y_onehot = to_categorical(y, num_classes=10)

    # Split into training and test sets (stratified by class label)
    X_train, X_test, y_train, y_test = train_test_split(
        X_reshaped, y_onehot, test_size=test_size, random_state=random_state, stratify=y
    )

    # Build the model
    model = build_resnet(input_shape=(img_rows, img_cols, channels), num_classes=10)

    # Compile
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)
    y_test_label = np.argmax(y_test, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the sklearn digits dataset (8x8 grayscale images)
    digits = load_digits()
    X, y = digits.data, digits.target

    # Train ResNet
    resnet_model, metrics = train_resnet(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=10, batch_size=32,
    )

    # Print the results
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")

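Inception (GoogLeNet)

Runs convolution kernels of several scales in parallel (1×1, 3×3, 5×5) and uses 1×1 convolutions to reduce the channel dimension first, which cuts the parameter count and improves efficiency.
Use cases: image classification, object detection.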
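The saving from the 1×1 reduction is simple arithmetic: a k×k convolution has k·k·C_in·C_out weights. The sketch below compares a direct 5×5 convolution with a 1×1 reduction followed by the same 5×5 convolution. The channel counts (192 in, 16 after reduction, 32 out) are illustrative assumptions, and biases are ignored.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

in_channels, reduced_channels, out_channels = 192, 16, 32

# Direct 5x5 convolution on all input channels
direct = conv_params(5, in_channels, out_channels)

# 1x1 convolution to shrink the channels, then the 5x5 convolution
bottleneck = (conv_params(1, in_channels, reduced_channels)
              + conv_params(5, reduced_channels, out_channels))

print(direct)                # 153600
print(bottleneck)            # 15872
print(direct / bottleneck)   # roughly 9.7x fewer weights
```

The multiply-add count per output position shrinks by the same ratio, which is why the reduction matters most on the expensive 5×5 branch.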

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, AveragePooling2D,Flatten, Dense, Dropout, BatchNormalization,Activation, Concatenate
)
from tensorflow.keras.utils import to_categorical
from PIL import Image


# --- Inception-v4 core modules (adapted to small inputs) ---
def conv_bn(x, filters, kernel_size, strides=1, padding='same'):
    """Conv + BN + ReLU (fewer filters than the original, to suit a small dataset)."""
    x = Conv2D(filters, kernel_size, strides=strides, padding=padding)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x


def stem_block(x):
    """Simplified stem (fewer stride-2 operations, so the feature map does not shrink too early)."""
    # 32x32 input: only one stride-2 downsampling in total
    x = conv_bn(x, 32, (3, 3), strides=1, padding='same')  # stride 2 -> 1 to keep the size
    x = conv_bn(x, 32, (3, 3), padding='same')
    x = conv_bn(x, 64, (3, 3))
    # The only stride-2 downsampling
    branch1 = MaxPooling2D((3, 3), strides=2, padding='same')(x)
    branch2 = conv_bn(x, 64, (3, 3), strides=2, padding='same')  # filters 96 -> 64
    x = Concatenate()([branch1, branch2])  # output: 16x16x(64+64) = 16x16x128
    # The later branches use 'same' padding and do not shrink the map further
    branch1 = conv_bn(x, 64, (1, 1))
    branch1 = conv_bn(branch1, 64, (3, 3), padding='same')  # valid -> same
    branch2 = conv_bn(x, 64, (1, 1))
    branch2 = conv_bn(branch2, 64, (1, 5))  # 7 -> 5: smaller receptive field for small images
    branch2 = conv_bn(branch2, 64, (5, 1))
    branch2 = conv_bn(branch2, 64, (3, 3), padding='same')  # valid -> same
    x = Concatenate()([branch1, branch2])  # output: 16x16x(64+64) = 16x16x128
    # The extra stride-2 step is removed to avoid shrinking further
    branch1 = conv_bn(x, 128, (3, 3), strides=1, padding='same')  # stride 2 -> 1
    branch2 = MaxPooling2D((3, 3), strides=1, padding='same')(x)
    x = Concatenate()([branch1, branch2])  # final output: 16x16x(128+128) = 16x16x256
    return x


def inception_a(x):
    """Simplified Inception-A module (fewer filters)."""
    branch1 = conv_bn(x, 32, (1, 1))  # 96 -> 32
    branch2 = conv_bn(x, 32, (1, 1))  # 64 -> 32
    branch2 = conv_bn(branch2, 32, (3, 3))  # 96 -> 32
    branch3 = conv_bn(x, 32, (1, 1))  # 64 -> 32
    branch3 = conv_bn(branch3, 32, (3, 3))  # 96 -> 32
    branch3 = conv_bn(branch3, 32, (3, 3))  # 96 -> 32
    branch4 = AveragePooling2D((3, 3), strides=1, padding='same')(x)
    branch4 = conv_bn(branch4, 32, (1, 1))  # 96 -> 32
    return Concatenate()([branch1, branch2, branch3, branch4])  # channels: 32 x 4 = 128


def inception_b(x):
    """Simplified Inception-B module (fewer filters, smaller receptive field)."""
    branch1 = conv_bn(x, 64, (1, 1))  # 384 -> 64
    branch2 = conv_bn(x, 64, (1, 1))  # 192 -> 64
    branch2 = conv_bn(branch2, 64, (1, 5))  # 7 -> 5 for small images
    branch2 = conv_bn(branch2, 64, (5, 1))  # 7 -> 5
    branch3 = conv_bn(x, 64, (1, 1))  # 192 -> 64
    branch3 = conv_bn(branch3, 64, (5, 1))  # 7 -> 5
    branch3 = conv_bn(branch3, 64, (1, 5))  # 7 -> 5
    branch3 = conv_bn(branch3, 64, (5, 1))  # 7 -> 5
    branch3 = conv_bn(branch3, 64, (1, 5))  # 7 -> 5
    branch4 = AveragePooling2D((3, 3), strides=1, padding='same')(x)
    branch4 = conv_bn(branch4, 64, (1, 1))  # 128 -> 64
    return Concatenate()([branch1, branch2, branch3, branch4])  # channels: 64 x 4 = 256


def inception_c(x):
    """Simplified Inception-C module (fewer filters)."""
    branch1 = conv_bn(x, 64, (1, 1))  # 256 -> 64
    branch2 = conv_bn(x, 64, (1, 1))  # 384 -> 64
    branch2a = conv_bn(branch2, 64, (1, 3))
    branch2b = conv_bn(branch2, 64, (3, 1))
    branch2 = Concatenate()([branch2a, branch2b])
    branch3 = conv_bn(x, 64, (1, 1))  # 384 -> 64
    branch3 = conv_bn(branch3, 64, (3, 1))
    branch3 = conv_bn(branch3, 64, (1, 3))
    branch3a = conv_bn(branch3, 64, (1, 3))
    branch3b = conv_bn(branch3, 64, (3, 1))
    branch3 = Concatenate()([branch3a, branch3b])
    branch4 = AveragePooling2D((3, 3), strides=1, padding='same')(x)
    branch4 = conv_bn(branch4, 64, (1, 1))  # 256 -> 64
    return Concatenate()([branch1, branch2, branch3, branch4])  # channels: 64+128+128+64 = 384


def reduction_a(x):
    """Simplified Reduction-A module (stride 2 changed to 1 so the map does not get too small)."""
    branch1 = conv_bn(x, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1, filters 384 -> 64
    branch2 = conv_bn(x, 64, (1, 1))  # 192 -> 64
    branch2 = conv_bn(branch2, 64, (3, 3))  # 224 -> 64
    branch2 = conv_bn(branch2, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1, filters 256 -> 64
    branch3 = MaxPooling2D((3, 3), strides=1, padding='same')(x)  # stride 2 -> 1
    return Concatenate()([branch1, branch2, branch3])  # spatial size stays 16x16


def reduction_b(x):
    """Simplified Reduction-B module (stride 2 changed to 1, fewer filters)."""
    branch1 = conv_bn(x, 64, (1, 1))  # 192 -> 64
    branch1 = conv_bn(branch1, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1
    branch2 = conv_bn(x, 64, (1, 1))  # 256 -> 64
    branch2 = conv_bn(branch2, 64, (1, 5))  # 7 -> 5
    branch2 = conv_bn(branch2, 64, (5, 1))  # 7 -> 5
    branch2 = conv_bn(branch2, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1, filters 320 -> 64
    branch3 = MaxPooling2D((3, 3), strides=1, padding='same')(x)  # stride 2 -> 1
    return Concatenate()([branch1, branch2, branch3])  # spatial size stays 16x16


# --- Build an Inception-v4 adapted to small inputs ---
def build_inception_v4(input_shape=(32, 32, 1), num_classes=10):
    """Build a simplified Inception-v4 (fewer modules, so the feature map does not get too small)."""
    inputs = Input(shape=input_shape)
    # Stem (output 16x16x256)
    x = stem_block(inputs)
    # Fewer Inception modules (4 + 7 + 3 reduced to 2 + 3 + 1)
    for _ in range(2):  # Inception-A x 2 (originally 4)
        x = inception_a(x)
    x = reduction_a(x)
    for _ in range(3):  # Inception-B x 3 (originally 7)
        x = inception_b(x)
    x = reduction_b(x)
    for _ in range(1):  # Inception-C x 1 (originally 3)
        x = inception_c(x)
    # Classification head (AveragePooling2D (2, 2) instead of (4, 4) for the small feature map)
    x = AveragePooling2D((2, 2))(x)  # 16x16 -> 8x8
    x = Flatten()(x)
    x = Dropout(0.2)(x)  # keep Dropout to limit overfitting
    outputs = Dense(num_classes, activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)


# --- Training function ---
def train_inception_v4(X, y, img_rows=8, img_cols=8, channels=1,
                       epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train the Inception-v4 model (adapted to the sklearn digits dataset)."""
    # 1. Reshape (grayscale -> (samples, height, width, channels))
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # 2. Upscale to 32x32 to match the model input
    X_resized = np.zeros((X.shape[0], 32, 32, channels))
    for i in range(X.shape[0]):
        img = X_reshaped[i].squeeze()  # drop the channel dimension (8x8)
        # Map gray values 0-16 to 0-255 (multiplying by 16 would overflow uint8 at 256)
        img_pil = Image.fromarray((img * (255.0 / 16)).astype(np.uint8))
        img_resized = img_pil.resize((32, 32), Image.NEAREST)  # upscale to 32x32
        img_resized = np.array(img_resized) / 255.0  # scale to [0, 1]
        if channels == 1:
            img_resized = np.expand_dims(img_resized, axis=-1)  # add the channel dimension (32x32x1)
        X_resized[i] = img_resized

    # 3. One-hot encode the labels (10 classes)
    y_onehot = to_categorical(y, num_classes=10)

    # 4. Split into training and test sets (stratified, so class proportions match)
    X_train, X_test, y_train, y_test = train_test_split(
        X_resized, y_onehot, test_size=test_size,
        random_state=random_state, stratify=y
    )

    # 5. Build and compile the model
    model = build_inception_v4(input_shape=(32, 32, channels), num_classes=10)
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # 6. Train the model
    model.fit(
        X_train, y_train,
        epochs=epochs, batch_size=batch_size, verbose=1,
        # The test set doubles as validation data here, for monitoring only
        validation_data=(X_test, y_test),
    )

    # 7. Predict and evaluate
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)   # probabilities -> class (0-9)
    y_test_label = np.argmax(y_test, axis=1)  # one-hot -> class (0-9)
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# --- Usage example ---
if __name__ == "__main__":
    # Load the sklearn digits dataset (8x8 grayscale handwritten digits, 10 classes)
    digits = load_digits()
    X, y = digits.data, digits.target  # X: (1797, 64), y: (1797,)

    # Train the model
    inception_model, metrics = train_inception_v4(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=10, batch_size=32,
    )

    # Print the results
    print(f"\nTest accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):  # skip the scalar 'accuracy' entry
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")

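Generative Adversarial Network (GAN)

Generative adversarial network (GAN): A generative model made of a generator and a discriminator. The generator produces realistic fake data (such as images or text), the discriminator distinguishes real data from generated data, and the two are trained against each other until the generator produces data close to the real distribution (e.g. generating face images with a GAN).
Generator: One of the two core components of a GAN. It takes random noise (a latent vector) and generates fake data with a neural network (e.g. fake handwritten digit images from noise). Its goal is to make the discriminator unable to tell its output from real data.
Discriminator: The other core component of a GAN. It takes either real data or fake data from the generator and outputs the probability that the data is real. Its goal is to separate real from fake accurately, which sets up the adversarial game with the generator.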
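The two opposing objectives can be written down directly from the discriminator's output probabilities. The sketch below uses the binary cross-entropy form, with the common non-saturating generator loss; the function names and example probabilities are assumptions for illustration.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Discriminator wants D(real) -> 1 and D(fake) -> 0."""
    real_term = -np.mean(np.log(d_real + eps))
    fake_term = -np.mean(np.log(1.0 - d_fake + eps))
    return float(real_term + fake_term)

def generator_loss(d_fake, eps=1e-12):
    """Generator wants the discriminator to output 1 on fake data."""
    return float(-np.mean(np.log(d_fake + eps)))

# A confident, correct discriminator: low D loss, high G loss
d_real = np.array([0.9, 0.95, 0.85])
d_fake = np.array([0.1, 0.05, 0.2])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))

# A fully fooled discriminator outputs 0.5 everywhere
half = np.full(3, 0.5)
print(discriminator_loss(half, half))   # 2 * ln 2, about 1.386
print(generator_loss(half))             # ln 2, about 0.693
```

When the discriminator outputs 0.5 for everything it can no longer tell real from fake, which is the equilibrium the adversarial training aims for.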

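DCGAN (Deep Convolutional GAN, 2015)

Uses CNNs instead of fully connected networks for the generator and discriminator (the generator upsamples with transposed convolutions, the discriminator downsamples with strided convolutions) and adds batch normalization (Batch Norm).
Use cases: image generation (e.g. faces, anime avatars).
Core improvement: one of the first successful combinations of CNNs and GANs, with clear architecture guidelines suited to GANs:
The generator uses transposed convolution + BN (batch normalization) instead of fully connected layers, which stabilizes and speeds up training;
The discriminator uses strided convolution + BN + Leaky ReLU instead of pooling layers, which strengthens feature extraction;
Fully connected hidden layers are removed, which reduces the parameter count and improves sample quality.
Advantages: training is noticeably more stable than with the basic GAN, and the model generates sharp 64×64 images; it became the baseline architecture for later image GANs.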
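How a generator reaches 64×64 through transposed convolutions follows from the output-size formula out = (in − 1)·stride − 2·padding + kernel (no output padding, dilation 1). The layer settings below (kernel 4, stride 2, padding 1 after an initial 4×4 projection) are a common DCGAN-style configuration and an assumption of this sketch, not something stated in this article.

```python
def conv_transpose_out(n, kernel, stride, padding):
    """Output width of a transposed convolution along one dimension."""
    return (n - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) per generator layer, starting from a 1x1 latent "pixel"
generator_layers = [
    (4, 1, 0),   # 1x1  -> 4x4 projection of the noise vector
    (4, 2, 1),   # 4x4  -> 8x8
    (4, 2, 1),   # 8x8  -> 16x16
    (4, 2, 1),   # 16x16 -> 32x32
    (4, 2, 1),   # 32x32 -> 64x64
]

size = 1
sizes = []
for kernel, stride, padding in generator_layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)
print(sizes)   # [4, 8, 16, 32, 64]
```

Each stride-2 layer doubles the resolution, mirroring the way the discriminator's stride-2 convolutions halve it.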

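WGAN-GP (Wasserstein GAN with Gradient Penalty, 2017)

Core improvement: replaces the JS divergence of the basic GAN with the Wasserstein distance (earth mover's distance) as the training objective, which mitigates mode collapse and training instability:
The discriminator no longer outputs a probability (the sigmoid activation is removed); it outputs a score and is called the critic;
A gradient penalty term is added: on points interpolated between real and fake data, the norm of the critic's gradient is pushed toward 1, which softly enforces the 1-Lipschitz constraint and keeps training stable.
Advantages: training rarely collapses, the generated data is more diverse, and the critic loss tracks sample quality (a smaller estimated Wasserstein distance generally means better samples).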
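The gradient penalty can be sketched in NumPy for a tiny critic whose input gradient is available in closed form. The critic f(x) = v · tanh(Wx), the data, and all names are assumptions for illustration; real implementations obtain the gradient through automatic differentiation. The penalty weight of 10 follows the value commonly used with WGAN-GP.

```python
import numpy as np

def critic(x, W, v):
    """Toy critic: one tanh hidden layer, unbounded scalar score per sample."""
    return np.tanh(x @ W.T) @ v

def critic_input_grad(x, W, v):
    """Analytic gradient of the critic score with respect to each input."""
    hidden = np.tanh(x @ W.T)
    return ((1.0 - hidden ** 2) * v) @ W

def gradient_penalty(real, fake, W, v, rng, lam=10.0):
    """Penalize gradient norms that deviate from 1 on random interpolates."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake          # points between real and fake
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, W, v), axis=1)
    return float(lam * np.mean((grad_norm - 1.0) ** 2))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
v = rng.normal(size=8)
real = rng.normal(loc=1.0, size=(16, 4))
fake = rng.normal(loc=-1.0, size=(16, 4))

gp = gradient_penalty(real, fake, W, v, rng)
# Critic loss: score fakes low, reals high, plus the penalty
critic_loss = float(np.mean(critic(fake, W, v)) - np.mean(critic(real, W, v)) + gp)
print(f"gradient penalty: {gp:.4f}, critic loss: {critic_loss:.4f}")
```

Note that the penalty is two-sided: gradient norms below 1 are penalized as well as norms above 1.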

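StyleGAN / StyleGAN2 (2018/2019)

Introduces a style control module that separates an image's global style (such as skin tone and lighting) from its local details (such as hairstyle and expression) and supports style interpolation.
Use cases: high-resolution image generation (e.g. photorealistic faces), style transfer.
Core improvement: a style control mechanism that lets the generator adjust the style of the generated image (e.g. a face's hairstyle, skin tone, expression) while keeping the content (e.g. the face outline) unchanged:
The generator is split into a mapping network and a synthesis network: the mapping network maps the noise z to a style vector, and the synthesis network generates the image from the style vector;
Fine-grained style control is supported: different layers can be given different styles (coarse, low-resolution layers govern overall shape and outline, while fine, high-resolution layers govern texture and color details).
Advantages: very high quality at high resolution (e.g. 1024×1024 faces) and good interpretability (styles can be adjusted by hand). The best-known example is NVIDIA's face generation model, which produces highly realistic virtual faces.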

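CycleGAN (2017)

Designs a cycle consistency loss that removes the need for paired data and enables unsupervised image-to-image translation (e.g. horse to zebra, summer to winter).
Use cases: unsupervised image translation, style transfer.
Core improvement: the cycle consistency loss solves unpaired image style transfer (no paired "source domain, target domain" data is required):
It contains two generators (G: X→Y converts X-domain images to the Y domain, F: Y→X converts Y-domain images to the X domain) and two discriminators (D_Y judges Y-domain images, D_X judges X-domain images);
The cycle consistency loss requires F(G(x)) ≈ x (an X-domain image translated to Y and back should be close to the original), and symmetrically G(F(y)) ≈ y, so the translated image keeps its content while its style changes.
Applications: image style transfer (e.g. photo → oil painting, cat → dog, summer → winter), image restoration, cross-domain image conversion (e.g. CT image → MRI image).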
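The cycle consistency requirement F(G(x)) ≈ x translates into a simple reconstruction loss. The sketch below uses a mean absolute (L1) error in both directions and toy scalar "generators" so the result can be checked by hand; the functions `G`, `F`, `F_bad` and the data are hypothetical stand-ins for real networks.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 reconstruction error of the two cycles X -> Y -> X and Y -> X -> Y."""
    forward_cycle = np.mean(np.abs(F(G(x)) - x))    # F(G(x)) should return to x
    backward_cycle = np.mean(np.abs(G(F(y)) - y))   # G(F(y)) should return to y
    return float(forward_cycle + backward_cycle)

# Toy "generators": G maps domain X to domain Y, F maps back
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0      # exact inverse of G
F_bad = lambda y: y / 2.0          # forgets to undo the offset

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

print(cycle_consistency_loss(x, y, G, F))       # 0.0: perfect round trips
print(cycle_consistency_loss(x, y, G, F_bad))   # 1.5: 0.5 forward + 1.0 backward
```

In training, this term is added to the two adversarial losses, so the generators are rewarded both for fooling the discriminators and for being approximately invertible.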

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU, Dropout
from tensorflow.keras.optimizers import Adam
import numpy as np  # used below for noise sampling and label arrays
import matplotlib.pyplot as plt  # used below for sample grids and the loss plot


def build_generator(latent_dim=100, output_shape=64):
    """Build the generator: map random noise to an 8×8 handwritten digit (flattened to 64 values).

    - latent_dim: dimension of the random noise vector (default 100)
    - output_shape: dimension of the output (8×8 = 64, flattened)
    """
    model = Sequential(name="Generator")
    # Layer 1: noise vector -> 256 features
    model.add(Dense(256, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))  # LeakyReLU helps mitigate vanishing gradients
    model.add(Dropout(0.3))  # Dropout to reduce overfitting
    # Layer 2: 256 -> 512 features
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.3))
    # Output layer: 512 -> 64 values (one 8×8 image); tanh keeps outputs in [-1, 1]
    model.add(Dense(output_shape, activation='tanh'))
    return model


def build_discriminator(input_shape=64):
    """Build the discriminator: a binary classifier that separates real digits from generated ones.

    - input_shape: dimension of the input (8×8 = 64, flattened)
    """
    model = Sequential(name="Discriminator")
    # Layer 1: 64-dim input -> 512 features
    model.add(Dense(512, input_dim=input_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.3))
    # Layer 2: 512 -> 256 features
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.3))
    # Output layer: 256 -> 1 probability (0 = fake, 1 = real), sigmoid activation
    model.add(Dense(1, activation='sigmoid'))
    # Compile the discriminator with binary cross-entropy loss
    model.compile(
        optimizer=Adam(learning_rate=0.0002, beta_1=0.5),  # beta_1=0.5 is a common choice for GANs
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model


def build_gan(generator, discriminator):
    """Build the combined GAN: freeze the discriminator and train the generator to fool it.

    - generator: an already built generator
    - discriminator: an already built (and compiled) discriminator
    """
    # While training the combined model, keep the discriminator fixed (only the generator updates)
    discriminator.trainable = False
    # GAN pipeline: noise -> generator produces fake data -> discriminator scores it
    model = Sequential(name="GAN")
    model.add(generator)  # input: random noise -> output: fake data
    model.add(discriminator)  # input: fake data -> output: probability of being real
    # Compile the GAN (goal: push the discriminator's prediction on fake data toward 1)
    model.compile(
        optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
        loss='binary_crossentropy'
    )
    return model


def train_gan(
    latent_dim=100, epochs=10000, batch_size=64,
    sample_interval=1000, save_plots=True
):
    """Full GAN training loop, wrapped in one function.

    - latent_dim: dimension of the random noise
    - epochs: number of training iterations (each iteration uses one random batch)
    - batch_size: batch size
    - sample_interval: generate and save samples every this many iterations
    - save_plots: whether to save the generated digit images
    """
    # ---------------------- 1. Load and preprocess the sklearn digits dataset ----------------------
    digits = load_digits()
    X = digits.data  # shape (1797, 64); each sample is a flattened 8×8 image
    y = digits.target  # labels (unused here; GAN training is unsupervised)
    # Rescale from [0, 16] (grayscale range) to [-1, 1] to match the generator's tanh output
    scaler = MinMaxScaler(feature_range=(-1, 1))
    X_scaled = scaler.fit_transform(X)
    # Keep the training split only (the GAN does not need a test set)
    X_train, _ = train_test_split(X_scaled, test_size=0.2, random_state=42)
    # Targets for the discriminator: 1 for real data, 0 for fake data
    real_labels = np.ones((batch_size, 1))  # shape (batch_size, 1), all ones
    fake_labels = np.zeros((batch_size, 1))  # shape (batch_size, 1), all zeros

    # ---------------------- 2. Build the generator, discriminator, and GAN ----------------------
    generator = build_generator(latent_dim=latent_dim, output_shape=X_train.shape[1])
    discriminator = build_discriminator(input_shape=X_train.shape[1])
    gan = build_gan(generator, discriminator)
    # Print the network architectures
    print("=== Generator architecture ===")
    generator.summary()
    print("\n=== Discriminator architecture ===")
    discriminator.summary()
    print("\n=== GAN architecture ===")
    gan.summary()

    # ---------------------- 3. Alternate discriminator and generator updates ----------------------
    # Store the losses for later visualization
    d_loss_history = []  # discriminator loss
    g_loss_history = []  # generator loss
    for epoch in range(epochs):
        # ---------------------- 3.1 Train the discriminator (real vs. fake) ----------------------
        # 1. Train on real data
        idx = np.random.randint(0, X_train.shape[0], batch_size)  # pick batch_size random real samples
        real_imgs = X_train[idx]  # shape (batch_size, 64)
        d_loss_real, d_acc_real = discriminator.train_on_batch(real_imgs, real_labels)
        # 2. Train on fake data from the generator
        noise = np.random.normal(0, 1, (batch_size, latent_dim))  # sample random noise
        fake_imgs = generator.predict(noise, verbose=0)  # generate fake data
        d_loss_fake, d_acc_fake = discriminator.train_on_batch(fake_imgs, fake_labels)
        # Average the discriminator loss and accuracy over the two batches
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
        d_acc = 0.5 * np.add(d_acc_real, d_acc_fake)

        # ---------------------- 3.2 Train the generator (fool the discriminator) ----------------------
        # Sample fresh noise (so it differs from the noise used for the discriminator step)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        # Generator target: make the discriminator predict 1 (real_labels) on fake data
        g_loss = gan.train_on_batch(noise, real_labels)

        # ---------------------- 3.3 Record losses and log progress ----------------------
        d_loss_history.append(d_loss)
        g_loss_history.append(g_loss)
        # Print training information every 100 iterations
        if (epoch + 1) % 100 == 0:
            print(f"Epoch [{epoch + 1}/{epochs}] | "
                  f"D Loss: {d_loss:.4f} | D Acc: {d_acc:.4f} | "
                  f"G Loss: {g_loss:.4f}")

        # ---------------------- 3.4 Generate and save samples every sample_interval iterations ----------------------
        if (epoch + 1) % sample_interval == 0:
            generate_samples(
                generator, latent_dim=latent_dim, epoch=epoch + 1,
                save_plots=save_plots, n_samples=25  # 25 samples (5×5 grid)
            )

    # ---------------------- 4. Training finished: return the models and loss history ----------------------
    return {
        "generator": generator,
        "discriminator": discriminator,
        "gan": gan,
        "d_loss_history": d_loss_history,
        "g_loss_history": g_loss_history
    }


def generate_samples(generator, latent_dim, epoch, save_plots=True, n_samples=25):
    """Generate handwritten digit samples with the generator and plot them (optionally saving the figure).

    - generator: the trained generator
    - latent_dim: dimension of the random noise
    - epoch: current training iteration (used in the file name)
    - save_plots: whether to save the image
    - n_samples: number of samples (a perfect square is recommended, e.g. 25 = 5×5, 36 = 6×6)
    """
    # Sample random noise
    noise = np.random.normal(0, 1, (n_samples, latent_dim))
    # Generate fake data (flattened 8×8 vectors)
    generated_imgs = generator.predict(noise, verbose=0)
    # Map from [-1, 1] to [0, 1] for display
    generated_imgs = (generated_imgs + 1) / 2
    # Grid size (e.g. 25 samples -> 5×5 grid)
    n_rows = int(np.sqrt(n_samples))
    n_cols = int(np.sqrt(n_samples))
    # Create the canvas; squeeze=False keeps axes a 2-D array even for a 1×1 grid
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 10), squeeze=False)
    axes = axes.flatten()  # flatten to 1-D for easy iteration
    # Draw each generated sample
    for i, ax in enumerate(axes):
        # Reshape the 64-dim generator output into an 8×8 image
        img = generated_imgs[i].reshape(8, 8)
        ax.imshow(img, cmap='gray')  # grayscale display
        ax.axis('off')  # hide the axes
    # Title with the current training iteration
    fig.suptitle(f"Generated Digits (Epoch {epoch})", fontsize=16)
    # Save or show the figure
    if save_plots:
        plt.savefig(f"gan_generated_digits_epoch_{epoch}.png", dpi=300, bbox_inches='tight')
        print(f"Generated samples saved as 'gan_generated_digits_epoch_{epoch}.png'")
    else:
        plt.show()
    plt.close()


# ------------------- Usage example: train the GAN and generate handwritten digits -------------------
if __name__ == "__main__":
    # Call the wrapped training function
    gan_results = train_gan(
        latent_dim=100,  # 100-dim random noise
        epochs=10000,  # 10000 iterations (intended to be enough for recognizable digits)
        batch_size=64,  # batch size 64
        sample_interval=1000,  # save samples every 1000 iterations
        save_plots=True  # save the generated images
    )
    # Optional: plot the loss curves (to check training stability)
    plt.figure(figsize=(12, 6))
    plt.plot(gan_results["d_loss_history"], label="Discriminator Loss")
    plt.plot(gan_results["g_loss_history"], label="Generator Loss")
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.title("GAN Training Loss History")
    plt.legend()
    plt.savefig("gan_loss_history.png", dpi=300, bbox_inches='tight')
    print("Loss history plot saved as 'gan_loss_history.png'")
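
To help read the D Loss and G Loss values that the training loop logs, here is a minimal NumPy-only sketch of the two binary cross-entropy objectives under the label scheme used above (1 for real, 0 for fake). The helper names `binary_crossentropy` and `gan_losses` are hypothetical and not part of the original code. For simplicity the sketch scores one fake batch for both losses, whereas the training loop samples fresh noise for the generator step.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))


def gan_losses(d_out_real, d_out_fake):
    """Return (d_loss, g_loss) from discriminator outputs on a real and a fake batch.

    Hypothetical helper for illustration only.
    """
    real_labels = np.ones_like(d_out_real)
    fake_labels = np.zeros_like(d_out_fake)
    # Discriminator: real samples should score 1, fake samples should score 0
    d_loss = 0.5 * (
        binary_crossentropy(real_labels, d_out_real)
        + binary_crossentropy(fake_labels, d_out_fake)
    )
    # Generator: fake samples are scored against the "real" label 1
    g_loss = binary_crossentropy(np.ones_like(d_out_fake), d_out_fake)
    return d_loss, g_loss


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake outputs 0.5 everywhere
    undecided = np.full((4, 1), 0.5)
    print("undecided discriminator:", gan_losses(undecided, undecided))
    # A confident discriminator: 0.9 on real data, 0.1 on fake data
    print("confident discriminator:",
          gan_losses(np.full((4, 1), 0.9), np.full((4, 1), 0.1)))
```

When the discriminator outputs 0.5 for every sample, both losses equal ln 2 ≈ 0.693, so logged values hovering near that level suggest the two networks are roughly balanced. A confident discriminator lowers D Loss and raises G Loss.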
