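Glossary
Neural Network (NN): A model loosely inspired by the neurons of the human brain. It consists of an input layer, one or more hidden layers, and an output layer. Each layer contains multiple neurons connected by weights, and activation functions introduce non-linearity so that the network can represent complex mappings.
Artificial Neuron: The basic unit of a neural network. It receives input signals (features or outputs of the previous layer), computes a weighted sum plus a bias, and passes the result through an activation function, e.g. y = σ(w1x1 + w2x2 + b), where σ is the activation function.
Input Layer: The first layer of a neural network. It receives the raw feature data directly, and its number of neurons equals the feature dimension (e.g., with the three features "age, income, purchase frequency", the input layer has 3 neurons).
Hidden Layer: Any layer between the input and output layers, used to extract abstract features from the data. More layers (greater "depth") generally let the model represent more complex patterns, although deeper networks are also harder to train (e.g., the convolutional layers of a CNN and the attention layers of a Transformer are hidden layers).
Output Layer: The last layer of a neural network, which produces the prediction: multi-class classification uses softmax activation (class probabilities), regression uses linear activation (continuous values), and binary classification uses sigmoid activation (probability of the positive class).
Activation Function: A function that introduces non-linearity so that the model can fit non-linear data. Common choices: sigmoid (binary-classification output), tanh (hidden layers), ReLU (hidden layers, mitigates vanishing gradients), softmax (multi-class output).
ReLU (Rectified Linear Unit): One of the most widely used activation functions, f(x) = max(0, x). Advantages: cheap to compute and mitigates vanishing gradients (the gradient is 1 for x > 0). Drawback: some neurons may "die" (the gradient is 0 for x ≤ 0, so their parameters stop updating).
Softmax function: The output-layer activation for multi-class tasks. It maps the model's "logits" (unnormalized scores) to probabilities in [0, 1] that sum to 1 over all classes (e.g., an output of [0.1, 0.8, 0.1] means the sample belongs to class 2 with probability 80%).
Backpropagation (BP): The core algorithm for training neural networks. It applies the chain rule backward from the output layer to compute the gradient of the loss with respect to each layer's parameters; gradient descent then uses these gradients to update the parameters and minimize the loss.
Vanishing Gradient: A problem in training deep networks. As the gradient is backpropagated, it is multiplied by the weights and activation derivatives of many layers and shrinks toward 0, so the layers closest to the input barely update (sigmoid activations are especially prone to this). ReLU mitigates the problem.
Exploding Gradient: A problem in training deep networks. As the gradient is backpropagated, repeated multiplication across many layers makes it grow rapidly, so parameter updates become too large and training becomes unstable (gradient clipping mitigates this).
Transformer: A sequence model built on the self-attention mechanism. It captures dependencies between arbitrary positions in a sequence without recurrence and parallelizes efficiently; it transformed the NLP field (e.g., BERT and GPT are both based on the Transformer).
Self-Attention Mechanism: The core mechanism of the Transformer. Each element's query (Q) is compared with the keys (K) of all elements; the resulting similarity scores are normalized into attention weights, which are used to take a weighted sum of the values (V). Important elements therefore have more influence (e.g., in the sentence "the cat chases the mouse", "chases" attends more to "cat" and "mouse").
Multi-Head Attention: An extension of self-attention. Q, K, and V are projected into several subspaces, multiple attention heads are computed in parallel, and the results are concatenated, which lets the model capture different kinds of dependencies in the sequence (e.g., semantic and syntactic dependencies).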
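The Q/K/V computation described in the self-attention entries can be sketched in NumPy as follows. This is a minimal single-head illustration, not a full Transformer layer; the projection matrices `W_q`, `W_k`, `W_v` and the toy input are hypothetical values chosen only for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # query-key similarity, scaled
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))  # 3 tokens with 4-dim embeddings (toy data)
# Hypothetical projection matrices; in a real model these are learned.
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(weights.round(3))
```

Multi-head attention would repeat this with several independent projection triples and concatenate the outputs.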
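Encoder-Decoder architecture: A model framework for "input sequence → output sequence" tasks. The encoder encodes the input sequence into a "context vector", and the decoder generates the output sequence from that context; a typical application is machine translation (English input → encoder → decoder → Chinese output).
Transfer Learning: Transferring knowledge learned on a "source task" (e.g., ImageNet image classification) to a "target task" (e.g., cat-breed classification). This reduces the amount of labeled data the target task needs and speeds up training (e.g., fine-tuning a pre-trained ResNet into a cat classifier).
Pre-trained Model: A model trained in advance on large-scale data that serves as a "base model" to be fine-tuned on downstream tasks (e.g., BERT, GPT, and ResNet are available as pre-trained models), avoiding training from scratch and improving efficiency and performance.
Fine-Tuning: A common transfer-learning method. The pre-trained model's parameters are used as initial values and training continues on a small amount of target-task data, adjusting some or all parameters to fit the target task (e.g., when adapting BERT for sentiment analysis, one option is to train only the final fully connected layer and keep the rest frozen).
Variational Autoencoder (VAE): A generative model based on the autoencoder. The encoder maps the data to a probability distribution in latent space, and the decoder reconstructs data from samples drawn from that distribution, so the model can generate new, varied data (e.g., generating anime avatars with a VAE).
Autoencoder (AE): An unsupervised model made of an encoder (compresses the input into a low-dimensional latent vector) and a decoder (reconstructs the original input from the latent vector). Its objective is to minimize the "reconstruction error"; it is commonly used for dimensionality reduction, feature extraction, and anomaly detection.
Neural networks (especially LSTM and RNN): as a rule of thumb, use MinMaxScaler and normalize both the features and the target.
Linear models / PCA / clustering: use StandardScaler; standardize only the features, not the target (for regression problems).
Never fit a scaler on the test set: doing so causes data leakage!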
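The "never fit on the test set" rule can be made concrete with a short scikit-learn sketch. The dataset and split ratio are arbitrary choices for illustration; the point is only the order of operations (split first, then `fit` on the training portion).

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Split BEFORE scaling so that test rows cannot influence the statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std come from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse them; do not call fit here

print(X_train_scaled.mean(axis=0).round(6))     # approximately 0 per feature
```

The same pattern applies to MinMaxScaler: call `fit` (or `fit_transform`) on training data and only `transform` on validation and test data.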
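Fully Connected Neural Network (FCNN)
Also called a "Multi-Layer Perceptron (MLP)". It consists of an input layer, hidden layers, and an output layer, and adjacent layers are "fully connected" (every neuron connects to every neuron in the next layer);
Every connection has a "weight", and each neuron's output passes through an activation function (such as Sigmoid or ReLU) that introduces non-linearity, so the model can fit complex non-linear relationships;
Training uses backpropagation to compute the gradient of the loss with respect to the weights and gradient descent to update the parameters, minimizing the prediction error.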
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report


def train_mlp_classifier(X, y, hidden_layer_sizes=(100,), max_iter=300, random_state=42):
    """Train a fully connected neural network (multi-layer perceptron) classifier.

    Parameters:
    - X: feature data
    - y: label data
    - hidden_layer_sizes: hidden-layer structure; the default (100,) means one layer of 100 neurons
    - max_iter: maximum number of iterations
    - random_state: random seed

    Returns:
    - model: the trained model
    - accuracy: test-set accuracy
    - report: classification report
    """
    # 1. Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # 2. Standardize the features (fit on the training set only)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # 3. Define and train the MLP model
    model = MLPClassifier(
        hidden_layer_sizes=hidden_layer_sizes,
        max_iter=max_iter,
        random_state=random_state,
        verbose=False,
    )
    model.fit(X_train_scaled, y_train)

    # 4. Predict and evaluate
    y_pred = model.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)
    return model, accuracy, report


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the handwritten-digits dataset bundled with sklearn
    digits = load_digits()
    X, y = digits.data, digits.target

    # Train the model
    mlp_model, acc, report = train_mlp_classifier(
        X, y,
        hidden_layer_sizes=(128, 64),  # two hidden layers: 128 -> 64
        max_iter=500,
        random_state=42,
    )

    # Print the results
    print(f"Test accuracy: {acc:.4f}")
    print("Classification report:")
    print(report)
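Recurrent Neural Network (RNN)
Recurrent Neural Network (RNN): A neural network for sequence data (such as text or time series). It keeps a "hidden state" that carries information from earlier steps, which lets it model sequence context (e.g., when an RNN processes a sentence, the prediction for each word depends on the preceding words). Its main weakness is the long-term dependency problem: information from distant steps is hard to retain.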
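The hidden-state recurrence described above can be sketched in a few lines of NumPy. This is a minimal forward pass of a vanilla (Elman-style) RNN with tanh activation; the weight shapes, scales, and toy input are assumptions for illustration, and no training is performed.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over one sequence and return every hidden state.

    inputs: (time_steps, n_features)
    W_xh:   (n_features, units)  input-to-hidden weights
    W_hh:   (units, units)       hidden-to-hidden weights
    b_h:    (units,)             hidden bias
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state,
        # which is how earlier steps influence later ones.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))               # 5 time steps, 3 features (toy data)
W_xh = rng.normal(scale=0.5, size=(3, 4))
W_hh = rng.normal(scale=0.5, size=(4, 4))
b_h = np.zeros(4)

hidden_states = rnn_forward(seq, W_xh, W_hh, b_h)
print(hidden_states.shape)  # (5, 4)
```

A Keras `SimpleRNN` layer, by default, returns only the last of these hidden states.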
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
import math


def create_sequences(data, target, time_steps=3):
    """Convert data into sequence input of shape [samples, time steps, features]."""
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps, :])  # features of the previous time_steps steps
        y.append(target[i + time_steps])     # target at the next step
    return np.array(X), np.array(y)


def train_rnn_model(X, y, time_steps=3, units=50, epochs=50, test_size=0.2, random_state=42):
    """Train a recurrent neural network (SimpleRNN) model.

    Parameters:
    - X: feature data (samples, features)
    - y: target data (samples,)
    - time_steps: sequence length
    - units: number of RNN hidden units
    - epochs: number of training epochs
    - test_size: fraction of data used for testing
    - random_state: random seed (has no effect on the split because shuffle=False)

    Returns:
    - model: the trained RNN model
    - metrics: dictionary of evaluation metrics
    """
    # Normalize. The scalers are fitted on the leading (training) rows only,
    # so that test rows do not leak into the scaling statistics.
    n_train = int(len(X) * (1 - test_size))
    scaler_X = MinMaxScaler().fit(X[:n_train])
    scaler_y = MinMaxScaler().fit(y[:n_train].reshape(-1, 1))
    X_scaled = scaler_X.transform(X)
    y_scaled = scaler_y.transform(y.reshape(-1, 1)).flatten()

    # Build the sequence data
    X_seq, y_seq = create_sequences(X_scaled, y_scaled, time_steps)

    # Split into training and test sets (no shuffling, to keep the order)
    X_train, X_test, y_train, y_test = train_test_split(
        X_seq, y_seq, test_size=test_size, random_state=random_state, shuffle=False
    )

    # Build the RNN model
    model = Sequential()
    model.add(SimpleRNN(units, input_shape=(time_steps, X.shape[1])))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=epochs, batch_size=32, verbose=0)

    # Predict
    y_pred = model.predict(X_test)

    # Undo the normalization
    y_test_inv = scaler_y.inverse_transform(y_test.reshape(-1, 1)).flatten()
    y_pred_inv = scaler_y.inverse_transform(y_pred).flatten()

    # Compute the evaluation metrics
    mse = mean_squared_error(y_test_inv, y_pred_inv)
    rmse = math.sqrt(mse)
    mae = mean_absolute_error(y_test_inv, y_pred_inv)
    r2 = r2_score(y_test_inv, y_pred_inv)
    metrics = {'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'R2': r2}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the diabetes dataset.
    # Note: this dataset is not a real time series; it only demonstrates the API.
    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target

    # Train the RNN model
    rnn_model, metrics = train_rnn_model(
        X, y,
        time_steps=5,
        units=64,
        epochs=100,
    )

    # Print all evaluation metrics
    print("Model evaluation metrics:")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
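Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM): An improved model that addresses the RNN's long-term dependency problem. A "gating mechanism" (input gate, forget gate, output gate) controls what information is written, forgotten, and output, so useful information can be kept over long spans (e.g., using an LSTM for long-text generation).
Gated Recurrent Unit (GRU): A simplified variant of the LSTM that merges the forget and input gates into a single "update gate". It has fewer parameters and trains faster, and on many sequence tasks its performance is close to that of the LSTM (e.g., speech recognition, text classification).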
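The gating mechanism described above can be written out as a single LSTM time step in NumPy. This is an illustrative sketch: the gate ordering inside the stacked weight matrices (input, forget, output, candidate) is a convention chosen for this example and need not match any particular library, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (n_features, 4 * units), U: (units, 4 * units), b: (4 * units,)
    Stacked gate order (an assumption of this sketch): i, f, o, g.
    """
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[0 * units:1 * units])  # input gate: how much new info to write
    f = sigmoid(z[1 * units:2 * units])  # forget gate: how much old memory to keep
    o = sigmoid(z[2 * units:3 * units])  # output gate: how much memory to expose
    g = np.tanh(z[3 * units:4 * units])  # candidate cell values
    c_t = f * c_prev + i * g             # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)               # updated hidden state (output)
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, units = 3, 4
W = rng.normal(scale=0.5, size=(n_features, 4 * units))
U = rng.normal(scale=0.5, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, n_features)):  # 5 toy time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

A GRU follows the same pattern with two gates (update and reset) and no separate cell state.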
import numpy as np
import math
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense


def create_sequences(data, target, time_steps=3):
    """Convert data into sequence input of shape [samples, time steps, features]."""
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps, :])  # features of the previous time_steps steps
        y.append(target[i + time_steps])     # target at the next step
    return np.array(X), np.array(y)


def train_lstm_model(X, y, time_steps=3, units=50, epochs=150, test_size=0.2, random_state=42):
    """Train an LSTM model (regression task).

    Parameters:
    - X: feature data (samples, features)
    - y: target data (samples,)
    - time_steps: sequence length
    - units: number of LSTM hidden units
    - epochs: number of training epochs
    - test_size: fraction of data used for testing
    - random_state: random seed (has no effect on the split because shuffle=False)

    Returns:
    - model: the trained LSTM model
    - metrics: dictionary of evaluation metrics (MSE, RMSE, MAE, R2)
    """
    # Normalize. The scalers are fitted on the leading (training) rows only,
    # so that test rows do not leak into the scaling statistics.
    n_train = int(len(X) * (1 - test_size))
    scaler_X = MinMaxScaler().fit(X[:n_train])
    scaler_y = MinMaxScaler().fit(y[:n_train].reshape(-1, 1))
    X_scaled = scaler_X.transform(X)
    y_scaled = scaler_y.transform(y.reshape(-1, 1)).flatten()

    # Build the sequence data
    X_seq, y_seq = create_sequences(X_scaled, y_scaled, time_steps)

    # Split into training and test sets (no shuffling, to keep the order)
    X_train, X_test, y_train, y_test = train_test_split(
        X_seq, y_seq, test_size=test_size, random_state=random_state, shuffle=False
    )

    # Build the LSTM model
    model = Sequential()
    model.add(LSTM(units, input_shape=(time_steps, X.shape[1])))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # Train the model
    model.fit(X_train, y_train, epochs=epochs, batch_size=32, verbose=0)

    # Predict
    y_pred = model.predict(X_test)

    # Undo the normalization
    y_test_inv = scaler_y.inverse_transform(y_test.reshape(-1, 1)).flatten()
    y_pred_inv = scaler_y.inverse_transform(y_pred).flatten()

    # Compute the evaluation metrics
    mse = mean_squared_error(y_test_inv, y_pred_inv)
    rmse = math.sqrt(mse)
    mae = mean_absolute_error(y_test_inv, y_pred_inv)
    r2 = r2_score(y_test_inv, y_pred_inv)
    metrics = {'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'R2': r2}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the diabetes dataset bundled with sklearn.
    # Note: this dataset is not a real time series; it only demonstrates the API.
    diabetes = load_diabetes()
    X, y = diabetes.data, diabetes.target

    # Call the LSTM training function
    lstm_model, metrics = train_lstm_model(
        X, y,
        time_steps=5,
        units=64,
        epochs=100,
    )

    # Print the evaluation metrics
    print("Model evaluation metrics:")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
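Convolutional Neural Network (CNN)
Convolutional Layer: The core layer of a CNN. A "kernel" (e.g., a 3×3 matrix) slides over the input feature map and extracts local features by element-wise multiplication and summation (e.g., an "edge-detection kernel" extracts edges from an image); multiple kernels extract different features.
Pooling Layer: The downsampling layer of a CNN. A sliding window aggregates each local region (max pooling takes the maximum in the window, average pooling takes the mean), which shrinks the feature maps, reduces the computation and the parameters needed in later layers, and makes the model more robust to small shifts.
Fully Connected Layer: A layer in which every neuron is connected to all neurons of the previous layer. It maps the features extracted by earlier layers to the final prediction (e.g., the last few layers of a CNN are often fully connected and output class probabilities).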
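To make the sliding-window arithmetic of the convolution and pooling layers concrete, here is a small NumPy sketch. It assumes a single-channel image, stride 1, and no padding; like most deep-learning frameworks, it computes cross-correlation (the kernel is not flipped). The image and kernel values are made up for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise multiply the window by the kernel, then sum.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/cols that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy image: dark left half, bright right half (a vertical edge in the middle).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to left-to-right brightening

features = conv2d_valid(image, edge_kernel)  # shape (4, 4)
pooled = max_pool2d(features)                # shape (2, 2)
print(features)
print(pooled)
```

The feature map is non-zero only where the window straddles the edge, and pooling halves each spatial dimension while keeping the strongest response.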
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


def train_cnn_model(X, y, img_rows=8, img_cols=8, channels=1,
                    filters=32, kernel_size=(3, 3), pool_size=(2, 2),
                    dense_units=128, dropout_rate=0.5,
                    epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train a convolutional neural network (CNN) for image classification.

    Parameters:
    - X: feature data (samples, features)
    - y: label data (samples,)
    - img_rows, img_cols: image size
    - channels: number of image channels (grayscale=1, color=3)
    - filters: number of convolution kernels
    - kernel_size: kernel size
    - pool_size: pooling window size
    - dense_units: number of units in the fully connected layer
    - dropout_rate: dropout rate
    - epochs: number of training epochs
    - batch_size: batch size
    - test_size: fraction of data used for testing
    - random_state: random seed

    Returns:
    - model: the trained CNN model
    - metrics: dictionary of evaluation metrics
    """
    # Split first, so that the scaler never sees the test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # Normalize (fit on the training set only)
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Reshape to the CNN input format [samples, rows, cols, channels]
    X_train = X_train.reshape(-1, img_rows, img_cols, channels)
    X_test = X_test.reshape(-1, img_rows, img_cols, channels)

    # Build the CNN model
    model = Sequential([
        Conv2D(filters, kernel_size, activation='relu',
               input_shape=(img_rows, img_cols, channels)),
        MaxPooling2D(pool_size=pool_size),
        Flatten(),
        Dense(dense_units, activation='relu'),
        Dropout(dropout_rate),
        Dense(10, activation='softmax'),  # the digits dataset has 10 classes
    ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the handwritten-digits dataset bundled with sklearn
    digits = load_digits()
    X, y = digits.data, digits.target

    # Call the CNN training function
    cnn_model, metrics = train_cnn_model(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        filters=32, kernel_size=(3, 3),
        dense_units=128, dropout_rate=0.3,
        epochs=15, batch_size=32,
    )

    # Print the evaluation metrics
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")
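LeNet-5
An early, influential CNN architecture (1998): two convolutional layers (C1, C3) and two pooling layers (S2, S4), followed by C5 (a convolution that acts as a fully connected layer on the classic 32×32 input), the fully connected layer F6, and the output layer.
Use cases: handwritten-digit recognition (e.g., the MNIST dataset).
LeNet-5 structure
C1: 6 kernels of size 5×5
S2: average pooling 2×2
C3: 16 kernels of size 5×5
S4: average pooling 2×2
C5: 120 kernels of size 5×5 in the original network (the adapted code below uses 3×3 kernels in C1/C3, drops S4, and uses 1×1 kernels in C5, because the 8×8 digits images are too small)
F6: 84 neurons
Output: softmax over 10 classes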
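The feature-map sizes implied by this structure can be checked with the standard output-size formulas. The sketch below assumes the classic 32×32 LeNet-5 input with 5×5 kernels throughout (including C5), no padding, and stride-2 pooling; the helper names are hypothetical. It also traces an 8×8 input with 3×3 kernels to show why small images cannot use the full stack.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output width/height of a pooling layer along one dimension."""
    return (size - window) // stride + 1

# Classic LeNet-5 on a 32x32 input (assumed configuration).
trace = {}
size = 32
size = conv_out(size, 5); trace["C1"] = size  # 28
size = pool_out(size);    trace["S2"] = size  # 14
size = conv_out(size, 5); trace["C3"] = size  # 10
size = pool_out(size);    trace["S4"] = size  # 5
size = conv_out(size, 5); trace["C5"] = size  # 1
print(trace)

# An 8x8 image with 3x3 kernels runs out of pixels after one pooling step.
small = conv_out(8, 3)      # 6
small = pool_out(small)     # 3
small = conv_out(small, 3)  # 1, so a second pooling layer has nothing left to pool
print(small)
```

Because C5 sees a 5×5 map and uses 5×5 kernels, each of its 120 outputs is 1×1, which is why it behaves like a fully connected layer.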
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
import warnings
warnings.filterwarnings('ignore')


def train_lenet5(X, y, img_rows=8, img_cols=8, channels=1,
                 epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """LeNet-5 wrapper (adapted to the sklearn digits dataset).

    No extra preprocessing: only reshape and one-hot encoding.
    """
    # Reshape to (samples, height, width, channels)
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # One-hot encode the labels
    y_onehot = to_categorical(y, num_classes=10)

    # Split into training and test sets (stratified by the integer labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X_reshaped, y_onehot, test_size=test_size, random_state=random_state, stratify=y
    )

    # Build the LeNet-5 model (adapted to 8×8 images)
    model = Sequential([
        # C1: convolution, 6 kernels of size 3×3
        Conv2D(filters=6, kernel_size=(3, 3), activation='sigmoid',
               input_shape=(img_rows, img_cols, channels)),
        # S2: average pooling 2×2
        AveragePooling2D(pool_size=(2, 2), strides=2),
        # C3: convolution, 16 kernels of size 3×3
        Conv2D(filters=16, kernel_size=(3, 3), activation='sigmoid'),
        # Note: 8×8 images are too small, so the second pooling layer is
        # removed to avoid a non-positive output size
        # AveragePooling2D(pool_size=(2, 2), strides=2),
        # C5: convolution, 120 kernels of size 1×1
        Conv2D(filters=120, kernel_size=(1, 1), activation='sigmoid'),
        # Flatten
        Flatten(),
        # F6: fully connected layer with 84 neurons
        Dense(units=84, activation='sigmoid'),
        # Output: 10 classes
        Dense(units=10, activation='softmax'),
    ])

    # Compile the model
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)
    y_test_label = np.argmax(y_test, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the sklearn digits dataset (8×8 grayscale images)
    digits = load_digits()
    X, y = digits.data, digits.target

    # Call the LeNet-5 training function
    lenet_model, metrics = train_lenet5(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=20, batch_size=32,
    )

    # Print the results
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")
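AlexNet
Winner of the 2012 ImageNet competition. It popularized the ReLU activation function, Dropout against overfitting, and GPU-accelerated training in large CNNs.
Use cases: ImageNet image classification (1000 classes).
AlexNet key features
8-layer structure: 5 convolutional layers + 3 fully connected layers
ReLU activation: mitigates the vanishing-gradient problem and speeds up training
Dropout: reduces overfitting
Overlapping max pooling: shrinks the feature maps
Data augmentation: improves generalization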
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense, Dropout,BatchNormalization, Activation
)
from tensorflow.keras.utils import to_categorical
from PIL import Image
import warnings
warnings.filterwarnings('ignore')


def build_alexnet(input_shape=(227, 227, 3), num_classes=10):
    """Build an AlexNet-style model.

    Note: this version uses BatchNormalization where the original AlexNet
    used local response normalization.
    """
    model = Sequential([
        # 1st conv block
        Conv2D(96, (11, 11), strides=(4, 4), padding='valid', input_shape=input_shape),
        BatchNormalization(),
        Activation('relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2, padding='valid'),
        # 2nd conv block
        Conv2D(256, (5, 5), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2, padding='valid'),
        # 3rd conv block
        Conv2D(384, (3, 3), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        # 4th conv block
        Conv2D(384, (3, 3), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        # 5th conv block
        Conv2D(256, (3, 3), padding='same'),
        BatchNormalization(),
        Activation('relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2, padding='valid'),
        # Classifier
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    return model


def train_alexnet(X, y, img_rows=8, img_cols=8, channels=1,
                  epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train the AlexNet model (adapted to the sklearn digits dataset)."""
    # Reshape to (samples, height, width, channels)
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # Resize the 8×8 grayscale images to the AlexNet input size 227×227×3.
    # float32 halves the memory of this large array compared with float64.
    X_resized = np.zeros((X.shape[0], 227, 227, 3), dtype=np.float32)
    for i in range(X.shape[0]):
        img = X_reshaped[i]
        # Grayscale to RGB
        img_rgb = np.stack([img] * 3, axis=-1) if channels == 1 else img
        # Map digits pixels (0-16) to 0-255. Multiplying by 16 would give 256
        # for the brightest pixels, which wraps around to 0 in uint8.
        img_uint8 = (img_rgb.squeeze() * (255.0 / 16.0)).astype(np.uint8)
        img_pil = Image.fromarray(img_uint8)
        # Simple upscaling (nearest-neighbor interpolation)
        img_resized = img_pil.resize((227, 227), Image.NEAREST)
        X_resized[i] = np.array(img_resized) / 255.0

    # One-hot encode the labels
    y_onehot = to_categorical(y, num_classes=10)

    # Split into training and test sets (stratified by the integer labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X_resized, y_onehot, test_size=test_size, random_state=random_state, stratify=y
    )

    # Build the AlexNet model
    model = build_alexnet(input_shape=(227, 227, 3), num_classes=10)

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)
    y_test_label = np.argmax(y_test, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the sklearn digits dataset (8×8 grayscale images)
    digits = load_digits()
    X, y = digits.data, digits.target

    # Call the AlexNet training function
    alexnet_model, metrics = train_alexnet(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=5, batch_size=32,
    )

    # Print the results
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")
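ResNet (Residual Network)
Introduces the "residual connection", which mitigates the vanishing-gradient and degradation problems of deep networks by letting the gradient flow directly back to earlier layers through the shortcut path.
Use cases: very deep networks (e.g., 152 layers); image classification, detection, and segmentation.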
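The idea behind the residual connection is that a block computes y = F(x) + x, so it only needs to learn the correction F(x) and its gradient with respect to x always contains an identity term. The NumPy sketch below uses dense layers as a stand-in for convolutions; the weights and input are hypothetical toy values.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), with F(x) = relu(x @ W1) @ W2.

    Dense layers stand in for the convolutions of a real ResNet block.
    """
    fx = relu(x @ W1) @ W2  # the residual branch F(x)
    return relu(fx + x)     # the shortcut adds the input back

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(2, 8)))  # non-negative, like post-ReLU activations
W1 = rng.normal(scale=0.1, size=(8, 8))

# If the residual branch outputs zero, the block is exactly the identity,
# so stacking such blocks cannot make the representation worse.
W2_zero = np.zeros((8, 8))
y_identity = residual_block(x, W1, W2_zero)
print(np.allclose(y_identity, x))

# With non-zero weights the block outputs x plus a learned correction.
W2 = rng.normal(scale=0.1, size=(8, 8))
y = residual_block(x, W1, W2)
print(y.shape)
```

Since d(F(x) + x)/dx = dF/dx + I, the gradient reaching earlier layers does not depend solely on the product of many small factors.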
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Add, Activation, BatchNormalization
from tensorflow.keras.utils import to_categorical
import warnings
warnings.filterwarnings('ignore')


def residual_block(x, filters, kernel_size=3, stride=1):
    """Residual block (basic block)."""
    y = Conv2D(filters, kernel_size=kernel_size, strides=stride, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(y)
    y = BatchNormalization()(y)

    # If the stride is not 1 or the channel counts differ,
    # adjust the shortcut with a 1x1 convolution
    if stride != 1 or x.shape[-1] != filters:
        x = Conv2D(filters, kernel_size=1, strides=stride, padding='same')(x)

    y = Add()([y, x])
    y = Activation('relu')(y)
    return y


def build_resnet(input_shape=(8, 8, 1), num_classes=10, depth=2):
    """Build a simplified ResNet."""
    inputs = Input(shape=input_shape)

    # Initial convolution
    x = Conv2D(32, kernel_size=3, strides=1, padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    # Stack residual blocks
    for _ in range(depth):
        x = residual_block(x, filters=32)

    # Classification head
    x = MaxPooling2D(pool_size=2)(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs=inputs, outputs=outputs)
    return model


def train_resnet(X, y, img_rows=8, img_cols=8, channels=1,
                 epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train the ResNet model (adapted to the sklearn digits dataset)."""
    # Reshape the data
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # One-hot encode the labels
    y_onehot = to_categorical(y, num_classes=10)

    # Split into training and test sets (stratified by the integer labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X_reshaped, y_onehot, test_size=test_size, random_state=random_state, stratify=y
    )

    # Build the model
    model = build_resnet(input_shape=(img_rows, img_cols, channels), num_classes=10)

    # Compile
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=1)

    # Predict
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)
    y_test_label = np.argmax(y_test, axis=1)

    # Evaluate
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# ------------------- Usage example -------------------
if __name__ == "__main__":
    # Load the sklearn digits dataset (8×8 grayscale images)
    digits = load_digits()
    X, y = digits.data, digits.target

    # Call the ResNet training function
    resnet_model, metrics = train_resnet(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=10, batch_size=32,
    )

    # Print the results
    print(f"Test accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")
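Inception (GoogLeNet)
Runs convolution kernels of several sizes in parallel (1×1, 3×3, 5×5) and uses 1×1 convolutions to reduce the channel dimension before the larger kernels, which cuts parameters and improves efficiency. GoogLeNet is Inception v1; the code below builds a simplified Inception-v4-style network.
Use cases: image classification, object detection.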
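The parameter saving from 1×1 dimensionality reduction is simple arithmetic. The sketch below compares a direct 5×5 convolution with a 1×1 bottleneck followed by a 5×5 convolution; the channel counts (256 in, 64 out, 32 in the bottleneck) are illustrative numbers chosen for this example, not values from any specific Inception version.

```python
def conv_params(kernel, in_channels, out_channels, bias=True):
    """Number of trainable parameters of a square 2D convolution."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)

in_ch, out_ch, bottleneck = 256, 64, 32  # illustrative channel counts

# Option 1: apply the 5x5 convolution directly to all 256 input channels.
direct = conv_params(5, in_ch, out_ch)

# Option 2: first shrink to 32 channels with a 1x1 convolution, then apply 5x5.
reduced = conv_params(1, in_ch, bottleneck) + conv_params(5, bottleneck, out_ch)

print(direct)                      # 409664
print(reduced)                     # 59488
print(round(direct / reduced, 1))  # 6.9
```

With these numbers the bottleneck version needs roughly one seventh of the parameters while producing an output of the same shape.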
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, AveragePooling2D,Flatten, Dense, Dropout, BatchNormalization,Activation, Concatenate
)
from tensorflow.keras.utils import to_categorical
from PIL import Image  # imported up front to avoid circular-import problems


# --- Core Inception-v4 modules (adapted to small inputs) ---
def conv_bn(x, filters, kernel_size, strides=1, padding='same'):
    """Conv + BN + ReLU (fewer filters than the original, to suit small datasets)."""
    x = Conv2D(filters, kernel_size, strides=strides, padding=padding)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x


def stem_block(x):
    """Simplified stem (fewer stride-2 operations, so the map does not shrink too early)."""
    # 32×32 input: avoid heavy downsampling, only one stride-2 step
    x = conv_bn(x, 32, (3, 3), strides=1, padding='same')  # stride 2 -> 1 to keep the size
    x = conv_bn(x, 32, (3, 3), padding='same')
    x = conv_bn(x, 64, (3, 3))

    # The only stride-2 downsampling (avoids repeated shrinking)
    branch1 = MaxPooling2D((3, 3), strides=2, padding='same')(x)
    branch2 = conv_bn(x, 64, (3, 3), strides=2, padding='same')  # filters 96 -> 64
    x = Concatenate()([branch1, branch2])  # output: 16×16×(64+64) = 16×16×128

    # Later branches keep 'same' padding and do not shrink the map further
    branch1 = conv_bn(x, 64, (1, 1))
    branch1 = conv_bn(branch1, 64, (3, 3), padding='same')  # valid -> same, keeps the size
    branch2 = conv_bn(x, 64, (1, 1))
    branch2 = conv_bn(branch2, 64, (1, 5))  # 7 -> 5, smaller receptive field for small images
    branch2 = conv_bn(branch2, 64, (5, 1))
    branch2 = conv_bn(branch2, 64, (3, 3), padding='same')  # valid -> same
    x = Concatenate()([branch1, branch2])  # output: 16×16×(64+64) = 16×16×128

    # The extra stride-2 operation is removed to avoid shrinking further
    branch1 = conv_bn(x, 128, (3, 3), strides=1, padding='same')  # stride 2 -> 1
    branch2 = MaxPooling2D((3, 3), strides=1, padding='same')(x)
    x = Concatenate()([branch1, branch2])  # final output: 16×16×(128+128) = 16×16×256
    return x


def inception_a(x):
    """Simplified Inception-A module (fewer filters)."""
    branch1 = conv_bn(x, 32, (1, 1))        # 96 -> 32
    branch2 = conv_bn(x, 32, (1, 1))        # 64 -> 32
    branch2 = conv_bn(branch2, 32, (3, 3))  # 96 -> 32
    branch3 = conv_bn(x, 32, (1, 1))        # 64 -> 32
    branch3 = conv_bn(branch3, 32, (3, 3))  # 96 -> 32
    branch3 = conv_bn(branch3, 32, (3, 3))  # 96 -> 32
    branch4 = AveragePooling2D((3, 3), strides=1, padding='same')(x)
    branch4 = conv_bn(branch4, 32, (1, 1))  # 96 -> 32
    return Concatenate()([branch1, branch2, branch3, branch4])  # channels: 32×4 = 128


def inception_b(x):
    """Simplified Inception-B module (fewer filters, smaller receptive field)."""
    branch1 = conv_bn(x, 64, (1, 1))        # 384 -> 64
    branch2 = conv_bn(x, 64, (1, 1))        # 192 -> 64
    branch2 = conv_bn(branch2, 64, (1, 5))  # 7 -> 5, to suit small images
    branch2 = conv_bn(branch2, 64, (5, 1))  # 7 -> 5
    branch3 = conv_bn(x, 64, (1, 1))        # 192 -> 64
    branch3 = conv_bn(branch3, 64, (5, 1))  # 7 -> 5
    branch3 = conv_bn(branch3, 64, (1, 5))  # 7 -> 5
    branch3 = conv_bn(branch3, 64, (5, 1))  # 7 -> 5
    branch3 = conv_bn(branch3, 64, (1, 5))  # 7 -> 5
    branch4 = AveragePooling2D((3, 3), strides=1, padding='same')(x)
    branch4 = conv_bn(branch4, 64, (1, 1))  # 128 -> 64
    return Concatenate()([branch1, branch2, branch3, branch4])  # channels: 64×4 = 256


def inception_c(x):
    """Simplified Inception-C module (fewer filters)."""
    branch1 = conv_bn(x, 64, (1, 1))  # 256 -> 64
    branch2 = conv_bn(x, 64, (1, 1))  # 384 -> 64
    branch2a = conv_bn(branch2, 64, (1, 3))
    branch2b = conv_bn(branch2, 64, (3, 1))
    branch2 = Concatenate()([branch2a, branch2b])
    branch3 = conv_bn(x, 64, (1, 1))  # 384 -> 64
    branch3 = conv_bn(branch3, 64, (3, 1))
    branch3 = conv_bn(branch3, 64, (1, 3))
    branch3a = conv_bn(branch3, 64, (1, 3))
    branch3b = conv_bn(branch3, 64, (3, 1))
    branch3 = Concatenate()([branch3a, branch3b])
    branch4 = AveragePooling2D((3, 3), strides=1, padding='same')(x)
    branch4 = conv_bn(branch4, 64, (1, 1))  # 256 -> 64
    return Concatenate()([branch1, branch2, branch3, branch4])  # channels: 64+128+128+64 = 384


def reduction_a(x):
    """Simplified Reduction-A module (stride 2 changed to 1 so the map does not get too small)."""
    branch1 = conv_bn(x, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1, filters 384 -> 64
    branch2 = conv_bn(x, 64, (1, 1))        # 192 -> 64
    branch2 = conv_bn(branch2, 64, (3, 3))  # 224 -> 64
    branch2 = conv_bn(branch2, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1, filters 256 -> 64
    branch3 = MaxPooling2D((3, 3), strides=1, padding='same')(x)  # stride 2 -> 1
    return Concatenate()([branch1, branch2, branch3])  # spatial size stays 16×16


def reduction_b(x):
    """Simplified Reduction-B module (stride 2 changed to 1, fewer filters)."""
    branch1 = conv_bn(x, 64, (1, 1))  # 192 -> 64
    branch1 = conv_bn(branch1, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1
    branch2 = conv_bn(x, 64, (1, 1))        # 256 -> 64
    branch2 = conv_bn(branch2, 64, (1, 5))  # 7 -> 5
    branch2 = conv_bn(branch2, 64, (5, 1))  # 7 -> 5
    branch2 = conv_bn(branch2, 64, (3, 3), strides=1, padding='same')  # stride 2 -> 1, filters 320 -> 64
    branch3 = MaxPooling2D((3, 3), strides=1, padding='same')(x)  # stride 2 -> 1
    return Concatenate()([branch1, branch2, branch3])  # spatial size stays 16×16


# --- Build an Inception-v4 adapted to small inputs ---
def build_inception_v4(input_shape=(32, 32, 1), num_classes=10):
    """Build a simplified Inception-v4 (fewer modules, so the map does not get too small)."""
    inputs = Input(shape=input_shape)

    # Stem (output 16×16×256)
    x = stem_block(inputs)

    # Fewer Inception modules (2+3+1 instead of the original 4+7+3)
    for _ in range(2):  # Inception-A × 2 (originally 4)
        x = inception_a(x)
    x = reduction_a(x)
    for _ in range(3):  # Inception-B × 3 (originally 7)
        x = inception_b(x)
    x = reduction_b(x)
    for _ in range(1):  # Inception-C × 1 (originally 3)
        x = inception_c(x)

    # Classification head (AveragePooling2D(2, 2) instead of (4, 4), for small feature maps)
    x = AveragePooling2D((2, 2))(x)  # 16×16 -> 8×8
    x = Flatten()(x)
    x = Dropout(0.2)(x)  # keep dropout to reduce overfitting
    outputs = Dense(num_classes, activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)


# --- Training function ---
def train_inception_v4(X, y, img_rows=8, img_cols=8, channels=1,
                       epochs=10, batch_size=32, test_size=0.2, random_state=42):
    """Train the Inception-v4 model (adapted to the sklearn digits dataset)."""
    # 1. Reshape (grayscale -> (samples, height, width, channels))
    X_reshaped = X.reshape(X.shape[0], img_rows, img_cols, channels)

    # 2. Upscale to 32×32 (the model's input size)
    # digits pixels are 0-16. Multiplying by 16 would give 256 for the brightest
    # pixels, which wraps around to 0 in uint8, so scale by 255/16 instead.
    scale = 255.0 / 16.0
    X_resized = np.zeros((X.shape[0], 32, 32, channels))
    for i in range(X.shape[0]):
        img = X_reshaped[i].squeeze()  # drop the channel dimension (8×8)
        img_pil = Image.fromarray(np.rint(img * scale).astype(np.uint8))  # map to 0-255
        img_resized = img_pil.resize((32, 32), Image.NEAREST)  # upscale to 32×32
        img_resized = np.array(img_resized) / scale  # back to approximately 0-16
        if channels == 1:
            img_resized = np.expand_dims(img_resized, axis=-1)  # add the channel dimension (32×32×1)
        X_resized[i] = img_resized

    # 3. One-hot encode the labels (10 classes)
    y_onehot = to_categorical(y, num_classes=10)

    # 4. Split into training and test sets (stratified, to keep the class distribution)
    X_train, X_test, y_train, y_test = train_test_split(
        X_resized, y_onehot, test_size=test_size,
        random_state=random_state, stratify=y
    )

    # 5. Build and compile the model
    model = build_inception_v4(input_shape=(32, 32, channels), num_classes=10)
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # 6. Train the model
    # Note: the test set doubles as validation data here, only to monitor
    # generalization during training; do not use it for model selection.
    model.fit(
        X_train, y_train,
        epochs=epochs, batch_size=batch_size,
        verbose=1, validation_data=(X_test, y_test)
    )

    # 7. Predict and evaluate
    y_pred_prob = model.predict(X_test)
    y_pred = np.argmax(y_pred_prob, axis=1)   # probabilities -> class (0-9)
    y_test_label = np.argmax(y_test, axis=1)  # one-hot -> class (0-9)

    # Compute the evaluation metrics
    accuracy = accuracy_score(y_test_label, y_pred)
    report = classification_report(y_test_label, y_pred, output_dict=True)
    metrics = {'accuracy': accuracy, 'classification_report': report}
    return model, metrics


# --- Usage example ---
if __name__ == "__main__":
    # Load the sklearn digits dataset (8×8 grayscale handwritten digits, 10 classes)
    digits = load_digits()
    X, y = digits.data, digits.target  # X: (1797, 64), y: (1797,)

    # Train the model
    inception_model, metrics = train_inception_v4(
        X, y,
        img_rows=8, img_cols=8, channels=1,
        epochs=10, batch_size=32,
    )

    # Print the results
    print(f"\nTest accuracy: {metrics['accuracy']:.4f}")
    print("\nClassification report:")
    for cls, stats in metrics['classification_report'].items():
        if isinstance(stats, dict):  # only per-class entries (skips summary items such as accuracy)
            print(f"Class {cls}: precision={stats['precision']:.4f}, "
                  f"recall={stats['recall']:.4f}, F1={stats['f1-score']:.4f}, "
                  f"support={stats['support']}")
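Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN): A generative model made of a "generator" and a "discriminator". The generator produces realistic fake data (e.g., images or text), the discriminator tries to tell "real data" from "generated data", and the two are trained against each other until the generator produces data close to the real distribution (e.g., GAN-generated face images).
Generator: One of the two core components of a GAN. It takes random noise (a latent vector) and produces fake data through a neural network (e.g., fake handwritten-digit images from noise); its goal is to make its output indistinguishable from real data for the discriminator.
Discriminator: The other core component of a GAN. It takes either real data or fake data produced by the generator and outputs the probability that the input is real; its goal is to separate real from fake accurately, in opposition to the generator.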
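The two opposing objectives can be written as binary cross-entropy losses. The sketch below computes them in NumPy for made-up discriminator outputs; the probabilities are hypothetical, and the generator loss shown is the commonly used non-saturating form (label the fake samples as real).

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

# Hypothetical discriminator outputs (probability that the input is real).
d_real = np.array([0.9, 0.8, 0.95])  # on real samples
d_fake = np.array([0.1, 0.3, 0.2])   # on generated samples

# Discriminator: real samples should score 1, generated samples 0.
d_loss = 0.5 * (bce(np.ones_like(d_real), d_real)
                + bce(np.zeros_like(d_fake), d_fake))

# Generator: wants the discriminator to score its samples as 1.
g_loss = bce(np.ones_like(d_fake), d_fake)

print(round(d_loss, 4), round(g_loss, 4))
```

Here the discriminator is winning (low `d_loss`, high `g_loss`); as the generator improves, `d_fake` rises and the two losses move toward each other.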
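DCGAN (Deep Convolutional GAN, 2015)
Replaces the fully connected networks of the generator and discriminator with CNNs (the generator upsamples with transposed convolutions, the discriminator downsamples with strided convolutions) and adds batch normalization (Batch Norm).
Use cases: image generation (e.g., faces, anime avatars).
Core improvements: an early and influential combination of CNNs with GANs that set out architecture guidelines suited to GAN training:
The generator uses "transposed convolution + BN (batch normalization)" in place of fully connected layers, which stabilizes and speeds up training;
The discriminator uses "strided convolution + BN + Leaky ReLU" in place of pooling layers, which strengthens feature extraction;
Fully connected layers are removed (kept only at the output layer), which reduces the parameter count and improves sample quality.
Advantages: training is noticeably more stable than with the basic GAN, and the model can generate reasonably sharp low-resolution images (around 64×64, or smaller datasets such as MNIST and CIFAR-10 at their native resolutions); it became the baseline architecture for later image GANs.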
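How transposed convolutions upsample and strided convolutions downsample can be seen from their output-size formulas. The sketch below assumes explicit symmetric padding and no output padding (the convention used by, for example, PyTorch-style layers), with kernel 4, stride 2, padding 1 as an illustrative configuration; the helper names are hypothetical.

```python
def conv_transpose_out(size, kernel, stride, padding=0):
    """Output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel, stride, padding=0):
    """Output size of a regular (strided) convolution."""
    return (size + 2 * padding - kernel) // stride + 1

# Generator: each transposed convolution doubles the spatial size.
gen_sizes = [4]
for _ in range(4):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], kernel=4, stride=2, padding=1))
print(gen_sizes)  # [4, 8, 16, 32, 64]

# Discriminator: each strided convolution halves it again.
disc_sizes = [64]
for _ in range(4):
    disc_sizes.append(conv_out(disc_sizes[-1], kernel=4, stride=2, padding=1))
print(disc_sizes)  # [64, 32, 16, 8, 4]
```

The discriminator thus mirrors the generator, replacing pooling with learned stride-2 downsampling.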
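WGAN-GP (Wasserstein GAN with Gradient Penalty, 2017)
Core improvements: following WGAN, it uses the "Wasserstein distance" (earth mover's distance) instead of the basic GAN's "JS divergence" as the loss measure, and it enforces the required Lipschitz constraint with a gradient penalty instead of weight clipping. This mitigates mode collapse and training instability:
The discriminator no longer outputs a probability (the sigmoid activation is removed); it outputs a "score" and is called the Critic;
A "gradient penalty" term is added: on random interpolations between real and fake samples, the norm of the Critic's gradient with respect to its input is pushed toward 1 (a soft 1-Lipschitz constraint), which keeps training stable.
Advantages: training rarely collapses, the generated data is more diverse, and the Critic loss tends to correlate with sample quality, so it can serve as a rough indicator of training progress.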
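The gradient-penalty term can be illustrated without an autodiff framework by using a deliberately simple linear critic f(x) = x · w, whose gradient with respect to x is w everywhere. This is a toy sketch under that assumption: a real implementation would obtain the gradient at each interpolated point through automatic differentiation. The penalty weight `lam=10.0` and all data are illustrative.

```python
import numpy as np

def gradient_penalty_linear(w, real, fake, rng, lam=10.0):
    """Gradient penalty for the toy linear critic f(x) = x @ w."""
    # Random points on the segments between real and fake samples.
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1 - eps) * fake
    # For a linear critic, df/dx equals w at every x_hat, so we just repeat w.
    grads = np.tile(w, (x_hat.shape[0], 1))
    norms = np.linalg.norm(grads, axis=1)
    # Penalize deviation of the gradient norm from 1 (two-sided penalty).
    return float(lam * np.mean((norms - 1.0) ** 2))

def critic_loss(w, real, fake, rng, lam=10.0):
    """Critic loss: mean score on fakes minus mean score on reals, plus penalty."""
    wasserstein_term = np.mean(fake @ w) - np.mean(real @ w)
    return float(wasserstein_term + gradient_penalty_linear(w, real, fake, rng, lam))

rng = np.random.default_rng(0)
real = rng.normal(loc=1.0, size=(16, 4))   # toy "real" samples
fake = rng.normal(loc=-1.0, size=(16, 4))  # toy "generated" samples

w_unit = np.array([1.0, 0.0, 0.0, 0.0])    # gradient norm 1: no penalty
w_big = 2.0 * w_unit                       # gradient norm 2: penalized

gp_unit = gradient_penalty_linear(w_unit, real, fake, rng)
gp_big = gradient_penalty_linear(w_big, real, fake, rng)
print(gp_unit, gp_big)  # 0.0 10.0
print(critic_loss(w_unit, real, fake, rng))
```

Without the penalty, the critic could lower its loss indefinitely just by scaling `w`; the penalty removes that incentive.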
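StyleGAN / StyleGAN2 (2018/2019)
Introduces a "style control" module that separates an image's "global style" (e.g., skin tone, lighting) from its "local details" (e.g., hairstyle, expression) and supports style interpolation.
Use cases: high-resolution image generation (e.g., photorealistic faces), style transfer.
Core improvements: a "style control" mechanism lets the generator flexibly adjust the "style" of the generated image (e.g., hairstyle, skin tone, expression of a face) while keeping the "content" (e.g., the face outline) unchanged:
The generator is split into a "mapping network" and a "synthesis network": the mapping network maps the noise z to a "style vector", and the synthesis network generates the image from the style vectors;
It supports fine-grained style control: styles can be injected separately at different layers. Coarse (low-resolution) layers mainly control pose and overall shape, while fine (high-resolution) layers mainly control color and texture.
Advantages: the generated high-resolution images (e.g., 1024×1024 faces) are of very high quality and the model is comparatively interpretable (styles can be adjusted by hand); its best-known result is NVIDIA's face-generation model, which produces highly realistic synthetic faces.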
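CycleGAN (2017)
Introduces the "cycle consistency loss", which enables unpaired (unsupervised) image-to-image translation without paired data (e.g., horse to zebra, summer to winter).
Use cases: unsupervised image translation, style transfer.
Core improvements: the cycle consistency loss solves the problem of unsupervised image style transfer (no paired "source domain / target domain" data is required):
The model contains two generators (G: X→Y converts X-domain images to the Y domain, F: Y→X converts Y-domain images to the X domain) and two discriminators (D_Y judges Y-domain images, D_X judges X-domain images);
The cycle consistency loss requires F(G(x)) ≈ x (an X-domain image translated to Y and back should be close to the original) and, symmetrically, G(F(y)) ≈ y, so that the translated image keeps its content while its style changes.
Applications: image style transfer (e.g., photo → oil painting, cat → dog, summer → winter), image restoration, and cross-domain image conversion (e.g., CT image → MRI image).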
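The cycle consistency loss can be sketched with toy "generators". Below, `G` and `F_inverse` are hypothetical linear maps standing in for the two neural generators, and the loss is the mean absolute error in both directions; real CycleGAN generators are convolutional networks trained jointly with the adversarial losses.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """L1 cycle loss: F(G(x)) should return to x, and G(F(y)) to y."""
    forward_cycle = np.mean(np.abs(F(G(x_batch)) - x_batch))   # X -> Y -> X
    backward_cycle = np.mean(np.abs(G(F(y_batch)) - y_batch))  # Y -> X -> Y
    return float(forward_cycle + backward_cycle)

# Toy stand-ins for the generators (hypothetical linear maps).
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
G = lambda x: x @ A                          # "X domain -> Y domain"
F_inverse = lambda y: y @ np.linalg.inv(A)   # exactly undoes G
F_identity = lambda y: y                     # ignores the domain change

rng = np.random.default_rng(0)
x_batch = rng.normal(size=(8, 2))
y_batch = rng.normal(size=(8, 2))

loss_good = cycle_consistency_loss(G, F_inverse, x_batch, y_batch)
loss_bad = cycle_consistency_loss(G, F_identity, x_batch, y_batch)
print(round(loss_good, 6), round(loss_bad, 6))
```

The loss is near zero only when the two mappings invert each other, which is what discourages a generator from discarding the content of its input.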
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU, Dropout
from tensorflow.keras.optimizers import Adam


def build_generator(latent_dim=100, output_shape=64):
    """Build the generator: produce 8×8 handwritten digits (flattened to 64-dim vectors) from noise.

    - latent_dim: dimension of the random noise vector (default 100)
    - output_shape: dimension of the output (8×8 = 64, flattened)
    """
    model = Sequential(name="Generator")
    # Layer 1: noise vector -> 256-dim features
    model.add(Dense(256, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))  # LeakyReLU keeps a small gradient for negative inputs
    model.add(Dropout(0.3))          # regularization
    # Layer 2: 256-dim -> 512-dim features
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.3))
    # Output layer: 512-dim -> 64-dim (an 8×8 image); tanh gives outputs in [-1, 1]
    model.add(Dense(output_shape, activation='tanh'))
    return model


def build_discriminator(input_shape=64):
    """Build the discriminator: a binary classifier separating real digits from generated ones.

    - input_shape: dimension of the input (8×8 = 64, flattened)
    """
    model = Sequential(name="Discriminator")
    # Layer 1: 64-dim input -> 512-dim features
    model.add(Dense(512, input_dim=input_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.3))
    # Layer 2: 512-dim -> 256-dim features
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.3))
    # Output layer: 256-dim -> 1 probability (0 = fake, 1 = real), sigmoid activation
    model.add(Dense(1, activation='sigmoid'))
    # Compile the discriminator (binary cross-entropy loss)
    model.compile(
        optimizer=Adam(learning_rate=0.0002, beta_1=0.5),  # beta_1=0.5 is a common GAN setting
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model


def build_gan(generator, discriminator):
    """Build the combined GAN: freeze the discriminator and train the generator to fool it.

    - generator: the generator built above
    - discriminator: the discriminator built above
    """
    # When training the combined model, freeze the discriminator (only the generator updates)
    discriminator.trainable = False
    # GAN pipeline: noise -> generator produces fake data -> discriminator judges it
    model = Sequential(name="GAN")
    model.add(generator)      # input: random noise -> output: fake data
    model.add(discriminator)  # input: fake data -> output: probability of being real
    # Compile the GAN (goal: the discriminator's prediction on fake data approaches 1)
    model.compile(
        optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
        loss='binary_crossentropy'
    )
    return model


def train_gan(
    latent_dim=100, epochs=10000, batch_size=64,
    sample_interval=1000, save_plots=True
):
    """Full GAN training loop (wrapped in a function).

    - latent_dim: dimension of the random noise
    - epochs: number of training iterations
    - batch_size: batch size
    - sample_interval: generate and save samples every this many iterations
    - save_plots: whether to save the generated digit images
    """
    # ---------------------- 1. Load and preprocess the sklearn digits dataset ----------------------
    digits = load_digits()
    X = digits.data    # data: (1797, 64), each sample is a flattened 8×8 image
    y = digits.target  # labels (unused here; GAN training is unsupervised)

    # Normalize from [0, 16] (grayscale range) to [-1, 1] (matches the generator's tanh output)
    scaler = MinMaxScaler(feature_range=(-1, 1))
    X_scaled = scaler.fit_transform(X)

    # Keep a training subset (the GAN only uses the training set; the test set is not needed)
    X_train, _ = train_test_split(X_scaled, test_size=0.2, random_state=42)

    # Labels for real data: 1; labels for fake data: 0
    real_labels = np.ones((batch_size, 1))   # (batch_size, 1), all ones
    fake_labels = np.zeros((batch_size, 1))  # (batch_size, 1), all zeros

    # ---------------------- 2. Build the generator, discriminator, and GAN ----------------------
    generator = build_generator(latent_dim=latent_dim, output_shape=X_train.shape[1])
    discriminator = build_discriminator(input_shape=X_train.shape[1])
    gan = build_gan(generator, discriminator)

    # Print the network structures
    print("=== Generator ===")
    generator.summary()
    print("\n=== Discriminator ===")
    discriminator.summary()
    print("\n=== GAN ===")
    gan.summary()

    # ---------------------- 3. Alternate training ----------------------
    # Store the losses for later visualization
    d_loss_history = []  # discriminator loss
    g_loss_history = []  # generator loss

    for epoch in range(epochs):
        # ---------------------- 3.1 Train the discriminator (real vs. fake) ----------------------
        # 1. Train on real data
        idx = np.random.randint(0, X_train.shape[0], batch_size)  # pick batch_size random real samples
        real_imgs = X_train[idx]  # (batch_size, 64)
        d_loss_real, d_acc_real = discriminator.train_on_batch(real_imgs, real_labels)

        # 2. Train on fake data from the generator
        noise = np.random.normal(0, 1, (batch_size, latent_dim))  # random noise
        fake_imgs = generator.predict(noise, verbose=0)           # generate fake data
        d_loss_fake, d_acc_fake = discriminator.train_on_batch(fake_imgs, fake_labels)

        # Discriminator loss and accuracy, averaged over the two batches
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
        d_acc = 0.5 * np.add(d_acc_real, d_acc_fake)

        # ---------------------- 3.2 Train the generator (fool the discriminator) ----------------------
        # Draw fresh noise (not the batch used for the discriminator)
        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        # Train the generator: the target is for the discriminator to predict 1 (real_labels) on fakes
        g_loss = gan.train_on_batch(noise, real_labels)

        # ---------------------- 3.3 Record the losses and log ----------------------
        d_loss_history.append(d_loss)
        g_loss_history.append(g_loss)

        # Print training information every 100 iterations
        if (epoch + 1) % 100 == 0:
            print(f"Epoch [{epoch + 1}/{epochs}] | "
                  f"D Loss: {d_loss:.4f} | D Acc: {d_acc:.4f} | "
                  f"G Loss: {g_loss:.4f}")

        # ---------------------- 3.4 Generate and save samples (every sample_interval) ----------------------
        if (epoch + 1) % sample_interval == 0:
            generate_samples(
                generator, latent_dim=latent_dim, epoch=epoch + 1,
                save_plots=save_plots, n_samples=25  # 25 samples (5×5 grid)
            )

    # ---------------------- 4. 
训练完成:返回模型和损失历史 ----------------------return {"generator": generator,"discriminator": discriminator,"gan": gan,"d_loss_history": d_loss_history,"g_loss_history": g_loss_history}def generate_samples(generator, latent_dim, epoch, save_plots=True, n_samples=25):"""用训练好的生成器生成手写数字样本,并可视化(可选保存)- generator:训练后的生成器- latent_dim:随机噪声维度- epoch:当前训练轮数(用于文件名)- save_plots:是否保存图像- n_samples:生成样本数量(建议为平方数,如 25=5×5,36=6×6)"""# 生成随机噪声noise = np.random.normal(0, 1, (n_samples, latent_dim))# 生成假数据(8×8 展平向量)generated_imgs = generator.predict(noise, verbose=0)# 归一化到 [0,1](方便显示图像)generated_imgs = (generated_imgs + 1) / 2 # 从 [-1,1] 映射到 [0,1]# 计算网格大小(如 25 个样本 → 5×5 网格)n_rows = int(np.sqrt(n_samples))n_cols = int(np.sqrt(n_samples))# 创建画布fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 10))axes = axes.flatten() # 展平为一维数组,方便循环# 绘制每个生成的样本for i, ax in enumerate(axes):# 重塑为 8×8 图像(生成器输出是 64 维向量)img = generated_imgs[i].reshape(8, 8)ax.imshow(img, cmap='gray') # 灰度显示ax.axis('off') # 隐藏坐标轴# 标题(含当前训练轮数)fig.suptitle(f"Generated Digits (Epoch {epoch})", fontsize=16)# 保存或显示图像if save_plots:plt.savefig(f"gan_generated_digits_epoch_{epoch}.png", dpi=300, bbox_inches='tight')print(f"Generated samples saved as 'gan_generated_digits_epoch_{epoch}.png'")else:plt.show()plt.close()# ------------------- 调用示例:训练 GAN 并生成手写数字 -------------------
if __name__ == "__main__":# 调用封装的训练函数gan_results = train_gan(latent_dim=100, # 100 维随机噪声epochs=10000, # 训练 10000 轮(足够生成清晰数字)batch_size=64, # 批次大小 64sample_interval=1000, # 每 1000 轮保存一次样本save_plots=True # 保存生成的图像)# 可选:绘制损失曲线(查看训练稳定性)plt.figure(figsize=(12, 6))plt.plot(gan_results["d_loss_history"], label="Discriminator Loss")plt.plot(gan_results["g_loss_history"], label="Generator Loss")plt.xlabel("Epochs")plt.ylabel("Loss")plt.title("GAN Training Loss History")plt.legend()plt.savefig("gan_loss_history.png", dpi=300, bbox_inches='tight')print("Loss history plot saved as 'gan_loss_history.png'")
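The generator step in `train_gan` passes `real_labels` (all ones) together with fake images, which can look backwards at first. The sketch below illustrates why this works, using a hand-written binary cross-entropy in plain Python. The helper `binary_crossentropy` and the sample discriminator outputs are illustrative assumptions, not part of the script, and the numbers are made up for the demonstration.

```python
import math


def binary_crossentropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical discriminator outputs on three generated (fake) samples.
d_out_fooled = [0.90, 0.80, 0.95]  # discriminator believes the fakes are real
d_out_caught = [0.10, 0.20, 0.05]  # discriminator spots the fakes

all_ones = [1, 1, 1]   # labels used in the generator step ("real")
all_zeros = [0, 0, 0]  # labels used in the discriminator step ("fake")

# Generator's view: fake samples scored against the label 1.
g_loss_fooled = binary_crossentropy(all_ones, d_out_fooled)
g_loss_caught = binary_crossentropy(all_ones, d_out_caught)

# Discriminator's view: the same outputs scored against the label 0.
d_loss_fooled = binary_crossentropy(all_zeros, d_out_fooled)
d_loss_caught = binary_crossentropy(all_zeros, d_out_caught)

print(f"generator loss, discriminator fooled:     {g_loss_fooled:.4f}")
print(f"generator loss, discriminator not fooled: {g_loss_caught:.4f}")
print(f"discriminator loss, fooled:               {d_loss_fooled:.4f}")
print(f"discriminator loss, not fooled:           {d_loss_caught:.4f}")
```

With the label fixed to 1, the generator's loss is small exactly when the discriminator is fooled, so minimizing it pushes the generator toward more convincing samples. The same outputs scored against the label 0 give the opposite ordering, which is the adversarial tension the two alternating training steps rely on.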