
「日拱一码」023 Machine Learning: Hyperparameter Optimization

news 2025/7/11 6:34:22

Contents

Grid Search

Random Search

Bayesian Optimization

Genetic Algorithm

Gradient-Based Optimization

Hyperparameter Optimization Framework (Hyperopt)

Automated Hyperparameter Optimization Tool: Optuna

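Hyperparameter optimization is an important step in machine learning: it searches for the configuration under which a model performs best. The common methods are introduced below.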

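Grid Search

Principle: evaluate every combination of hyperparameter values on a predefined grid and keep the best one
Pros: simple to implement, and guaranteed to find the best configuration within the given grid
Cons: computationally expensive, because the number of combinations multiplies with every hyperparameter added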

## Methods based on grid and random sampling
# 1. Grid Search
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the data
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the model
tree = DecisionTreeClassifier(random_state=42)

# Define the hyperparameter grid
param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10]
}

# Set up the grid search (3-fold cross-validation, all CPU cores)
grid_search = GridSearchCV(tree, param_grid, cv=3, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Print the best parameters and scores
print("Best Parameters: ", grid_search.best_params_)  # {'max_depth': 3, 'min_samples_split': 2}
print("Best Cross-validation Score: ", grid_search.best_score_)  # 0.9238095238095237
print("Test Score: ", grid_search.score(X_test, y_test))  # 1.0
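
To make the cost of grid search concrete, the following is a minimal sketch that enumerates the same grid by hand with the standard library. The helper name `expand_grid` is hypothetical and is not part of scikit-learn; it only illustrates why the number of fits multiplies with each hyperparameter.

```python
from itertools import product

# Same grid as in the example above
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
}

def expand_grid(grid):
    """Return every combination of values in the grid as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combinations = expand_grid(param_grid)
cv_folds = 3

print(len(combinations))             # 12 candidate configurations (4 x 3)
print(len(combinations) * cv_folds)  # 36 cross-validation fits
```

Adding a third hyperparameter with five values would raise the count to 60 configurations and 180 cross-validation fits, which is the growth that the "Cons" line refers to.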

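Random Search

Principle: evaluate a fixed number of hyperparameter combinations sampled at random from the search space
Pros: low computational cost with a budget you control, which suits large search spaces
Cons: no guarantee of finding the optimum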

# 2. Random Search
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy.stats import uniform

# Define the parameter distributions
# Note: scipy's uniform(loc, scale) samples from [loc, loc + scale]
param_dist = {
    'C': uniform(0.1, 10),       # C in [0.1, 10.1]
    'gamma': uniform(0.001, 1)   # gamma in [0.001, 1.001]
}

# Set up the random search (reuses X_train / y_train from the grid search example)
random_search = RandomizedSearchCV(SVC(), param_distributions=param_dist, n_iter=100, cv=3, random_state=42)
random_search.fit(X_train, y_train)

# Print the best parameters and scores
print("Best Parameters: ", random_search.best_params_)  # {'C': 8.424426408004217, 'gamma': 0.21333911067827616}
print("Best Cross-validation Score: ", random_search.best_score_)  # 0.9523809523809524
print("Test Score: ", random_search.score(X_test, y_test))  # 1.0
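
The sampling step itself is simple. Below is a rough sketch of how candidates could be drawn, using only the standard library. The helpers `sample_uniform` and `sample_log_uniform` are hypothetical names; the first mimics the `[loc, loc + scale]` convention of `scipy.stats.uniform`, and the second shows log-uniform sampling, a common alternative for scale-like parameters such as `gamma` that the example above does not use.

```python
import math
import random

def sample_uniform(rng, loc, scale):
    """Draw from [loc, loc + scale), the same convention as scipy.stats.uniform."""
    return loc + scale * rng.random()

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)

# Five random candidates from the same ranges as the example above
candidates = [
    {"C": sample_uniform(rng, 0.1, 10), "gamma": sample_uniform(rng, 0.001, 1)}
    for _ in range(5)
]

# Alternative (assumption, not in the original): spread gamma evenly across orders of magnitude
log_gammas = [sample_log_uniform(rng, 1e-3, 1.0) for _ in range(5)]

for candidate in candidates:
    print(candidate)
```

Each candidate would then be scored with cross-validation, and the best-scoring one kept, which is what `RandomizedSearchCV` automates.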

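Bayesian Optimization

Principle: fit a surrogate model (for example a Gaussian process) that predicts performance as a function of the hyperparameters, then evaluate next the combination most likely to improve on the best result so far
Pros: sample-efficient, so fewer evaluations are needed
Cons: more complex to implement, since the surrogate model has to be built and tuned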

## Methods based on probabilistic models
# Bayesian Optimization
from skopt import BayesSearchCV
from sklearn.svm import SVC

# Define the parameter space for Bayesian optimization as (low, high) ranges
param_space = {
    'C': (0.1, 10.0),
    'gamma': (0.001, 1.0)
}

# Set up the Bayesian search
opt = BayesSearchCV(SVC(), param_space, n_iter=50, cv=3, random_state=42)
opt.fit(X_train, y_train)

# Print the best parameters and scores
print("Best Parameters: ", opt.best_params_)  # OrderedDict([('C', 9.273859960865279), ('gamma', 0.20326667084147684)])
print("Best Cross-validation Score: ", opt.best_score_)  # 0.9523809523809524
print("Test Score: ", opt.score(X_test, y_test))  # 1.0
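
How "most likely to improve" is scored depends on the acquisition function. As an illustration, here is a minimal sketch of expected improvement (EI), one common choice, for a maximization problem. It is not a claim about what `BayesSearchCV` does internally; the surrogate predictions in `candidates` are hypothetical numbers, and `xi` is an assumed exploration margin.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    """EI for maximization, given the surrogate's predicted mean and std at one point."""
    improvement = mu - best_so_far - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical surrogate predictions: name -> (predicted mean score, predicted std)
candidates = {
    "A": (0.95, 0.001),  # close to the best score, but the model is almost certain
    "B": (0.94, 0.03),   # lower mean, but high uncertainty
    "C": (0.90, 0.005),  # clearly worse
}
best_so_far = 0.9524

scores = {name: expected_improvement(mu, sd, best_so_far) for name, (mu, sd) in candidates.items()}
next_candidate = max(scores, key=scores.get)
print(next_candidate)  # B
```

Candidate B wins despite its lower predicted mean because its uncertainty leaves real room for improvement, which is how the method balances exploration against exploitation.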

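Genetic Algorithm

Principle: mimic natural selection, improving a population of hyperparameter configurations through selection, crossover, and mutation
Pros: suits complex search spaces (mixed integer, categorical, and continuous), and is less prone to getting stuck in local optima
Cons: more complex to implement, and computationally expensive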

# Methods based on heuristic search
# Genetic Algorithm
from sklearn.ensemble import RandomForestClassifier
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Integer, Categorical

# Define the model
model = RandomForestClassifier(random_state=42)

# Define the parameter space
param_grid = {
    'n_estimators': Integer(10, 500),
    'max_depth': Integer(3, 20),
    'criterion': Categorical(['gini', 'entropy'])
}

# Set up the genetic search
# Note: 50 individuals over 100 generations with 3-fold CV can require thousands of model fits
evolved_search = GASearchCV(
    estimator=model,
    cv=3,
    scoring='accuracy',
    param_grid=param_grid,
    population_size=50,
    generations=100,
    tournament_size=5,
    elitism=True
)

# Train the model
evolved_search.fit(X_train, y_train)

# Print the best parameters and scores
print("Best Parameters: ", evolved_search.best_params_)  # {'n_estimators': 13, 'max_depth': 7, 'criterion': 'gini'}
print("Best Cross-validation Score: ", evolved_search.best_score_)  # 0.9619047619047619
print("Test Score: ", evolved_search.score(X_test, y_test))   # 1.0
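
The selection, crossover, and mutation steps can be sketched in a few dozen lines of plain Python. This is an illustrative toy, not the algorithm used inside `sklearn_genetic`: the `fitness` function is a hypothetical stand-in for a cross-validation score, and the mutation rate and step sizes are arbitrary assumptions.

```python
import random

def fitness(individual):
    """Hypothetical stand-in for a CV score, peaking at n_estimators=200, max_depth=8."""
    n_estimators, max_depth = individual
    return -((n_estimators - 200) / 100) ** 2 - ((max_depth - 8) / 5) ** 2

def random_individual(rng):
    return (rng.randint(10, 500), rng.randint(3, 20))

def tournament(rng, population, size):
    """Selection: pick `size` individuals at random and keep the fittest."""
    return max(rng.sample(population, size), key=fitness)

def crossover(rng, parent_a, parent_b):
    """Uniform crossover: each gene comes from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))

def mutate(rng, individual, rate=0.2):
    """Mutation: occasionally nudge a gene, clipped to its allowed range."""
    n_estimators, max_depth = individual
    if rng.random() < rate:
        n_estimators = min(500, max(10, n_estimators + rng.randint(-50, 50)))
    if rng.random() < rate:
        max_depth = min(20, max(3, max_depth + rng.randint(-2, 2)))
    return (n_estimators, max_depth)

def evolve(generations=30, population_size=20, tournament_size=3, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: the best survives unchanged
        while len(children) < population_size:
            parent_a = tournament(rng, population, tournament_size)
            parent_b = tournament(rng, population, tournament_size)
            children.append(mutate(rng, crossover(rng, parent_a, parent_b)))
        population = children
        history.append(max(fitness(ind) for ind in population))
    return max(population, key=fitness), history

best, history = evolve()
print(best, history[-1])
```

Because of elitism, the best fitness in `history` can never decrease from one generation to the next; in a real search each `fitness` call would be a full cross-validation run, which is where the cost comes from.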

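Gradient-Based Optimization

Principle: compute the gradient of the objective with respect to the hyperparameters and use it to update their values; typically used for neural network hyperparameters
Pros: fast to converge, and suited to continuous hyperparameter spaces
Cons: depends on gradient information, can be sensitive to the initial values, and can easily get stuck in local optima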

## Gradient-Based Optimization
# Note: in this example gradients are only used to train the network weights (Adam);
# the hyperparameters themselves are sampled by Ray Tune and scheduled by ASHA
import torch
import torch.nn as nn
import torch.optim as optim
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self, hidden_dim):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(4, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

def train(config):
    model = SimpleNN(config["hidden_dim"])
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=config["lr"])
    for epoch in range(100):
        optimizer.zero_grad()
        outputs = model(X_train_tensor)
        loss = criterion(outputs, y_train_tensor)
        loss.backward()
        optimizer.step()
        # Report the held-out loss every 10 epochs
        if epoch % 10 == 0:
            with torch.no_grad():
                val_outputs = model(X_test_tensor)
                val_loss = criterion(val_outputs, y_test_tensor)
            # Keyword-style reporting; assumes a Ray version whose tune.report accepts it
            tune.report(loss=val_loss.item())

# Convert the data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# Define the hyperparameter space
config = {
    "hidden_dim": tune.choice([10, 20, 30]),
    "lr": tune.loguniform(1e-4, 1e-1)
}

# Set up the scheduler
# ASHA's default time attribute is training_iteration, which advances once per reported result
scheduler = ASHAScheduler(
    metric="loss",
    mode="min",
    max_t=100,
    grace_period=10,
    reduction_factor=2
)

# Launch the hyperparameter search
analysis = tune.run(
    train,
    config=config,
    scheduler=scheduler,
    num_samples=10
)

# Print the best parameters and loss
best_trial = analysis.get_best_trial("loss", "min", "last")
print("Best Parameters: ", best_trial.config)
print("Best Loss: ", best_trial.last_result["loss"])
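
For a case where the hyperparameter itself is updated by a gradient, the sketch below tunes the regularization strength of a one-dimensional ridge regression. Everything here is an assumption made for illustration: the toy data, the learning rate, and the parametrization `lam = exp(theta)` that keeps the strength positive. Because the trained weight has a closed form, the derivative of the validation loss with respect to the hyperparameter follows from the chain rule.

```python
import math

# Toy 1-D data (hypothetical values chosen for illustration)
xs_train, ys_train = [1.0, 2.0, 3.0], [1.5, 2.5, 4.0]
xs_val, ys_val = [1.0, 2.0, 3.0, 4.0], [1.0, 2.1, 2.9, 4.2]

S_XY = sum(x * y for x, y in zip(xs_train, ys_train))
S_XX = sum(x * x for x in xs_train)

def fit_weight(lam):
    """Closed-form 1-D ridge solution: argmin_w sum((w*x - y)^2) + lam * w^2."""
    return S_XY / (S_XX + lam)

def val_loss(w):
    """Mean squared error on the validation set."""
    return sum((w * x - y) ** 2 for x, y in zip(xs_val, ys_val)) / len(xs_val)

def hypergradient(theta):
    """d(val_loss)/d(theta) with lam = exp(theta), via the chain rule."""
    lam = math.exp(theta)
    w = fit_weight(lam)
    dloss_dw = sum(2.0 * (w * x - y) * x for x, y in zip(xs_val, ys_val)) / len(xs_val)
    dw_dlam = -S_XY / (S_XX + lam) ** 2
    return dloss_dw * dw_dlam * lam  # d(lam)/d(theta) = lam

theta = math.log(0.1)  # initial regularization strength lam = 0.1
initial_loss = val_loss(fit_weight(math.exp(theta)))

for _ in range(300):
    theta -= 0.5 * hypergradient(theta)  # plain gradient descent on the hyperparameter

best_lam = math.exp(theta)
final_loss = val_loss(fit_weight(best_lam))
print(best_lam, initial_loss, final_loss)
```

In realistic models the trained weights have no closed form, so the derivative of the weights with respect to the hyperparameter has to be approximated, for example by differentiating through the training steps. That extra machinery is the main cost of this family of methods.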

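Hyperparameter Optimization Framework (Hyperopt)

Principle: define a hyperparameter space and an objective function, then search for the best hyperparameters with Bayesian-style optimization (such as TPE) or another search algorithm
Pros: supports several optimization algorithms and is highly flexible
Cons: you have to write the objective function yourself, and the search may need considerable compute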

## Hyperparameter optimization framework (Hyperopt)
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK, space_eval
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Define the objective function (Hyperopt minimizes, so return the negative accuracy)
def objective(params):
    model = RandomForestClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=3, scoring='accuracy').mean()
    return {'loss': -score, 'status': STATUS_OK}

# Define the hyperparameter space
param_space = {
    'n_estimators': hp.choice('n_estimators', list(range(10, 500))),
    'max_depth': hp.choice('max_depth', list(range(3, 20))),
    'criterion': hp.choice('criterion', ['gini', 'entropy'])
}

# Run the optimization
trials = Trials()
best = fmin(fn=objective, space=param_space, algo=tpe.suggest, max_evals=100, trials=trials)

# For hp.choice, fmin returns the index of the selected option, not its value;
# space_eval maps those indices back to the actual hyperparameter values
best_params = space_eval(param_space, best)
print("Best Parameters: ", best_params)
# The author's run returned the indices n_estimators=15, max_depth=0, criterion=1,
# which decode to {'n_estimators': 25, 'max_depth': 3, 'criterion': 'entropy'}
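
A detail that is easy to miss: for dimensions declared with `hp.choice`, `fmin` reports the position of the chosen option in the list, not the option itself. The sketch below decodes such a result by hand with plain Python lists. The helper `decode_choice_indices` is a hypothetical name and the indices in `raw_best` are example values; in real code `hyperopt.space_eval` performs this mapping.

```python
# Options in the same order in which they would be passed to hp.choice
choices = {
    "n_estimators": list(range(10, 500)),
    "max_depth": list(range(3, 20)),
    "criterion": ["gini", "entropy"],
}

def decode_choice_indices(raw_best, choices):
    """Map the index reported for each choice-type dimension back to its value."""
    return {name: choices[name][index] for name, index in raw_best.items()}

# Example indices as they could appear in the dict returned by fmin
raw_best = {"n_estimators": 15, "max_depth": 0, "criterion": 1}

print(decode_choice_indices(raw_best, choices))
# {'n_estimators': 25, 'max_depth': 3, 'criterion': 'entropy'}
```

Reading the raw indices as values would suggest `max_depth=0`, which is outside the declared range of 3 to 19, so decoding matters before the parameters are passed to a model.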

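Automated Hyperparameter Optimization Tool: Optuna

Principle: a lightweight hyperparameter optimization framework that supports several search algorithms, including random search and Bayesian optimization
Pros: easy to use, and supports parallel optimization
Cons: requires installing an additional library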

## Automated hyperparameter optimization tool: Optuna
import optuna
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Define the objective function; each trial suggests one set of hyperparameters
def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 10, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'criterion': trial.suggest_categorical('criterion', ['gini', 'entropy'])
    }
    model = RandomForestClassifier(**params)
    score = cross_val_score(model, X_train, y_train, cv=3, scoring='accuracy').mean()
    return score

# Run the optimization (maximize the cross-validation accuracy)
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Print the best parameters and score
best_params = study.best_params
best_score = study.best_value
print("Best Parameters: ", best_params)  # {'n_estimators': 158, 'max_depth': 13, 'criterion': 'gini'}
print("Best Cross-validation Score: ", best_score)  # 0.9523809523809524
