当前位置：首页 > news >正文

使用MLflow跟踪和管理你的机器学习实验

news 2025/10/19 11:23:07

使用MLflow跟踪和管理你的机器学习实验
- 1. 引言
- 2. MLflow简介
- - 2.1 MLflow组件概述
  - 2.2 MLflow的核心价值
- 3. MLflow的核心组件
- - 3.1 MLflow Tracking
  - 3.2 MLflow Projects
  - 3.3 MLflow Models
  - 3.4 MLflow Model Registry
- 4. 安装和设置
- - 4.1 安装MLflow
  - 4.2 启动MLflow UI
- 5. 使用MLflow进行实验跟踪
- - 5.1 基础设置和导入
  - 5.2 数据准备和探索
  - 5.3 简单的MLflow实验
  - 5.4 超参数调优实验
  - 5.5 特征重要性分析
- 6. 高级MLflow功能
- - 6.1 嵌套运行
  - 6.2 自动日志记录
- 7. 模型部署和服务
- - 7.1 保存和加载模型
  - 7.2 模型服务
- 8. 实验结果分析和比较
- - 8.1 实验比较工具
- 9. 完整代码实现
- 10. 代码自查和优化
- - 10.1 代码质量检查
  - 10.2 性能优化
  - 10.3 可维护性改进
- 11. 总结
- - 11.1 主要收获
  - 11.2 最佳实践
  - 11.3 未来展望

『宝藏代码胶囊开张啦！』—— 我的 CodeCapsule 来咯！✨
写代码不再头疼！我的新站点 CodeCapsule 主打一个 “白菜价”+“量身定制”！无论是卡脖子的毕设/课设/文献复现，需要灵光一现的算法改进，还是想给项目加个“外挂”，这里都有便宜又好用的代码方案等你发现！低成本，高适配，助你轻松通关！速来围观 👉 CodeCapsule官网

使用MLflow跟踪和管理你的机器学习实验

1. 引言

在机器学习的实践中，我们经常面临一个严峻的挑战：实验管理的复杂性。随着项目的发展，我们会尝试不同的算法、超参数、特征工程方法，产生大量的实验运行记录。如果没有一个系统化的方法来跟踪这些实验，很容易陷入混乱：

忘记了哪个超参数组合产生了最佳结果
无法复现之前的实验结果
难以比较不同方法的性能差异
团队成员之间缺乏统一的实验记录方式

MLflow 应运而生，它是一个开源的机器学习生命周期管理平台，专门为解决这些问题而设计。MLflow提供了一套完整的工具，帮助数据科学家和工程师跟踪实验、打包代码、共享模型以及管理模型部署。

本文将深入探讨如何使用MLflow来跟踪和管理机器学习实验，通过实际的Python代码示例，展示MLflow的核心功能和使用方法。

2. MLflow简介

MLflow由Databricks公司开发，目前已成为机器学习工作流管理的行业标准工具之一。它包含四个主要组件：

2.1 MLflow组件概述

2.2 MLflow的核心价值

MLflow解决了机器学习项目中的几个关键问题：

实验追踪：自动记录参数、指标、代码版本和输出文件
可复现性：确保任何实验都可以被准确复现
模型管理：提供统一的模型存储、版本控制和部署方案
协作支持：便于团队成员之间的知识共享和协作

3. MLflow的核心组件

3.1 MLflow Tracking

MLflow Tracking是一个用于记录机器学习实验的API和UI。它可以自动记录：

参数：算法的配置参数
指标：评估指标如准确率、损失等
** artifacts**：输出文件如模型、图表等
代码版本：Git commit hash
环境信息：Python版本、依赖库等

3.2 MLflow Projects

MLflow Projects提供了一种标准化的格式来打包可重用的数据科学代码。每个项目包含：

入口点：可执行的函数或脚本
环境配置：conda.yaml或Dockerfile
依赖管理：明确指定代码依赖

3.3 MLflow Models

MLflow Models提供了一种标准格式来打包机器学习模型，支持多种部署方式：

REST API：将模型部署为Web服务
批处理：用于批量预测
实时推理：集成到实时应用中

3.4 MLflow Model Registry

MLflow Model Registry是一个集中式的模型存储、版本管理和阶段转换系统：

版本控制：跟踪模型的多个版本
阶段管理：Staging、Production、Archived等阶段
注释和描述：为模型添加元数据

4. 安装和设置

4.1 安装MLflow

pip install mlflow

对于完整功能，建议安装额外的依赖：

pip install mlflow[extras]

4.2 启动MLflow UI

MLflow提供了一个Web界面来查看和比较实验：

mlflow ui --host 0.0.0.0 --port 5000

访问 http://localhost:5000 即可查看MLflow的Web界面。

5. 使用MLflow进行实验跟踪

让我们通过一个完整的机器学习项目来演示MLflow的使用。我们将使用经典的鸢尾花数据集，构建一个分类模型。

5.1 基础设置和导入

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')# 设置随机种子以确保可复现性
np.random.seed(42)

5.2 数据准备和探索

def load_and_prepare_data():"""加载和准备鸢尾花数据集"""# 加载数据iris = load_iris()X = iris.datay = iris.targetfeature_names = iris.feature_namestarget_names = iris.target_names# 创建DataFrame以便于分析df = pd.DataFrame(X, columns=feature_names)df['target'] = ydf['target_name'] = [target_names[i] for i in y]# 数据基本信息print("数据集形状:", df.shape)print("\n特征名称:", feature_names)print("\n目标类别:", target_names)print("\n数据统计描述:")print(df.describe())return X, y, feature_names, target_names, df# 加载数据
X, y, feature_names, target_names, df = load_and_prepare_data()# 划分训练集和测试集
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y
)print(f"训练集大小: {X_train.shape[0]}")
print(f"测试集大小: {X_test.shape[0]}")

5.3 简单的MLflow实验

让我们从一个简单的实验开始，记录基本的模型训练过程：

def simple_experiment():"""简单的MLflow实验示例"""# 设置实验名称mlflow.set_experiment("Iris_Classification_Basic")# 开始MLflow运行with mlflow.start_run(run_name="Basic_RandomForest"):# 记录参数n_estimators = 100max_depth = 5random_state = 42mlflow.log_param("n_estimators", n_estimators)mlflow.log_param("max_depth", max_depth)mlflow.log_param("random_state", random_state)# 创建并训练模型model = RandomForestClassifier(n_estimators=n_estimators,max_depth=max_depth,random_state=random_state)model.fit(X_train, y_train)# 预测和评估y_pred = model.predict(X_test)accuracy = accuracy_score(y_test, y_pred)# 记录指标mlflow.log_metric("accuracy", accuracy)# 记录模型mlflow.sklearn.log_model(model, "random_forest_model")# 生成并记录混淆矩阵cm = confusion_matrix(y_test, y_pred)plt.figure(figsize=(8, 6))sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',xticklabels=target_names, yticklabels=target_names)plt.title('Confusion Matrix')plt.ylabel('True Label')plt.xlabel('Predicted Label')# 保存图片并记录到MLflowplt.savefig("confusion_matrix.png")mlflow.log_artifact("confusion_matrix.png")plt.close()print(f"模型准确率: {accuracy:.4f}")return accuracy, model# 运行简单实验
accuracy, model = simple_experiment()

5.4 超参数调优实验

现在让我们进行更复杂的实验，使用网格搜索进行超参数调优：

def hyperparameter_tuning_experiment():"""超参数调优实验"""# 设置实验mlflow.set_experiment("Iris_Hyperparameter_Tuning")# 定义参数网格param_grid = {'n_estimators': [50, 100, 200],'max_depth': [3, 5, 7, None],'min_samples_split': [2, 5, 10],'min_samples_leaf': [1, 2, 4]}# 开始MLflow运行with mlflow.start_run(run_name="GridSearch_RandomForest"):# 记录参数网格mlflow.log_params({'param_grid_n_estimators': str(param_grid['n_estimators']),'param_grid_max_depth': str(param_grid['max_depth']),'param_grid_min_samples_split': str(param_grid['min_samples_split']),'param_grid_min_samples_leaf': str(param_grid['min_samples_leaf'])})# 网格搜索grid_search = GridSearchCV(RandomForestClassifier(random_state=42),param_grid,cv=5,scoring='accuracy',n_jobs=-1,verbose=1)grid_search.fit(X_train, y_train)# 记录最佳参数best_params = grid_search.best_params_for param, value in best_params.items():mlflow.log_param(f"best_{param}", value)# 使用最佳模型进行预测best_model = grid_search.best_estimator_y_pred = best_model.predict(X_test)accuracy = accuracy_score(y_test, y_pred)# 记录指标mlflow.log_metric("best_accuracy", accuracy)mlflow.log_metric("best_cv_score", grid_search.best_score_)# 记录模型mlflow.sklearn.log_model(best_model, "best_random_forest_model")# 生成详细分类报告report = classification_report(y_test, y_pred, target_names=target_names, output_dict=True)# 记录每个类别的指标for class_name in target_names:mlflow.log_metric(f"precision_{class_name}", report[class_name]['precision'])mlflow.log_metric(f"recall_{class_name}", report[class_name]['recall'])mlflow.log_metric(f"f1_{class_name}", report[class_name]['f1-score'])# 记录加权平均指标mlflow.log_metric("weighted_avg_precision", report['weighted avg']['precision'])mlflow.log_metric("weighted_avg_recall", report['weighted avg']['recall'])mlflow.log_metric("weighted_avg_f1", report['weighted avg']['f1-score'])print(f"最佳参数: {best_params}")print(f"最佳交叉验证分数: {grid_search.best_score_:.4f}")print(f"测试集准确率: {accuracy:.4f}")return best_model, grid_search.best_score_, accuracy# 运行超参数调优实验
best_model, cv_score, test_accuracy = hyperparameter_tuning_experiment()

5.5 特征重要性分析

def feature_importance_analysis(model, feature_names):"""分析特征重要性并记录到MLflow"""# 开始新的运行记录特征重要性with mlflow.start_run(run_name="Feature_Importance_Analysis", nested=True):# 获取特征重要性importances = model.feature_importances_indices = np.argsort(importances)[::-1]# 记录每个特征的重要性for i, idx in enumerate(indices):mlflow.log_metric(f"feature_importance_rank_{i+1}", importances[idx])mlflow.log_param(f"feature_rank_{i+1}", feature_names[idx])# 创建特征重要性图表plt.figure(figsize=(10, 6))plt.title("Feature Importances")plt.bar(range(len(importances)), importances[indices])plt.xticks(range(len(importances)), [feature_names[i] for i in indices], rotation=45)plt.tight_layout()# 保存并记录图表plt.savefig("feature_importance.png")mlflow.log_artifact("feature_importance.png")plt.close()print("特征重要性排序:")for i, idx in enumerate(indices):print(f"{i+1}. {feature_names[idx]}: {importances[idx]:.4f}")# 分析特征重要性
feature_importance_analysis(best_model, feature_names)

6. 高级MLflow功能

6.1 嵌套运行

MLflow支持嵌套运行，这对于组织复杂的实验特别有用：

def nested_experiments():"""演示嵌套运行的使用"""mlflow.set_experiment("Iris_Nested_Experiments")with mlflow.start_run(run_name="Parent_Run"):mlflow.log_param("parent_param", "parent_value")# 子运行1：不同的预处理方法with mlflow.start_run(run_name="StandardScaler_Preprocessing", nested=True):scaler = StandardScaler()X_train_scaled = scaler.fit_transform(X_train)X_test_scaled = scaler.transform(X_test)model = RandomForestClassifier(n_estimators=100, random_state=42)model.fit(X_train_scaled, y_train)accuracy_scaled = accuracy_score(y_test, model.predict(X_test_scaled))mlflow.log_metric("accuracy_with_scaling", accuracy_scaled)mlflow.sklearn.log_model(scaler, "scaler")mlflow.sklearn.log_model(model, "model_with_scaling")# 子运行2：原始数据with mlflow.start_run(run_name="No_Preprocessing", nested=True):model = RandomForestClassifier(n_estimators=100, random_state=42)model.fit(X_train, y_train)accuracy_raw = accuracy_score(y_test, model.predict(X_test))mlflow.log_metric("accuracy_no_scaling", accuracy_raw)mlflow.sklearn.log_model(model, "model_no_scaling")# 比较结果improvement = accuracy_scaled - accuracy_rawmlflow.log_metric("scaling_improvement", improvement)print(f"使用标准化的准确率: {accuracy_scaled:.4f}")print(f"不使用标准化的准确率: {accuracy_raw:.4f}")print(f"改进: {improvement:.4f}")# 运行嵌套实验
nested_experiments()

6.2 自动日志记录

MLflow提供了自动日志记录功能，可以自动跟踪模型参数和指标：

def autolog_experiment():"""使用MLflow的自动日志记录功能"""# 启用sklearn的自动日志记录mlflow.sklearn.autolog()mlflow.set_experiment("Iris_Autolog_Experiment")with mlflow.start_run(run_name="Autolog_RandomForest"):# 自动记录参数、指标和模型model = RandomForestClassifier(n_estimators=150,max_depth=8,min_samples_split=5,random_state=42)model.fit(X_train, y_train)# 手动记录额外的指标y_pred = model.predict(X_test)accuracy = accuracy_score(y_test, y_pred)mlflow.log_metric("manual_accuracy", accuracy)print("自动日志记录完成！")return model# 禁用自动日志记录以避免影响其他实验
mlflow.sklearn.autolog(disable=True)# 运行自动日志记录实验
autolog_model = autolog_experiment()

7. 模型部署和服务

7.1 保存和加载模型

def save_and_load_model_demo():"""演示如何保存和加载MLflow模型"""mlflow.set_experiment("Iris_Model_Management")with mlflow.start_run(run_name="Model_Save_Load_Demo"):# 训练模型model = RandomForestClassifier(n_estimators=100, random_state=42)model.fit(X_train, y_train)# 记录模型mlflow.sklearn.log_model(model, "model")# 获取模型URImodel_uri = mlflow.get_artifact_uri("model")print(f"模型URI: {model_uri}")# 加载模型进行预测loaded_model = mlflow.sklearn.load_model(model_uri)# 验证加载的模型original_predictions = model.predict(X_test[:5])loaded_predictions = loaded_model.predict(X_test[:5])print("原始模型预测:", original_predictions)print("加载模型预测:", loaded_predictions)print("预测结果一致:", np.array_equal(original_predictions, loaded_predictions))return model_uri# 演示模型保存和加载
model_uri = save_and_load_model_demo()

7.2 模型服务

MLflow模型可以轻松部署为REST API服务：

def create_model_signature():"""创建模型签名，定义输入输出格式"""from mlflow.models.signature import infer_signaturefrom mlflow.types.schema import Schema, ColSpecfrom mlflow.types.schema import DataType# 推断签名signature = infer_signature(X_train, model.predict(X_train))# 或者手动创建签名input_schema = Schema([ColSpec(DataType.double, "sepal length (cm)"),ColSpec(DataType.double, "sepal width (cm)"),ColSpec(DataType.double, "petal length (cm)"),ColSpec(DataType.double, "petal width (cm)")])output_schema = Schema([ColSpec(DataType.integer, "class")])signature = mlflow.models.signature.ModelSignature(inputs=input_schema, outputs=output_schema)return signaturedef register_model_demo():"""演示模型注册"""mlflow.set_experiment("Iris_Model_Registry")with mlflow.start_run(run_name="Model_Registration_Demo"):# 训练模型model = RandomForestClassifier(n_estimators=100, random_state=42)model.fit(X_train, y_train)# 创建签名signature = create_model_signature()# 记录模型（带签名）mlflow.sklearn.log_model(model, "model",signature=signature,input_example=X_train[:5]  # 提供输入示例)# 注册模型到模型注册表model_uri = mlflow.get_artifact_uri("model")# 在实际环境中，您可以使用以下命令注册模型：# mlflow.register_model(model_uri, "Iris_RandomForest_Model")print("模型已准备好注册!")print(f"模型URI: {model_uri}")return model_uri# 演示模型注册
registered_model_uri = register_model_demo()

8. 实验结果分析和比较

8.1 实验比较工具

MLflow提供了强大的工具来比较不同实验的运行结果：

def analyze_experiment_results():"""分析实验结果的工具函数"""# 获取当前实验的所有运行experiment = mlflow.get_experiment_by_name("Iris_Hyperparameter_Tuning")runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])print("实验运行统计:")print(f"总运行次数: {len(runs)}")# 找到最佳运行best_run = runs.loc[runs['metrics.best_accuracy'].idxmax()]print(f"\n最佳运行ID: {best_run['run_id']}")print(f"最佳准确率: {best_run['metrics.best_accuracy']:.4f}")print(f"最佳参数:")# 提取最佳参数param_columns = [col for col in runs.columns if col.startswith('params.')]for param_col in param_columns:if 'best_' in param_col and pd.notna(best_run[param_col]):param_name = param_col.replace('params.best_', '')print(f"  {param_name}: {best_run[param_col]}")# 创建结果比较图表plt.figure(figsize=(12, 6))# 准确率分布plt.subplot(1, 2, 1)plt.hist(runs['metrics.best_accuracy'].dropna(), bins=10, alpha=0.7, color='skyblue')plt.axvline(best_run['metrics.best_accuracy'], color='red', linestyle='--', label='最佳准确率')plt.xlabel('准确率')plt.ylabel('频次')plt.title('准确率分布')plt.legend()# 参数与准确率的关系plt.subplot(1, 2, 2)if 'params.best_max_depth' in runs.columns:# 处理可能的NaN值valid_runs = runs.dropna(subset=['params.best_max_depth', 'metrics.best_accuracy'])max_depths = valid_runs['params.best_max_depth'].astype(str)accuracies = valid_runs['metrics.best_accuracy']# 分组统计depth_groups = {}for depth, acc in zip(max_depths, accuracies):if depth not in depth_groups:depth_groups[depth] = []depth_groups[depth].append(acc)# 绘制箱线图plt.boxplot([depth_groups[depth] for depth in sorted(depth_groups.keys())], labels=sorted(depth_groups.keys()))plt.xlabel('最大深度')plt.ylabel('准确率')plt.title('不同最大深度的准确率分布')plt.tight_layout()plt.savefig('experiment_analysis.png')plt.show()return best_run# 分析实验结果
best_run_info = analyze_experiment_results()

9. 完整代码实现

下面是本文中使用的完整代码，包含了所有必要的导入和函数定义：

# -*- coding: utf-8 -*-
"""
使用MLflow跟踪和管理机器学习实验的完整示例
作者: AI助手
日期: 2024年
"""import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os# 设置中文字体和忽略警告
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
warnings.filterwarnings('ignore')# 设置随机种子
np.random.seed(42)class MLflowIrisExperiment:"""使用MLflow进行鸢尾花分类实验的完整类"""def __init__(self):"""初始化实验"""self.X = Noneself.y = Noneself.feature_names = Noneself.target_names = Noneself.df = Noneself.X_train = Noneself.X_test = Noneself.y_train = Noneself.y_test = Nonedef load_data(self):"""加载和准备数据"""iris = load_iris()self.X = iris.dataself.y = iris.targetself.feature_names = iris.feature_namesself.target_names = iris.target_names# 创建DataFrameself.df = pd.DataFrame(self.X, columns=self.feature_names)self.df['target'] = self.yself.df['target_name'] = [self.target_names[i] for i in self.y]# 划分训练测试集self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.3, random_state=42, stratify=self.y)print("数据加载完成!")print(f"训练集大小: {self.X_train.shape[0]}")print(f"测试集大小: {self.X_test.shape[0]}")def basic_experiment(self):"""基础实验"""mlflow.set_experiment("Iris_Classification_Basic")with mlflow.start_run(run_name="Basic_RandomForest"):# 参数params = {'n_estimators': 100,'max_depth': 5,'random_state': 42}# 记录参数mlflow.log_params(params)# 训练模型model = RandomForestClassifier(**params)model.fit(self.X_train, self.y_train)# 评估模型y_pred = model.predict(self.X_test)accuracy = accuracy_score(self.y_test, y_pred)# 记录指标mlflow.log_metric("accuracy", accuracy)# 记录模型mlflow.sklearn.log_model(model, "model")# 记录混淆矩阵self._log_confusion_matrix(y_pred, "basic_confusion_matrix.png")print(f"基础实验准确率: {accuracy:.4f}")return accuracy, modeldef hyperparameter_tuning(self):"""超参数调优实验"""mlflow.set_experiment("Iris_Hyperparameter_Tuning")with mlflow.start_run(run_name="GridSearch_Optimization"):# 参数网格param_grid = {'n_estimators': [50, 100, 200],'max_depth': [3, 5, 7, None],'min_samples_split': [2, 5, 10],'min_samples_leaf': [1, 2, 4]}# 记录参数网格mlflow.log_param("param_grid", str(param_grid))# 网格搜索grid_search = GridSearchCV(RandomForestClassifier(random_state=42),param_grid,cv=5,scoring='accuracy',n_jobs=-1,verbose=0)grid_search.fit(self.X_train, self.y_train)# 记录最佳参数best_params = grid_search.best_params_for param, value in best_params.items():mlflow.log_param(f"best_{param}", value)# 评估最佳模型best_model = grid_search.best_estimator_y_pred = best_model.predict(self.X_test)accuracy = accuracy_score(self.y_test, y_pred)# 记录指标mlflow.log_metric("best_accuracy", accuracy)mlflow.log_metric("best_cv_score", grid_search.best_score_)# 记录详细分类报告self._log_classification_report(y_pred)# 记录模型mlflow.sklearn.log_model(best_model, "best_model")print(f"调优实验准确率: {accuracy:.4f}")return best_model, accuracydef _log_confusion_matrix(self, y_pred, filename):"""记录混淆矩阵"""cm = confusion_matrix(self.y_test, y_pred)plt.figure(figsize=(8, 6))sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',xticklabels=self.target_names, yticklabels=self.target_names)plt.title('Confusion Matrix')plt.ylabel('True Label')plt.xlabel('Predicted Label')plt.tight_layout()plt.savefig(filename)mlflow.log_artifact(filename)plt.close()# 清理临时文件if os.path.exists(filename):os.remove(filename)def _log_classification_report(self, y_pred):"""记录分类报告"""report = classification_report(self.y_test, y_pred, target_names=self.target_names, output_dict=True)# 记录每个类别的指标for class_name in self.target_names:if class_name in report:mlflow.log_metric(f"precision_{class_name}", report[class_name]['precision'])mlflow.log_metric(f"recall_{class_name}", report[class_name]['recall'])mlflow.log_metric(f"f1_{class_name}", report[class_name]['f1-score'])# 记录整体指标mlflow.log_metric("weighted_avg_precision", report['weighted avg']['precision'])mlflow.log_metric("weighted_avg_recall", report['weighted avg']['recall'])mlflow.log_metric("weighted_avg_f1", report['weighted avg']['f1-score'])def feature_analysis(self, model):"""特征重要性分析"""with mlflow.start_run(run_name="Feature_Analysis"):# 获取特征重要性importances = model.feature_importances_indices = np.argsort(importances)[::-1]# 记录特征重要性for i, idx in enumerate(indices):mlflow.log_metric(f"feature_importance_rank_{i+1}", importances[idx])mlflow.log_param(f"feature_name_rank_{i+1}", self.feature_names[idx])# 创建特征重要性图表plt.figure(figsize=(10, 6))plt.bar(range(len(importances)), importances[indices])plt.xticks(range(len(importances)), [self.feature_names[i] for i in indices], rotation=45)plt.title("Feature Importances")plt.tight_layout()plt.savefig("feature_importance.png")mlflow.log_artifact("feature_importance.png")plt.close()# 清理临时文件if os.path.exists("feature_importance.png"):os.remove("feature_importance.png")print("特征重要性分析完成!")def run_complete_pipeline(self):"""运行完整的实验管道"""print("开始MLflow实验管道...")# 1. 加载数据self.load_data()# 2. 基础实验print("\n1. 运行基础实验...")basic_accuracy, basic_model = self.basic_experiment()# 3. 超参数调优print("\n2. 运行超参数调优...")best_model, tuned_accuracy = self.hyperparameter_tuning()# 4. 特征分析print("\n3. 进行特征分析...")self.feature_analysis(best_model)# 5. 结果比较improvement = tuned_accuracy - basic_accuracyprint(f"\n实验结果总结:")print(f"基础模型准确率: {basic_accuracy:.4f}")print(f"调优模型准确率: {tuned_accuracy:.4f}")print(f"准确率提升: {improvement:.4f}")return basic_accuracy, tuned_accuracy, improvementdef main():"""主函数"""# 创建实验实例experiment = MLflowIrisExperiment()# 运行完整管道basic_acc, tuned_acc, improvement = experiment.run_complete_pipeline()print("\n" + "="*50)print("实验完成!")print("="*50)print("请运行以下命令查看MLflow UI:")print("mlflow ui --host 0.0.0.0 --port 5000")print("然后访问: http://localhost:5000")if __name__ == "__main__":main()