
Andrew Ng's Machine Learning Course (PyTorch Adaptation) Study Notes: 1.1 Basic Models and Mathematical Principles

news 2025/10/8 7:06:45

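1.1.1 Linear Regression Model

Linear regression is the most basic regression model in supervised learning. Its core idea is to model a linear relationship between the input features and the output label in order to predict a continuous value (such as a house price, a sales volume, or a temperature).

1. Application scenarios

Typical scenarios: predicting a house price (output label) from the floor area (input feature), or predicting product sales (output) from advertising spend (input). The explanation below uses single-variable house price prediction as the running example.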

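2. Model definition (single-variable case)

Suppose there is only one input feature (for example the floor area, denoted $x$) and the output label is a continuous value (for example the house price, denoted $y$). The hypothesis function (the prediction formula) of the linear regression model is defined as:
$h_\theta(x) = \theta_0 + \theta_1 x$

- $h_\theta(x)$: the model's prediction for input $x$ (for example, "a house with area $x$ has a predicted price of $h_\theta(x)$");
- $\theta_0$: the bias term (intercept), the baseline prediction when $x = 0$;
- $\theta_1$: the feature coefficient (slope); each one-unit increase in $x$ changes the predicted output by $\theta_1$ units;
- $x$: the input feature (independent variable); $y$: the true label (dependent variable).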
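As a minimal sketch, the hypothesis function can be written in a few lines of PyTorch. The helper name `predict`, the parameter values (`theta0 = 50`, `theta1 = 2`) and the sample areas are made up purely for illustration; they are not fitted to any data.

```python
import torch

def predict(x, theta0, theta1):
    """Single-variable linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameters and inputs, chosen only for illustration
theta0, theta1 = 50.0, 2.0
areas = torch.tensor([60.0, 80.0, 100.0])  # floor areas

prices = predict(areas, theta0, theta1)
print(prices.tolist())  # [170.0, 210.0, 250.0]
```

Because the expression is element-wise, the same function works for a single value or a whole batch of inputs.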

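3. Core objective

The goal of linear regression is to find the optimal parameters $\theta = [\theta_0, \theta_1]^T$ that minimize the "gap" between the prediction $h_\theta(x)$ and the true value $y$. This gap is quantified by a cost function.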

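1.1.2 Cost Function (Definition + Intuition)

A cost function is a quantitative measure of the model's prediction error. The most common choice in linear regression is the squared-error cost function (with the $\frac{1}{2m}$ scaling used below, it equals half the mean squared error).

1. What the cost function does

- Quantifies the prediction error of a single sample: $|h_\theta(x^{(i)}) - y^{(i)}|$ (where $i$ denotes the $i$-th sample);
- Measures the overall error across all samples: the per-sample errors are aggregated into a global cost $J(\theta_0, \theta_1)$;
- Guides parameter optimization: minimizing $J(\theta_0, \theta_1)$ yields the optimal parameters $\theta$.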

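2. Squared-error cost function formula

For a dataset of $m$ samples $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$, the squared-error cost function is defined as:
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Key details:

- The squared term $\left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$: compared with the absolute value $|h_\theta(x^{(i)}) - y^{(i)}|$, squaring gives larger errors more weight (an error of 2 contributes 4, while an error of 1 contributes 1), keeps every term non-negative, and is differentiable everywhere;
- The coefficient $\frac{1}{2m}$:
  - $\frac{1}{m}$: averages the error over all samples, so the cost does not grow simply because the dataset contains more samples;
  - $\frac{1}{2}$: cancels the factor 2 that the square produces when the cost is differentiated (this simplifies the algebra and does not change the optimal parameters).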
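The formula can be implemented directly and checked against PyTorch's built-in mean squared error. This is only a sketch: the helper name `squared_error_cost`, the data, and the parameter values are made up for illustration.

```python
import torch
import torch.nn.functional as F

def squared_error_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = x.shape[0]
    h = theta0 + theta1 * x
    return torch.sum((h - y) ** 2) / (2 * m)

# Made-up data and parameters for illustration
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([1.5, 3.5, 5.0, 8.0])
theta0, theta1 = 0.5, 1.5

cost = squared_error_cost(theta0, theta1, x, y)
# F.mse_loss averages the squared errors (factor 1/m), so J is half of it
half_mse = 0.5 * F.mse_loss(theta0 + theta1 * x, y)
print(cost.item(), half_mse.item())  # both 0.3125
```

The two values agree, which confirms that the $\frac{1}{2}$ factor is the only difference between $J$ and the usual mean squared error.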

3. Intuition for the cost function (single-variable case)

When $\theta_0$ is fixed (for example $\theta_0 = 0$), the cost $J(\theta_1)$ depends only on $\theta_1$. It is a quadratic function of $\theta_1$ whose graph is an upward-opening parabola, and the lowest point of the parabola is the $\theta_1$ with minimum cost.

A concrete example:
Suppose the dataset is $\{(1, 1), (2, 2), (3, 3)\}$ (an ideal linear relationship $y = x$) and $\theta_0 = 0$, so that $h_\theta(x) = \theta_1 x$. The cost function simplifies to:
$J(\theta_1) = \frac{1}{6} \left[ (\theta_1 - 1)^2 + (2\theta_1 - 2)^2 + (3\theta_1 - 3)^2 \right]$
Expanding gives $J(\theta_1) = \frac{7}{3}\theta_1^2 - \frac{14}{3}\theta_1 + \frac{7}{3} = \frac{7}{3}(\theta_1 - 1)^2$. Its graph is a parabola with the lowest point at $\theta_1 = 1$ (where $J(\theta_1) = 0$ and every prediction is exact).
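The expansion can be checked numerically. The sketch below evaluates the cost both from its definition and from the expanded quadratic at a few arbitrarily chosen values of $\theta_1$; the function names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 2.0, 3.0])

def cost_from_definition(theta1):
    # J(theta1) = 1/(2m) * sum((theta1 * x_i - y_i)^2), with theta0 fixed at 0
    return torch.sum((theta1 * x - y) ** 2) / (2 * len(x))

def cost_expanded(theta1):
    # Expanded quadratic: 7/3 * theta1^2 - 14/3 * theta1 + 7/3
    return 7 / 3 * theta1 ** 2 - 14 / 3 * theta1 + 7 / 3

thetas = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0])
by_definition = torch.stack([cost_from_definition(t) for t in thetas])
by_expansion = cost_expanded(thetas)

print(by_definition)
print(by_expansion)
```

Both rows match up to floating-point rounding, and the cost is smallest at $\theta_1 = 1$.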

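1.1.3 Visualizing the Cost Function (Examples + PyTorch Code)

Formulas alone make it hard to picture the shape of the cost function. Visualizing it with a 2D plot (one parameter) and with a 3D surface or contour plot (two parameters) shows more clearly how the cost changes with the parameters.

1. Single-parameter cost function (2D plot)

Scenario:

Fix $\theta_0 = 0$ and plot $J(\theta_1)$ as a function of $\theta_1$ (reusing the ideal "$y = x$" dataset from above).

PyTorch implementation: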

import torch
import matplotlib.pyplot as plt
from PIL import Image

# 1. Build the dataset
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 2.0, 3.0])

# 2. Define the cost function (theta0 is fixed at 0)
def compute_cost(theta1, x, y):
    m = len(x)
    h = theta1 * x
    cost = (1 / (2 * m)) * torch.sum(torch.square(h - y))
    return cost

# 3. Generate a range of theta1 values and the corresponding costs
theta1_range = torch.linspace(0.0, 2.0, 100)
costs = torch.stack([compute_cost(t, x, y) for t in theta1_range])

# 4. Plot the cost curve
plt.figure(figsize=(8, 5))
plt.plot(theta1_range.numpy(), costs.numpy(), color='#1f77b4', linewidth=2)

# Mark the optimum
optimal_theta1 = 1.0
optimal_cost = compute_cost(torch.tensor(optimal_theta1), x, y).item()
plt.scatter(optimal_theta1, optimal_cost, color='red', s=80,
            label=f'optimal θ1={optimal_theta1}, minimum cost={optimal_cost:.2f}')

# Labels and title
plt.xlabel('θ1 (feature coefficient)', fontsize=12)
plt.ylabel('J(θ1) (cost)', fontsize=12)
plt.title('Cost function of a single parameter (θ1)', fontsize=14)
plt.legend()
plt.grid(alpha=0.3)

# Save the figure
plt.savefig('linear_regression_cost_2d.png', dpi=300, bbox_inches='tight')
print("Figure saved as: linear_regression_cost_2d.png")

# Show the figure, falling back to PIL if the backend cannot display it
try:
    plt.show()
except Exception as e:
    print(f"Direct display failed, opening the saved image with PIL instead: {e}")
    img = Image.open('linear_regression_cost_2d.png')
    img.show()


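Interpreting the plot:

- The curve is an upward-opening parabola with its lowest point at $\theta_1 = 1.0$, where the cost is $J(\theta_1) = 0$;
- Whether $\theta_1 < 1.0$ or $\theta_1 > 1.0$, the cost rises, which shows that moving the parameter away from its optimal value increases the prediction error.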

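2. Two-parameter cost function (3D plot + contour plot)

When both $\theta_0$ and $\theta_1$ vary, the cost $J(\theta_0, \theta_1)$ is a function of two variables. It can be shown as a 3D cost surface or as a contour plot of equal-cost curves.

PyTorch implementation: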

import torch
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions

# 1. Build the dataset (same as before)
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 2.0, 3.0])

# 2. Define the two-parameter cost function
def compute_cost_2d(theta0, theta1, x, y):
    m = len(x)
    h = theta0 + theta1 * x  # full hypothesis: h_theta(x) = theta0 + theta1 * x
    cost = (1 / (2 * m)) * torch.sum(torch.square(h - y))
    return cost

# 3. Build a grid of theta0 and theta1 values (range -1 to 3, 50 points each)
theta0_range = torch.linspace(-1.0, 3.0, 50)
theta1_range = torch.linspace(-1.0, 3.0, 50)
theta0_grid, theta1_grid = torch.meshgrid(theta0_range, theta1_range, indexing='ij')

# 4. Compute the cost at every grid point (vectorized, no Python loops)
h = theta0_grid.unsqueeze(-1) + theta1_grid.unsqueeze(-1) * x.unsqueeze(0).unsqueeze(0)
cost_grid = (1 / (2 * len(x))) * torch.sum(torch.square(h - y.unsqueeze(0).unsqueeze(0)), dim=-1)

# 5. Plot the 3D cost surface and the contour plot side by side
fig = plt.figure(figsize=(15, 6))

# Subplot 1: 3D surface
ax1 = fig.add_subplot(121, projection='3d')
surf = ax1.plot_surface(theta0_grid.numpy(), theta1_grid.numpy(), cost_grid.numpy(),
                        cmap='viridis', alpha=0.8, edgecolor='none')
# Mark the optimal parameters (theta0=0, theta1=1, cost=0)
optimal_theta0 = 0.0
optimal_theta1 = 1.0
optimal_cost = compute_cost_2d(torch.tensor(optimal_theta0), torch.tensor(optimal_theta1), x, y).item()
ax1.scatter(optimal_theta0, optimal_theta1, optimal_cost, color='red', s=100,
            label=f'optimal (θ₀,θ₁)=({optimal_theta0},{optimal_theta1})')
ax1.set_xlabel('θ₀ (bias)', fontsize=10)
ax1.set_ylabel('θ₁ (feature coefficient)', fontsize=10)
ax1.set_zlabel('J(θ₀,θ₁) (cost)', fontsize=10)
ax1.set_title('Two-parameter cost function: 3D surface', fontsize=12)
ax1.legend()
fig.colorbar(surf, ax=ax1, shrink=0.5, aspect=10)  # colour bar: maps colours to cost values

# Subplot 2: contour plot
ax2 = fig.add_subplot(122)
contour = ax2.contourf(theta0_grid.numpy(), theta1_grid.numpy(), cost_grid.numpy(),
                       levels=20, cmap='viridis', alpha=0.8)
# Draw the contour lines (black) and label every fifth level
lines = ax2.contour(theta0_grid.numpy(), theta1_grid.numpy(), cost_grid.numpy(),
                    levels=20, colors='black', linewidths=0.5)
ax2.clabel(lines, lines.levels[::5], inline=True, fontsize=8)
cbar = fig.colorbar(contour, ax=ax2)
cbar.set_label('J(θ₀,θ₁) (cost)', fontsize=10)
# Mark the optimal parameters
ax2.scatter(optimal_theta0, optimal_theta1, color='red', s=100,
            label=f'optimal (θ₀,θ₁)=({optimal_theta0},{optimal_theta1})')
ax2.set_xlabel('θ₀ (bias)', fontsize=10)
ax2.set_ylabel('θ₁ (feature coefficient)', fontsize=10)
ax2.set_title('Two-parameter cost function: contour plot', fontsize=12)
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('linear_regression_cost_3d_contour.png', dpi=300, bbox_inches='tight')
plt.show()


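Interpreting the plots:

- 3D plot: the cost function is an upward-opening, bowl-shaped surface whose lowest point is the optimal parameter pair ($\theta_0 = 0, \theta_1 = 1$);
- Contour plot: each closed curve is a set of parameter combinations with the same cost. The closer a curve is to the center, the lower its cost, and the red point at the center is the optimum. Gradient descent, covered later, starts from an arbitrary point and repeatedly steps in the direction of the negative gradient, which is perpendicular to the contour lines, moving toward the center until it reaches the lowest point.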
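The walk toward the center of the contour plot can be sketched with plain gradient descent on the same toy dataset. The starting point, the learning rate of 0.1, and the 2000 iterations are assumptions chosen for this illustration, not values taken from the course.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 2.0, 3.0])

# Arbitrary starting point on the cost surface: [theta0, theta1]
theta = torch.tensor([2.0, -0.5], requires_grad=True)
learning_rate = 0.1  # assumed step size

for step in range(2000):
    h = theta[0] + theta[1] * x
    cost = torch.sum((h - y) ** 2) / (2 * len(x))
    cost.backward()                           # gradient of J with respect to theta
    with torch.no_grad():
        theta -= learning_rate * theta.grad   # step against the gradient
    theta.grad.zero_()

print(theta.detach())  # approaches theta0 = 0, theta1 = 1
```

Each update moves against the gradient, so the cost never increases for a sufficiently small step size, and the parameters settle near the red point marked in the plots.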

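1.1.4 Logistic Regression (Motivation + Decision Boundary)

Linear regression suits regression tasks (continuous outputs), but many real-world tasks are classification tasks with discrete labels, such as "is this email spam?" or "is this tumor malignant?". Logistic regression is a basic model designed for binary classification. Its core idea is to map the linear output to a probability between 0 and 1.

1. Motivation: why is linear regression a poor fit for classification?

Take a binary classification task (labels $\in \{0,1\}$, where 0 is the negative class and 1 is the positive class):

- The linear regression prediction $h_\theta(x) = \theta_0 + \theta_1 x$ can fall outside the range $[0, 1]$ (for example, predicting a "probability of being spam" of 1.2 or -0.3, which is clearly unreasonable);
- Squashing the output into $[0, 1]$ while keeping the squared-error cost does not solve the problem either: with a nonlinear hypothesis the squared-error cost becomes non-convex (it can have multiple local minima), so gradient descent may not find the global optimum.

The output of the linear model therefore needs a nonlinear transformation that compresses it into the $[0, 1]$ range, paired with a suitable cost function. The transformation is the job of the Sigmoid function.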
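To make the out-of-range problem concrete, the sketch below fits a least-squares line to a tiny made-up binary dataset and prints its predictions. Passing those raw outputs through a Sigmoid only illustrates the squashing effect; it is not a fitted logistic regression model.

```python
import torch

# Made-up 1-D binary classification data
x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = torch.tensor([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Closed-form least-squares fit of a straight line
theta1 = torch.sum((x - x.mean()) * (y - y.mean())) / torch.sum((x - x.mean()) ** 2)
theta0 = y.mean() - theta1 * x.mean()

linear_pred = theta0 + theta1 * x
squashed = torch.sigmoid(linear_pred)

print(linear_pred)  # the first value is below 0 and the last value is above 1
print(squashed)     # every value lies strictly between 0 and 1
```

The fitted line predicts roughly -0.14 for the smallest input and roughly 1.14 for the largest, neither of which can be read as a probability.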

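2. Sigmoid activation function

The Sigmoid function (also called the logistic function) is the core of logistic regression. Its formula is:
$\sigma(z) = \frac{1}{1 + e^{-z}}$
where $z = \theta^T x$ ($x$ is the input feature vector and $\theta$ is the parameter vector; in the single-variable case $z = \theta_0 + \theta_1 x$).

Key properties of the Sigmoid function:

- Output range: $\sigma(z) \in (0,1)$, which can be interpreted as the probability that the sample belongs to the positive class ($y = 1$);
- Monotonicity: $\sigma(z)$ is strictly increasing, with $\sigma(z) = 0.5$ at $z = 0$, $\sigma(z) > 0.5$ for $z > 0$, and $\sigma(z) < 0.5$ for $z < 0$;
- Derivative: $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which is what keeps the later gradient derivations simple.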
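The derivative identity can be checked with autograd. The sketch below compares the gradient PyTorch computes for `torch.sigmoid` with the closed form $\sigma(z)(1 - \sigma(z))$ on a handful of arbitrarily chosen points.

```python
import torch

z = torch.linspace(-5.0, 5.0, 11, requires_grad=True)
sigma = torch.sigmoid(z)

# Summing gives a scalar whose gradient w.r.t. z is the element-wise derivative
sigma.sum().backward()
autograd_derivative = z.grad
closed_form = (sigma * (1 - sigma)).detach()

print(torch.allclose(autograd_derivative, closed_form, atol=1e-6))  # True
print(closed_form.max().item())  # 0.25, reached at z = 0
```

The derivative peaks at 0.25 when $z = 0$ and shrinks toward 0 as $|z|$ grows, which matches the flat tails of the Sigmoid curve.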

Visualizing the Sigmoid function (PyTorch code):

import torch
import matplotlib.pyplot as plt

# 1. Define the Sigmoid function (PyTorch's built-in, numerically stable implementation)
sigmoid = torch.nn.Sigmoid()
z = torch.linspace(-10.0, 10.0, 1000)  # z from -10 to 10 covers the region where Sigmoid changes most
sigma_z = sigmoid(z)

# 2. Plot the Sigmoid curve
plt.figure(figsize=(8, 5))
plt.plot(z.numpy(), sigma_z.numpy(), color='#ff7f0e', linewidth=2)

# Reference lines through the point where z=0 and σ(z)=0.5
plt.axvline(x=0, color='gray', linestyle='--', alpha=0.5, label='z=0')
plt.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5, label='σ(z)=0.5')

# Mark two sample points (the outputs at z=2 and z=-2)
plt.scatter(2, sigmoid(torch.tensor(2.0)).item(), color='red', s=60, label='z=2 → σ(z)≈0.88')
plt.scatter(-2, sigmoid(torch.tensor(-2.0)).item(), color='blue', s=60, label='z=-2 → σ(z)≈0.12')

# Labels and title
plt.xlabel('z = θ^T x', fontsize=12)
plt.ylabel('σ(z) (predicted probability of the positive class)', fontsize=12)
plt.title('Sigmoid activation function', fontsize=14)
plt.legend()
plt.grid(alpha=0.3)

# Save the figure (high resolution, tight bounding box so labels are not cut off)
plt.savefig('sigmoid_function.png', dpi=300, bbox_inches='tight')
print("Figure saved as: sigmoid_function.png")

# Show the figure, falling back to PIL if the backend cannot display it
try:
    plt.show()
except Exception as e:
    print(f"Direct display failed, opening the saved image with PIL instead: {e}")
    from PIL import Image  # imported lazily so PIL is only needed for the fallback
    img = Image.open('sigmoid_function.png')
    img.show()


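3. Hypothesis function of logistic regression

Combined with the Sigmoid function, the hypothesis function (prediction formula) of logistic regression is defined as:
$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$

Probabilistic meaning:

- $h_\theta(x) = P(y=1 | x; \theta)$: the probability that the sample belongs to the positive class ($y = 1$), given input $x$ and parameters $\theta$;
- $1 - h_\theta(x) = P(y=0 | x; \theta)$: the probability that the sample belongs to the negative class ($y = 0$).

Classification decision rule:

Choose a probability threshold (usually 0.5):

- If $h_\theta(x) \geq 0.5$, predict $y = 1$ (positive class);
- If $h_\theta(x) < 0.5$, predict $y = 0$ (negative class).

Because the Sigmoid is increasing and $\sigma(0) = 0.5$, this rule is equivalent to:

- If $\theta^T x \geq 0$, predict $y = 1$;
- If $\theta^T x < 0$, predict $y = 0$.

The set of points where $\theta^T x = 0$ is the decision boundary.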
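The equivalence between thresholding the probability at 0.5 and thresholding the linear score at 0 can be seen in a short sketch. The parameter values and inputs below are hypothetical and chosen only so that the scores come out as round numbers.

```python
import torch

theta = torch.tensor([-1.0, 2.0])              # hypothetical [theta0, theta1]
x = torch.tensor([-1.0, 0.0, 0.5, 1.0, 2.0])   # made-up single-feature inputs

z = theta[0] + theta[1] * x                    # theta^T x
h = torch.sigmoid(z)                           # P(y = 1 | x; theta)

pred_from_probability = (h >= 0.5).int()
pred_from_score = (z >= 0).int()

print(z.tolist())                      # [-3.0, -1.0, 0.0, 1.0, 3.0]
print(pred_from_probability.tolist())  # [0, 0, 1, 1, 1]
print(pred_from_score.tolist())        # [0, 0, 1, 1, 1]
```

In practice this means a prediction can be made from the sign of $\theta^T x$ alone, without evaluating the Sigmoid.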

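4. Decision Boundary

The decision boundary is the boundary that splits the input space into a positive-class region and a negative-class region. It is defined by $\theta^T x = 0$. Once $\theta$ is fixed, the boundary is a property of the parameters alone; the training data influences it only through the fitted $\theta$.

(1) Linear decision boundary

When $\theta^T x$ is a linear combination of the raw input features, the decision boundary is a straight line (two features) or a plane (three features).

Example:
Suppose the logistic regression parameters are $\theta = [\theta_0, \theta_1, \theta_2]^T = [-3, 1, 1]^T$ and the input is $x = [1, x_1, x_2]^T$ (the leading 1 is the feature that multiplies the bias term). The decision boundary is:
$\theta^T x = -3 + x_1 + x_2 = 0 \implies x_1 + x_2 = 3$
This is a straight line with slope -1 and intercept 3 in the $(x_1, x_2)$ plane. Points on or above the line ($x_1 + x_2 \geq 3$) are predicted as positive, and points below it are predicted as negative.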
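A quick sketch applies the example parameters to three hand-picked points, one on each side of the line and one exactly on it. The points themselves are made up for illustration.

```python
import torch

theta = torch.tensor([-3.0, 1.0, 1.0])   # [theta0, theta1, theta2] from the example
points = torch.tensor([[1.0, 1.0],       # x1 + x2 = 2 (below the line)
                       [2.0, 2.0],       # x1 + x2 = 4 (above the line)
                       [1.0, 2.0]])      # x1 + x2 = 3 (on the line)

# Prepend the constant feature 1 so that z = theta^T x includes the bias
features = torch.cat([torch.ones(len(points), 1), points], dim=1)
z = features @ theta
predictions = (z >= 0).int()

print(z.tolist())            # [-1.0, 1.0, 0.0]
print(predictions.tolist())  # [0, 1, 1]
```

The point on the line gets $z = 0$, which corresponds to a predicted probability of exactly 0.5 and therefore falls into the positive class under the $\geq$ rule.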

Visualizing the linear decision boundary (code; this plot only needs NumPy and Matplotlib):

import matplotlib.pyplot as plt
import numpy as np

# 1. Build two-class sample data (positive: x1+x2>3, negative: x1+x2<3)
np.random.seed(42)
# Negative class (x1+x2 < 3)
x0 = np.random.rand(50, 2) * 4  # 50 random 2D samples in [0, 4)
x0 = x0[np.sum(x0, axis=1) < 3]  # keep the samples with x1+x2<3
# Positive class (x1+x2 > 3)
x1 = np.random.rand(50, 2) * 4
x1 = x1[np.sum(x1, axis=1) > 3]  # keep the samples with x1+x2>3

# 2. Define the decision boundary (x1 + x2 = 3)
x_boundary = np.linspace(0, 4, 100)
y_boundary = 3 - x_boundary  # from x1 + x2 = 3 we get x2 = 3 - x1

# 3. Plot the samples and the decision boundary
plt.figure(figsize=(8, 6))
plt.scatter(x0[:, 0], x0[:, 1], color='blue', label='negative class (y=0)', alpha=0.7)
plt.scatter(x1[:, 0], x1[:, 1], color='red', label='positive class (y=1)', alpha=0.7)
plt.plot(x_boundary, y_boundary, color='green', linewidth=2, label='decision boundary: x1 + x2 = 3')
plt.xlabel('x1 (feature 1)', fontsize=12)
plt.ylabel('x2 (feature 2)', fontsize=12)
plt.title('Logistic regression: linear decision boundary', fontsize=14)
plt.legend()
plt.grid(alpha=0.3)
plt.xlim(0, 4)
plt.ylim(0, 4)
plt.savefig('logistic_regression_linear_boundary.png', dpi=300, bbox_inches='tight')
plt.show()

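(2) Nonlinear decision boundary

When the input features include polynomial terms (such as $x_1^2, x_1 x_2, x_2^2$), the decision boundary becomes a curve (two features) or a curved surface (three features), which lets the model handle samples that are not linearly separable.

Example:
Suppose the parameters are $\theta = [-1, 0, 0, 1, 1]^T$ and the input is $x = [1, x_1, x_2, x_1^2, x_2^2]^T$ (the polynomial features $x_1^2$ and $x_2^2$ have been added). The decision boundary is:
$\theta^T x = -1 + 0 \cdot x_1 + 0 \cdot x_2 + x_1^2 + x_2^2 = 0 \implies x_1^2 + x_2^2 = 1$
This is a circle of radius 1 centered at the origin. Points inside the circle ($x_1^2 + x_2^2 < 1$) are predicted as negative, and points on or outside it are predicted as positive.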
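The circular boundary can be checked on a few hand-picked points. In this sketch the helper `polynomial_features` is a hypothetical name for the feature mapping described in the example, and the test points are made up.

```python
import torch

theta = torch.tensor([-1.0, 0.0, 0.0, 1.0, 1.0])  # parameters from the example

def polynomial_features(points):
    """Map (x1, x2) to [1, x1, x2, x1^2, x2^2]."""
    x1, x2 = points[:, 0], points[:, 1]
    return torch.stack([torch.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2], dim=1)

points = torch.tensor([[0.0, 0.0],   # centre of the circle
                       [0.5, 0.5],   # inside:  0.25 + 0.25 < 1
                       [2.0, 0.0]])  # outside: 4 + 0 > 1

z = polynomial_features(points) @ theta
predictions = (z >= 0).int()

print(z.tolist())            # [-1.0, -0.5, 3.0]
print(predictions.tolist())  # [0, 0, 1]
```

The model is still linear in $\theta$; only the features are nonlinear in $x_1$ and $x_2$, which is why the same decision rule $\theta^T x \geq 0$ produces a curved boundary.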

Visualizing the nonlinear decision boundary (code; this plot only needs NumPy and Matplotlib):

import matplotlib.pyplot as plt
import numpy as np

# 1. Build two-class sample data (positive: x1^2+x2^2>1, negative: x1^2+x2^2<1)
np.random.seed(42)
samples = np.random.rand(200, 2) * 4 - 2  # 200 random 2D samples in [-2, 2)
r2 = np.sum(samples ** 2, axis=1)         # squared distance from the origin
x0 = samples[r2 < 1]  # negative class: inside the unit circle
x1 = samples[r2 > 1]  # positive class: outside the unit circle

# 2. Define the decision boundary (x1^2 + x2^2 = 1) in parametric form
t = np.linspace(0, 2 * np.pi, 200)
x_boundary = np.cos(t)
y_boundary = np.sin(t)

# 3. Plot the samples and the decision boundary
plt.figure(figsize=(6, 6))
plt.scatter(x0[:, 0], x0[:, 1], color='blue', label='negative class (y=0)', alpha=0.7)
plt.scatter(x1[:, 0], x1[:, 1], color='red', label='positive class (y=1)', alpha=0.7)
plt.plot(x_boundary, y_boundary, color='green', linewidth=2, label='decision boundary: x1^2 + x2^2 = 1')
plt.xlabel('x1 (feature 1)', fontsize=12)
plt.ylabel('x2 (feature 2)', fontsize=12)
plt.title('Logistic regression: nonlinear decision boundary', fontsize=14)
plt.legend()
plt.grid(alpha=0.3)
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.gca().set_aspect('equal')  # equal axis scaling so the circle is not drawn as an ellipse
plt.savefig('logistic_regression_nonlinear_boundary.png', dpi=300, bbox_inches='tight')
plt.show()


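1.1.5 Logistic Regression Cost Function (Basic Version + Simplified Version)

The core objective of logistic regression is to find parameters $\theta$ such that the model assigns a high probability to positive samples and a low probability to negative samples. This calls for a dedicated cost function: the log-likelihood cost function (more precisely the negative log-likelihood, also known as the cross-entropy loss).

1. Why not reuse the squared-error cost function from linear regression?

Suppose the logistic regression hypothesis $h_\theta(x)$ is plugged directly into the squared-error cost function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Because $h_\theta(x) = \sigma(\theta^T x)$ is nonlinear in $\theta$, the resulting $J(\theta)$ is non-convex (its graph can have multiple local minima), so gradient descent may get stuck in a local optimum and fail to reach the globally optimal parameters.

Logistic regression therefore needs a convex cost function, and the log-likelihood cost function meets this requirement.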
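One way to see the difference is to look at curvature: a convex function of one variable never has a negative second derivative. The sketch below uses a deliberately tiny setting, a single training sample with $x = 1$, $y = 1$ and a scalar $\theta$ (so $h = \sigma(\theta)$), and computes second derivatives with autograd. It only demonstrates non-convexity in this toy case and does not exhibit multiple local minima; the helper name `second_derivative` is illustrative.

```python
import torch

def second_derivative(cost_fn, theta_value):
    """Second derivative of a scalar cost with respect to a scalar theta, via autograd."""
    theta = torch.tensor(theta_value, requires_grad=True)
    cost = cost_fn(theta)
    (first,) = torch.autograd.grad(cost, theta, create_graph=True)
    (second,) = torch.autograd.grad(first, theta)
    return second.item()

# One training sample with x = 1 and y = 1, so h = sigmoid(theta)
squared_error = lambda theta: 0.5 * (torch.sigmoid(theta) - 1.0) ** 2
cross_entropy = lambda theta: -torch.log(torch.sigmoid(theta))

for value in (-2.0, 0.0, 2.0):
    print(value,
          second_derivative(squared_error, value),
          second_derivative(cross_entropy, value))
```

At $\theta = -2$ the squared-error curvature is negative while at $\theta = 2$ it is positive, so that cost is not convex. The cross-entropy curvature equals $\sigma(\theta)(1 - \sigma(\theta))$ in this setting and stays positive at every point tested.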

2. Basic version of the cost function (defined case by case)

The logistic regression cost function is based on the likelihood idea: maximize the probability that the model assigns to the observed labels of all samples. Taking the negative logarithm of the likelihood (which turns the maximization into a minimization) gives the basic version of the cost function.

For a single sample $(x^{(i)}, y^{(i)})$, the cost $Cost(h_\theta(x^{(i)}), y^{(i)})$ is defined as:
$Cost(h_\theta(x^{(i)}), y^{(i)}) = \begin{cases} -\log(h_\theta(x^{(i)})) & \text{if } y^{(i)} = 1 \\ -\log(1 - h_\theta(x^{(i)})) & \text{if } y^{(i)} = 0 \end{cases}$

Intuition for the cost function:

- When $y^{(i)} = 1$ (positive sample):
  - If the model predicts $h_\theta(x^{(i)}) \approx 1$ (correct), then $-\log(1) = 0$ (minimum cost);
  - If $h_\theta(x^{(i)}) \approx 0$ (wrong), then $-\log(0^+) \to +\infty$ (the cost grows without bound, a heavy penalty).
- When $y^{(i)} = 0$ (negative sample):
  - If the model predicts $h_\theta(x^{(i)}) \approx 0$ (correct), then $-\log(1 - 0) = 0$ (minimum cost);
  - If $h_\theta(x^{(i)}) \approx 1$ (wrong), then $-\log(1 - 1^-) \to +\infty$ (the cost grows without bound, a heavy penalty).

Example:
For a positive sample $y = 1$, a predicted probability of $h = 0.9$ gives a cost of $-\log(0.9) \approx 0.105$ (small), while $h = 0.1$ gives a cost of $-\log(0.1) \approx 2.303$ (large).
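The numbers in the example use the natural logarithm and can be reproduced with a small helper. The function name `sample_cost` is illustrative.

```python
import math

def sample_cost(h, y):
    """Per-sample logistic cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1 - h)

print(round(sample_cost(0.9, 1), 3))  # 0.105 (confident and correct)
print(round(sample_cost(0.1, 1), 3))  # 2.303 (confident and wrong)
print(round(sample_cost(0.1, 0), 3))  # 0.105 (negative sample, low predicted probability)
```

The cost for the wrong prediction is more than twenty times larger than for the correct one, which is the asymmetry the case-by-case definition is designed to produce.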

3. Simplified version of the cost function (merged into one formula)

The basic version has two cases, which is inconvenient for computation and differentiation. Because the label $y$ only takes the values 0 or 1, the two cases can be merged into a single formula.

For all $m$ samples, the global cost function $J(\theta)$ is defined as:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$

Why the merge works:

- When $y^{(i)} = 1$, the factor $(1 - y^{(i)}) = 0$ removes the second term, so the sample contributes $-\log(h_\theta(x^{(i)}))$ (the same as the $y = 1$ case of the basic version);
- When $y^{(i)} = 0$, the factor $y^{(i)} = 0$ removes the first term, so the sample contributes $-\log(1 - h_\theta(x^{(i)}))$ (the same as the $y = 0$ case of the basic version).
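A direct implementation of the merged formula can be compared with PyTorch's `F.binary_cross_entropy`, which computes the same mean by default. The predicted probabilities and labels below are made up, and the helper name `logistic_cost` is illustrative.

```python
import math
import torch
import torch.nn.functional as F

def logistic_cost(h, y):
    """J = -1/m * sum(y * log(h) + (1 - y) * log(1 - h))."""
    return -torch.mean(y * torch.log(h) + (1 - y) * torch.log(1 - h))

# Made-up predicted probabilities and labels
h = torch.tensor([0.9, 0.2, 0.7, 0.4])
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

manual = logistic_cost(h, y)
builtin = F.binary_cross_entropy(h, y)

# The case-by-case definition, averaged by hand, for comparison
by_cases = -(math.log(0.9) + math.log(0.8) + math.log(0.7) + math.log(0.6)) / 4

print(manual.item(), builtin.item(), by_cases)
```

All three values agree up to floating-point rounding. Note that this hand-written version would break if any prediction were exactly 0 or 1, since `torch.log(0)` is negative infinity.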

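4. PyTorch implementation of the simplified cost function

Using PyTorch's nn.BCELoss() (binary cross-entropy loss) directly avoids the numerical problems of a hand-written implementation (such as $\log(0)$ producing infinities or NaN):

import torch
import torch.nn as nn

# 1. Build the samples (3 samples, 2 features)
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # input features: (m, n) = (3, 2)
y = torch.tensor([[1.0], [0.0], [1.0]])  # true labels (binary, must be float): (m, 1)

# 2. Define the logistic regression model (linear layer + Sigmoid)
class LogisticRegression(nn.Module):
    def __init__(self, input_dim):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(input_dim, 1)  # linear layer: input dim 2, output dim 1 (z = θ^T x)

    def forward(self, x):
        z = self.linear(x)
        h = torch.sigmoid(z)  # Sigmoid maps z into (0, 1)
        return h

# 3. Initialize the model and the loss function
model = LogisticRegression(input_dim=2)
criterion = nn.BCELoss()  # binary cross-entropy loss (the simplified cost function)

# 4. Compute the cost (parameters are randomly initialized, so the value differs between runs)
h = model(x)  # predicted probabilities: (3, 1)
cost = criterion(h, y)  # global cost

print(f"Predicted probabilities h:\n{h.detach().numpy().round(4)}")
print(f"Global cost J(θ): {cost.item():.4f}")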

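Interpreting the output:

- Every predicted probability in $h$ lies in $(0, 1)$;
- The global cost $J(\theta)$ is the average of the per-sample costs. Minimizing it with gradient descent is how the parameters $\theta$ are updated in later sections.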
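As a preview of that minimization, the sketch below runs a short gradient descent loop on the same three samples with `torch.optim.SGD`. The fixed seed, the learning rate of 0.05, and the 200 steps are assumptions made for this illustration, and `nn.Sequential` is used here only as a compact stand-in for the `LogisticRegression` class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = torch.tensor([[1.0], [0.0], [1.0]])

# Linear layer followed by Sigmoid, the same structure as the model above
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # assumed learning rate

costs = []
for step in range(200):
    optimizer.zero_grad()           # clear gradients from the previous step
    cost = criterion(model(x), y)   # global cost J(theta)
    cost.backward()                 # gradients of J with respect to the parameters
    optimizer.step()                # gradient descent update
    costs.append(cost.item())

print(f"initial cost: {costs[0]:.4f}, final cost: {costs[-1]:.4f}")
```

The final cost is lower than the initial one. It does not approach zero here because the three points lie on one line in feature space with the middle one labeled differently, so no linear boundary separates them.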

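Summary

- Linear regression: suited to regression tasks; the hypothesis function is linear, the error is measured by the squared-error cost function, and that cost function is convex;
- Cost function visualization: a parabola for one parameter and a bowl-shaped surface for two parameters, which helps build intuition for the direction of parameter optimization;
- Logistic regression: suited to binary classification; the Sigmoid function maps the linear output to $(0, 1)$ (a probability), and the decision boundary is defined by $\theta^T x = 0$ (linear or nonlinear, depending on the features);
- Logistic regression cost function: the log-likelihood cost function (convex), presented in a basic version (easier to understand) and a simplified version (easier to compute); in PyTorch it is available directly as BCELoss.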