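In this article we walk through a complete machine learning project, from data cleaning and feature engineering to model training and tuning, to show how to predict house prices. We use Python and the PyTorch framework together with the Kaggle house price prediction dataset and build the model step by step. By the end you should be familiar with the basic workflow and key techniques of a machine learning project.

1. Introduction

House price prediction is a classic machine learning problem that involves data cleaning, feature engineering, model selection and tuning. This article starts from scratch, builds a house price prediction model step by step, and uses cross-validation and hyperparameter tuning to improve its accuracy.

2. Data acquisition and preprocessing

2.1 Data acquisition

First we fetch the Kaggle house price prediction dataset. It is split into a training set with 1460 records and a test set with 1459 records.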

import pandas as pd
import torch
from torch import nn

DATA_URL = 'http://d2l-data.s3-accelerate.amazonaws.com/'
DATA_HUB = {
    'kaggle_house_train': (DATA_URL + 'kaggle_house_pred_train.csv',
                           '585e9cc93e70b39160e7921475f9bcd7d31219ce'),
    'kaggle_house_test': (DATA_URL + 'kaggle_house_pred_test.csv',
                          'fa19780a7b011d9b009e8bff8e99922a8ee2eb90'),
}


def download(name):
    # Each DATA_HUB entry is (url, sha1_hash). This simplified version reads
    # the CSV straight from the URL and does not verify the hash.
    url, sha1_hash = DATA_HUB[name]
    return pd.read_csv(url)


train_data = download('kaggle_house_train')
test_data = download('kaggle_house_test')

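2.2 Data cleaning and feature engineering

In the data cleaning stage we standardize the numeric features, fill in missing values, and one-hot encode the categorical features.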

# Drop the Id column (and SalePrice from the training set), then stack train and test
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

# Standardize the numeric features
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / (x.std()))
# After standardization every numeric feature has zero mean, so missing values become 0
all_features[numeric_features] = all_features[numeric_features].fillna(0)

# One-hot encode the categorical features
all_features = pd.get_dummies(all_features, dummy_na=True)
all_features = all_features.astype(float)

n_train = train_data.shape[0]
train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float32)
test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float32)
train_labels = torch.tensor(train_data.SalePrice.values.reshape(-1, 1), dtype=torch.float32)
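To make the three preprocessing steps concrete, here is a small sketch on a made-up table. The column names LotArea and Street are borrowed for illustration and the values are invented. It selects numeric columns with `select_dtypes`, which is a swapped-in alternative to the dtype comparison used in the article.

```python
import pandas as pd

# Toy data, invented for illustration: one numeric and one categorical column,
# each with a missing value
toy = pd.DataFrame({
    'LotArea': [1000.0, 2000.0, None, 3000.0],
    'Street': ['Pave', 'Grvl', None, 'Pave'],
})

# 1. Standardize the numeric columns (mean and std skip missing values)
numeric = toy.select_dtypes(include='number').columns
toy[numeric] = toy[numeric].apply(lambda x: (x - x.mean()) / x.std())

# 2. Missing numeric values become 0, which is the mean after standardization
toy[numeric] = toy[numeric].fillna(0)

# 3. One-hot encode; dummy_na=True adds an indicator column for missing values
toy = pd.get_dummies(toy, dummy_na=True).astype(float)

print(toy.columns.tolist())
print(toy)
```

Each categorical value, including "missing", ends up as its own 0/1 column, so every row has exactly one active indicator per original categorical feature.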

3. Model construction and training

3.1 Model definition

We use a simple linear regression model as the baseline.

in_features = train_features.shape[1]
net = nn.Sequential(nn.Linear(in_features, 1))

3.2 Loss function and optimizer

We use mean squared error (MSE) as the loss function and optimize with stochastic gradient descent (SGD).

loss = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
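To illustrate what a gradient descent update does under an MSE loss, here is a plain-Python sketch on a one-weight model y_hat = w * x. The data, the learning rate of 0.05 and the function names are all invented for illustration. Like the training loop in Section 3.3, it uses the whole dataset at every step.

```python
# Toy data generated by y = 3x; the model is y_hat = w * x with a single weight
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]


def mse(w):
    """Mean squared error of the model y_hat = w * x on the toy data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w):
    """Derivative of the MSE with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w, lr = 0.0, 0.05
losses = [mse(w)]
for _ in range(20):
    w -= lr * mse_grad(w)  # gradient descent update: w <- w - lr * dL/dw
    losses.append(mse(w))

print(f'w after 20 steps: {w:.4f}')
print(f'loss: {losses[0]:.4f} -> {losses[-1]:.6f}')
```

In PyTorch, `l.backward()` computes the gradient and `optimizer.step()` applies the same kind of update to every parameter of the network.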

3.3 Cross-validation and model training

To evaluate the model, we use 5-fold cross-validation.

def k_fold(net, k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size, loss):
    # Note: this simplified loop trains full-batch with the global `optimizer`
    # from Section 3.2. lr, weight_decay and batch_size are accepted only to
    # match the signature of the full code in Section 5.
    train_l_sum, valid_l_sum = 0, 0
    fold_size = len(train_features) // k
    for i in range(k):
        # Re-initialize the weights so that each fold starts from scratch
        for layer in net:
            layer.reset_parameters()
        # Split into training and validation sets
        start, end = i * fold_size, (i + 1) * fold_size
        valid_features, valid_labels = train_features[start:end], train_labels[start:end]
        train_features_fold = torch.cat([train_features[:start], train_features[end:]])
        train_labels_fold = torch.cat([train_labels[:start], train_labels[end:]])
        # Train the model
        for epoch in range(num_epochs):
            net.train()
            optimizer.zero_grad()
            outputs = net(train_features_fold)
            l = loss(outputs, train_labels_fold)
            l.backward()
            optimizer.step()
        # Validate the model
        net.eval()
        with torch.no_grad():
            valid_outputs = net(valid_features)
            valid_l = loss(valid_outputs, valid_labels)
        train_l_sum += l.item()
        valid_l_sum += valid_l.item()
    return train_l_sum / k, valid_l_sum / k


k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
train_l, valid_l = k_fold(net, k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size, loss)
# The loss here is plain MSE; the log RMSE metric is only computed in the full code
print(f'{k}-fold validation: average training MSE: {float(train_l):f}, average validation MSE: {float(valid_l):f}')
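The fold boundaries are easier to see on indices alone. The sketch below reproduces the `start`/`end` slicing of the cross-validation loop in plain Python. The helper name `fold_indices` is hypothetical.

```python
def fold_indices(n, k):
    """Yield (train_idx, valid_idx) index lists for contiguous k-fold splits.

    Fold i validates on rows [i * fold_size, (i + 1) * fold_size) and
    trains on all remaining rows.
    """
    fold_size = n // k
    for i in range(k):
        start, end = i * fold_size, (i + 1) * fold_size
        valid_idx = list(range(start, end))
        train_idx = list(range(0, start)) + list(range(end, n))
        yield train_idx, valid_idx


for i, (train_idx, valid_idx) in enumerate(fold_indices(10, 3)):
    print(f'fold {i}: valid={valid_idx} train={train_idx}')
```

When n is not divisible by k, the leftover rows at the end are never used for validation (row 9 in the 10-row example). With 1460 training rows and k = 5 each fold has 292 rows and nothing is left over.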

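4. Model tuning and prediction

4.1 Model tuning

Adjusting hyperparameters such as the learning rate and the regularization strength (weight decay) can further improve the model. Candidate settings can be compared by their average validation loss under k-fold cross-validation.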
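One simple way to tune these values is a grid search scored by cross-validation. The following is a minimal sketch under stated assumptions: it runs on synthetic data rather than the house price dataset, the grid values are arbitrary, and the helper name `cv_loss` is hypothetical.

```python
import itertools
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data, invented for illustration
X = torch.randn(200, 5)
true_w = torch.tensor([[2.0], [-1.0], [0.5], [0.0], [3.0]])
y = X @ true_w + 0.1 * torch.randn(200, 1)


def cv_loss(lr, weight_decay, k=5, num_epochs=50):
    """Average validation MSE of a linear model over k contiguous folds."""
    loss = nn.MSELoss()
    fold_size = len(X) // k
    total = 0.0
    for i in range(k):
        start, end = i * fold_size, (i + 1) * fold_size
        X_tr = torch.cat([X[:start], X[end:]])
        y_tr = torch.cat([y[:start], y[end:]])
        net = nn.Linear(X.shape[1], 1)  # fresh model for every fold
        opt = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=weight_decay)
        for _ in range(num_epochs):
            opt.zero_grad()
            loss(net(X_tr), y_tr).backward()
            opt.step()
        with torch.no_grad():
            total += loss(net(X[start:end]), y[start:end]).item()
    return total / k


# Try every combination of learning rate and weight decay
grid = list(itertools.product([0.001, 0.01, 0.1], [0.0, 0.1]))
results = {(lr, wd): cv_loss(lr, wd) for lr, wd in grid}
best = min(results, key=results.get)
for (lr, wd), val in sorted(results.items(), key=lambda kv: kv[1]):
    print(f'lr={lr}, weight_decay={wd}: validation MSE {val:.4f}')
print('best setting:', best)
```

Creating a new model inside each fold keeps the folds independent, so the averaged validation loss reflects the hyperparameters rather than weights carried over from an earlier fold.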

4.2 Prediction and submission

Finally, we use the trained model to predict on the test set and submit the results to Kaggle.

def train_and_predict(net, train_features, test_features, train_labels, test_data, num_epochs, lr, weight_decay, batch_size, loss):
    # Train the model (full-batch, using the global `optimizer` from Section 3.2)
    for epoch in range(num_epochs):
        net.train()
        optimizer.zero_grad()
        outputs = net(train_features)
        l = loss(outputs, train_labels)
        l.backward()
        optimizer.step()
    # Predict
    net.eval()
    with torch.no_grad():
        test_outputs = net(test_features)
    # Save the predictions; flatten (n, 1) to (n,) before assigning the column
    test_data['SalePrice'] = test_outputs.numpy().reshape(-1)
    test_data[['Id', 'SalePrice']].to_csv('submission.csv', index=False)


train_and_predict(net, train_features, test_features, train_labels, test_data, num_epochs, lr, weight_decay, batch_size, loss)

5. Complete code

import hashlib
import os
from typing import Union, List, Callable, Tuple

import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
from torch import nn

import utils as d2l  # local helper module; load_array and set_axes are used below

"""
Overfitting: the model performs well on the training set but poorly on the validation set
Underfitting: the model performs poorly on both the training and validation sets
Techniques against overfitting: regularization, a larger dataset, weight decay, dropout
Remedy for underfitting: increase model complexity
Vanishing gradients: the gradient of the sigmoid function vanishes when its input is very large or very small.
Moreover, when backpropagating through many layers, the gradient of the overall product may vanish unless the
sigmoid inputs are close to zero. With many layers, unless we are careful, the gradient may be cut off at some layer
Exploding gradients: caused by the initialization of a deep network, leaving the gradient descent optimizer
no chance to converge
"""

DATA_HUB = dict()


def download(name: str, cache_dir: str = os.path.abspath("../../data")) -> str:
    """Download a file registered in DATA_HUB.

    Args:
        name: name of the hub entry
        cache_dir: local directory to save the file in

    Returns:
        Local file path
    """
    assert name in DATA_HUB, f"{name} does not exist in {DATA_HUB}"
    url, sha1_hash = DATA_HUB[name]
    os.makedirs(cache_dir, exist_ok=True)
    f_name = os.path.join(cache_dir, url.split('/')[-1])
    if os.path.exists(f_name):
        sha1 = hashlib.sha1()
        with open(f_name, 'rb') as f:
            while True:
                data = f.read(1048576)
                if not data:
                    break
                sha1.update(data)
        if sha1.hexdigest() == sha1_hash:
            return f_name  # cache hit
    print(f'Downloading {f_name} from {url}...')
    r = requests.get(url, stream=True, verify=True)
    with open(f_name, 'wb') as f:
        f.write(r.content)
    return f_name


def plot(X: Union[List[float], List[torch.Tensor], torch.Tensor, np.ndarray],
         Y: Union[List[torch.Tensor], List[List[float]], None] = None,
         x_label=None, y_label=None, legend=None, x_lim=None,
         y_lim=None, x_scale='linear', y_scale='linear',
         fmts=('-', 'm--', 'g-.', 'r:'), axes=None):
    """Plot one or more curves on the given (or current) axes."""
    if legend is None:
        legend = []
    axes = axes if axes else plt.gca()

    # Return True if `t` (tensor, array or list) has 1 axis
    def has_one_axis(t):
        return (hasattr(t, "ndim") and t.ndim == 1 or
                isinstance(t, list) and not hasattr(t[0], "__len__"))

    if has_one_axis(X):
        X = [X]
    if Y is None:
        X, Y = [[]] * len(X), X
    elif has_one_axis(Y):
        Y = [Y]
    if len(X) != len(Y):
        X = X * len(Y)
    axes.cla()
    for x, y, fmt in zip(X, Y, fmts):
        if len(x):
            axes.plot(x, y, fmt)
        else:
            axes.plot(y, fmt)
    d2l.set_axes(axes, x_label, y_label, x_lim, y_lim, x_scale, y_scale, legend)
    plt.show()


def log_mse(net: Union[torch.nn.Module, Callable], features: torch.Tensor,
            labels: torch.Tensor, loss: Union[torch.nn.Module, Callable]) -> float:
    """Compute the root mean squared error between log predictions and log labels.

    A log-transformed error is commonly used for regression targets that span
    several orders of magnitude: taking logs reduces the influence of large
    values, so the model focuses on relative rather than absolute error.
    """
    # To keep the logarithm stable, clamp predictions below 1 to 1
    clipped_predict = torch.clamp(net(features), 1, float('inf'))
    r_mse = torch.sqrt(loss(torch.log(clipped_predict), torch.log(labels)))
    return r_mse.item()


def train(net: nn.Module, train_features: torch.Tensor, train_labels: torch.Tensor,
          test_features: torch.Tensor, test_labels: torch.Tensor, num_epochs: int,
          learning_rate: float, weight_decay: float, batch_size: int,
          loss: nn.Module) -> Tuple[List[float], List[float]]:
    """Train the model.

    Args:
        net: network model
        train_features: training features
        train_labels: training labels
        test_features: test (validation) features
        test_labels: test (validation) labels
        num_epochs: number of epochs
        learning_rate: learning rate
        weight_decay: weight decay
        batch_size: batch size
        loss: loss function

    Returns:
        List of training losses and list of test losses, one entry per epoch
    """
    train_ls, test_ls = [], []
    train_iter = d2l.load_array((train_features, train_labels), batch_size)
    # The Adam optimization algorithm is used here
    optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate, weight_decay=weight_decay)
    for epoch in range(num_epochs):
        for X, y in train_iter:
            optimizer.zero_grad()
            l = loss(net(X), y)
            l.backward()
            optimizer.step()
        train_ls.append(log_mse(net, train_features, train_labels, loss))
        if test_labels is not None:
            test_ls.append(log_mse(net, test_features, test_labels, loss))
    return train_ls, test_ls


def get_k_fold_data(k, i, X, y) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Return the training and validation data for fold i of k.

    Args:
        k: number of folds
        i: index of the fold used for validation
        X: all features
        y: all labels
    """
    assert k > 1
    fold_size = X.shape[0] // k
    X_train, y_train, X_valid, y_valid = None, None, None, None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        X_part, y_part = X[idx, :], y[idx]
        if j == i:
            X_valid, y_valid = X_part, y_part
        elif X_train is None:
            X_train, y_train = X_part, y_part
        else:
            X_train = torch.cat([X_train, X_part], 0)
            y_train = torch.cat([y_train, y_part], 0)
    return X_train, y_train, X_valid, y_valid


def k_fold(net: torch.nn.Module, k: int, X_train: torch.Tensor, y_train: torch.Tensor, num_epochs: int,
           learning_rate: float, weight_decay: float, batch_size: int, loss: nn.Module):
    """K-fold training: k-1 parts for training, 1 part for validation, repeated k times.

    Args:
        net: network model
        k: number of folds
        X_train: training features
        y_train: training labels
        num_epochs: number of epochs
        learning_rate: learning rate
        weight_decay: weight decay
        batch_size: batch size
        loss: loss function
    """
    train_l_sum, valid_l_sum = 0, 0
    for i in range(k):
        # Keep the fold data in its own variable so that the full
        # X_train / y_train are not overwritten between folds
        data = get_k_fold_data(k, i, X_train, y_train)
        # Re-initialize the weights so that each fold starts from scratch
        for layer in net.modules():
            if hasattr(layer, 'reset_parameters'):
                layer.reset_parameters()
        train_ls, valid_ls = train(net, *data, num_epochs, learning_rate, weight_decay,
                                   batch_size, loss)
        train_l_sum += train_ls[-1]
        valid_l_sum += valid_ls[-1]
        if i == 0:
            plot(list(range(1, num_epochs + 1)), [train_ls, valid_ls], x_label='epoch', y_label='r_mse',
                 x_lim=[1, num_epochs],
                 legend=['train', 'valid'], y_scale='log')
        print(f'fold {i + 1}, training log r_mse {float(train_ls[-1]):f}, '
              f'validation log r_mse {float(valid_ls[-1]):f}')
    return train_l_sum / k, valid_l_sum / k


def train_and_predict(net, train_features, test_features, train_labels, test_data,
                      num_epochs, lr, weight_decay, batch_size, loss):
    train_ls, _ = train(net, train_features, train_labels, None, None,
                        num_epochs, lr, weight_decay, batch_size, loss)
    plot(np.arange(1, num_epochs + 1), [train_ls], x_label='epoch',
         y_label='log r_mse', x_lim=[1, num_epochs], y_scale='log')
    print(f'training log r_mse: {float(train_ls[-1]):f}')
    # Apply the network to the test set
    predict = net(test_features).detach().numpy()
    # Reformat the predictions for export to Kaggle
    test_data['SalePrice'] = pd.Series(predict.reshape(1, -1)[0])
    submission = pd.concat([test_data['Id'], test_data['SalePrice']], axis=1)
    submission.to_csv('submission.csv', index=False)


def main():
    DATA_URL = 'http://d2l-data.s3-accelerate.amazonaws.com/'
    DATA_HUB['kaggle_house_train'] = (DATA_URL + 'kaggle_house_pred_train.csv',
                                      '585e9cc93e70b39160e7921475f9bcd7d31219ce')
    DATA_HUB['kaggle_house_test'] = (DATA_URL + 'kaggle_house_pred_test.csv',
                                     'fa19780a7b011d9b009e8bff8e99922a8ee2eb90')
    # [1460, 81]
    train_data = pd.read_csv(download('kaggle_house_train'))
    # [1459, 80]
    test_data = pd.read_csv(download('kaggle_house_test'))
    all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

    # Preprocess the data
    # If the test data is not available, the mean and standard deviation
    # can be computed from the training data alone
    numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
    all_features[numeric_features] = all_features[numeric_features].apply(
        lambda x: (x - x.mean()) / (x.std()))
    # After standardization every feature has zero mean, so missing values can be set to 0
    all_features[numeric_features] = all_features[numeric_features].fillna(0)
    # dummy_na=True treats "na" (missing value) as a valid feature value and creates an indicator for it
    all_features = pd.get_dummies(all_features, dummy_na=True)
    all_features = all_features.astype(float)
    n_train = train_data.shape[0]
    # [1460, 330]
    train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float32)
    # [1459, 330]
    test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float32)
    # [1460, 1]
    train_labels = torch.tensor(train_data.SalePrice.values.reshape(-1, 1), dtype=torch.float32)

    # Train
    loss = nn.MSELoss()
    in_features = train_features.shape[1]
    # [330, 1]
    net = nn.Sequential(nn.Linear(in_features, 1))
    k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
    train_l, valid_l = k_fold(net, k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size, loss)
    print(f'{k}-fold validation: average training log r_mse: {float(train_l):f}, '
          f'average validation log r_mse: {float(valid_l):f}')
    train_and_predict(net, train_features, test_features, train_labels, test_data,
                      num_epochs, lr, weight_decay, batch_size, loss)


if __name__ == '__main__':
    main()
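The `log_mse` helper in the full code measures error on a logarithmic scale, so it reflects relative rather than absolute error. A small worked example in plain Python, with invented prices and a hypothetical helper name `log_rmse`, shows the effect: a 10% overestimate yields the same score for a cheap house and an expensive one.

```python
import math


def log_rmse(predictions, labels):
    """Root mean squared error between log(prediction) and log(label)."""
    # Clamp predictions below 1 to 1 so the logarithm stays well defined
    clipped = [max(p, 1.0) for p in predictions]
    squared = [(math.log(p) - math.log(t)) ** 2 for p, t in zip(clipped, labels)]
    return math.sqrt(sum(squared) / len(squared))


cheap = log_rmse([110_000.0], [100_000.0])          # off by 10,000
expensive = log_rmse([1_100_000.0], [1_000_000.0])  # off by 100,000
print(cheap, expensive)
```

Both values equal log(1.1) up to floating-point rounding, even though the absolute errors differ by a factor of ten.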


6. References

  • Kaggle house price prediction competition
  • PyTorch official documentation