Fixing the 403: Forbidden error raised by sklearn's fetch_california_housing (California housing dataset)

Problem

Loading the California housing data fails with a 403 error: HTTP Error 403: Forbidden

from sklearn.datasets import fetch_california_housing

california = fetch_california_housing()
print(california.target.shape)
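The message "HTTP Error 403: Forbidden" is the string form of `urllib.error.HTTPError`. The sketch below shows one way a script could detect exactly this failure and switch to a replacement loader. It assumes the exception reaches the caller as an `HTTPError`; `load_or_fallback`, `primary` and `fallback` are hypothetical names used only for illustration.

```python
from urllib.error import HTTPError


def load_or_fallback(primary, fallback):
    """Call primary(); if it fails with HTTP 403, call fallback() instead.

    Both arguments are zero-argument callables returning a dataset object.
    Any other error (including other HTTP status codes) is re-raised.
    """
    try:
        return primary()
    except HTTPError as err:
        if err.code != 403:
            raise
        print(f"Primary loader failed ({err}); using the fallback loader")
        return fallback()
```

It would be called as `load_or_fallback(fetch_california_housing, some_replacement_loader)`, where the second argument is any function that builds the dataset from another source.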

Solution

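Run the code below, which downloads the archive from an alternative mirror and builds the dataset by hand. The author reports that the fetch_california_housing() call above then runs successfully.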

import requests
import os
import tarfile
import numpy as np
from types import SimpleNamespace
from sklearn import datasets

# References:
# https://blog.csdn.net/getalong/article/details/141201658
# https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html

fetch_california_housing_manual_desc = '''
.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bureau publishes sample data (a block group typically has a population
of 600 to 3,000 people).

A household is a group of people residing within a home. Since the average
number of rooms and bedrooms in this dataset are provided per household, these
columns may take surprisingly large values for block groups with few households
and many empty houses, such as vacation resorts.

It can be downloaded/loaded using the
:func:`sklearn.datasets.fetch_california_housing` function.

.. rubric:: References

- Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
  Statistics and Probability Letters, 33 (1997) 291-297
'''


def download_file(url, directory, filename):
    # Make sure the target directory exists
    os.makedirs(directory, exist_ok=True)
    # Full path of the file to write
    filepath = os.path.join(directory, filename)
    # Download the file
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Raise if the request did not succeed
    # Write the content to disk in chunks
    with open(filepath, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)
    print(f"File downloaded to: {filepath}")


def fetch_california_housing_manual():
    data_home = datasets.get_data_home()
    archive_path = os.path.join(data_home, 'cal_housing.tgz')
    # Download the archive only if it is not already in the data directory
    if not os.path.exists(archive_path):
        download_file(
            "https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz",
            data_home,
            'cal_housing.tgz',
        )

    with tarfile.open(mode="r:gz", name=archive_path) as f:
        cal_housing = np.loadtxt(
            f.extractfile("CaliforniaHousing/cal_housing.data"), delimiter=","
        )
        # Columns are not in the same order compared to the previous
        # URL resource on lib.stat.cmu.edu
        columns_index = [8, 7, 2, 3, 4, 5, 6, 1, 0]
        cal_housing = cal_housing[:, columns_index]

    feature_names = [
        "MedInc",
        "HouseAge",
        "AveRooms",
        "AveBedrms",
        "Population",
        "AveOccup",
        "Latitude",
        "Longitude",
    ]
    target_names = ['MedHouseVal']

    target, data = cal_housing[:, 0], cal_housing[:, 1:]

    # avg rooms = total rooms / households
    data[:, 2] /= data[:, 5]
    # avg bed rooms = total bed rooms / households
    data[:, 3] /= data[:, 5]
    # avg occupancy = population / households
    data[:, 5] = data[:, 4] / data[:, 5]
    # target in units of 100,000
    target = target / 100000.0

    result = {
        'data': data,
        'target': target,
        'feature_names': feature_names,
        'target_names': target_names,
        'DESCR': fetch_california_housing_manual_desc,
    }
    obj = SimpleNamespace(**result)
    return obj


california = fetch_california_housing_manual()
print(california.data)
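The column reordering and the per-household divisions in the loader are easier to follow on a single row. The sketch below applies the same steps to one illustrative row, with no download involved. The raw column order in the comment (longitude, latitude, housingMedianAge, totalRooms, totalBedrooms, population, households, medianIncome, medianHouseValue) is an assumption inferred from `columns_index`, and the values are for illustration only.

```python
import numpy as np

# One illustrative row in the assumed raw column order:
# longitude, latitude, housingMedianAge, totalRooms, totalBedrooms,
# population, households, medianIncome, medianHouseValue
raw = np.array(
    [[-122.23, 37.88, 41.0, 880.0, 129.0, 322.0, 126.0, 8.3252, 452600.0]]
)

# Same reordering as the loader: target first, then the eight features
columns_index = [8, 7, 2, 3, 4, 5, 6, 1, 0]
reordered = raw[:, columns_index]
target, data = reordered[:, 0], reordered[:, 1:]

# At this point data[:, 5] still holds the household count
data[:, 2] /= data[:, 5]              # AveRooms   = total rooms / households
data[:, 3] /= data[:, 5]              # AveBedrms  = total bedrooms / households
data[:, 5] = data[:, 4] / data[:, 5]  # AveOccup   = population / households
target = target / 100000.0            # MedHouseVal in units of $100,000

feature_names = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]
for name, value in zip(feature_names, data[0]):
    print(f"{name:>11}: {value:.4f}")
print(f"MedHouseVal: {target[0]:.4f}")
```

The order of the three divisions matters: the household count in column 5 has to be used for AveRooms and AveBedrms before it is overwritten by AveOccup.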

