
Fixing the invalid-token (401) error when loading a Hugging Face dataset

Loading a Hugging Face dataset with the following script fails:

import pandas as pd

df = pd.read_json("hf://datasets/udell-lab/NLP4LP/data/test.jsonl", lines=True)
print(df)

PS C:\Users\pengkangzhen\PythonProjects\llm-ecr> & C:/Users/pengkangzhen/.conda/envs/py3.12_ml/python.exe c:/Users/pengkangzhen/PythonProjects/llm-ecr/test.py
Traceback (most recent call last):
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\utils\_http.py", line 409, in hf_raise_for_status
    response.raise_for_status()
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/datasets/udell-lab/NLP4LP/resolve/main/data/test.jsonl

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\pengkangzhen\PythonProjects\llm-ecr\test.py", line 3, in <module>
    df = pd.read_json("hf://datasets/udell-lab/NLP4LP/data/test.jsonl", lines=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\pandas\io\json\_json.py", line 791, in read_json
    json_reader = JsonReader(
                  ^^^^^^^^^^^
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\pandas\io\json\_json.py", line 905, in __init__
    self.data = self._preprocess_data(data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\pandas\io\json\_json.py", line 917, in _preprocess_data
    data = data.read()
           ^^^^^^^^^^^
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\hf_file_system.py", line 1012, in read
    return f.read()
           ^^^^^^^^
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\hf_file_system.py", line 1076, in read
    hf_raise_for_status(self.response)
  File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\utils\_http.py", line 426, in hf_raise_for_status
    raise _format(GatedRepoError, message, response) from e
huggingface_hub.errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-67e277e4-0075ac4f326acd5618277c45;a1701fd4-4a35-4028-a188-c4c78e9b3b7b)

Cannot access gated repo for url https://huggingface.co/datasets/udell-lab/NLP4LP/resolve/main/data/test.jsonl.
Access to dataset udell-lab/NLP4LP is restricted. You must have access to it and be authenticated to access it. Please log in.

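As the message says, the dataset is gated: your account must have been granted access to it, and the request must be authenticated. Run the login flow first: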

huggingface-cli login
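After logging in, it can help to confirm that a token is actually stored and accepted before retrying the read. The following is a minimal sketch, not part of the original post: `login_status` is a hypothetical helper that takes the two lookups as arguments so it can be tried offline with stand-ins.

```python
def login_status(get_token, whoami):
    """Summarize whether a usable Hugging Face token is available.

    get_token: callable returning the stored token, or None if there is none
    whoami:    callable accepting token=... and returning a dict with a "name" key
    (Both parameters are illustrative; they mirror huggingface_hub's functions.)
    """
    token = get_token()
    if not token:
        return "not logged in: no token found"
    try:
        user = whoami(token=token)
    except Exception as exc:  # e.g. an HTTP 401 for an invalid or revoked token
        return f"token rejected: {exc}"
    return f"logged in as {user['name']}"


# Offline demonstration with stand-in callables (no network access needed).
print(login_status(lambda: None, lambda token: {"name": "unused"}))
# prints: not logged in: no token found
```

For a real check, pass the library's own functions: `from huggingface_hub import get_token, whoami` and then `print(login_status(get_token, whoami))`. Note that a valid login only proves authentication; a gated dataset still needs access granted to that account.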


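The login can also be done inside the script by passing the access token to login():

import os

# Use a mirror inside mainland China. HF_ENDPOINT must be set before
# huggingface_hub is imported, because the library reads it at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import pandas as pd
from huggingface_hub import login

# Log in with your access token (redacted here)
login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

# Try to read the dataset
try:
    df = pd.read_json("hf://datasets/udell-lab/NLP4LP/data/test.jsonl", lines=True)
    print("Permissions are configured correctly; dataset loaded successfully!")
    print(df)
except Exception as e:
    print(f"Permission configuration error; read failed: {e}")

Output:

Permissions are configured correctly; dataset loaded successfully!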
                                           description  ...                                       optimus_code
0    Mrs. Watson wants to invest in the real-estate...  ...  # Code automatically generated from OptiMUS\n\...
1    A breakfast joint makes two different sandwich...  ...  # Code automatically generated from OptiMUS\n\...
2    A cleaning company located in Edmonton wants t...  ...  # Code automatically generated from OptiMUS\n\...
3    There is 1000 mg of gold available that is nee...  ...  # Code automatically generated from OptiMUS\n\...
4    A store employs senior citizens who earn $500 ...  ...  # Code automatically generated from OptiMUS\n\...
..                                                 ...  ...                                                ...
264  Both chemical A and chemical B need to be adde...  ...  # Code automatically generated from OptiMUS\n\...
265  A senior home has snacks of spinach and soybea...  ...  # Code automatically generated from OptiMUS\n\...
266  A keyboard manufacturer makes mechanical and s...  ...  # Code automatically generated from OptiMUS\n\...
267  A tourism company can buy sedans or buses to a...  ...  # Code automatically generated from OptiMUS\n\...
268  A dessert shop is popular for their only two d...  ...  # Code automatically generated from OptiMUS\n\...

[269 rows x 5 columns]
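Hardcoding the token in the script makes it easy to leak or to paste with stray whitespace, which is one way to end up with an "invalid token" error. A minimal sketch of reading it from an environment variable instead is shown here; `read_token` is a hypothetical helper, and the check for the `hf_` prefix is an assumption based on the token format shown above.

```python
import os


def read_token(env=None, var="HF_TOKEN"):
    """Return the access token stored in an environment variable, cleaned up.

    env: mapping to read from (defaults to os.environ); var: variable name.
    Raises ValueError if the variable is missing or does not look like a token.
    """
    env = os.environ if env is None else env
    # Remove surrounding whitespace and any quotes copied along with the token.
    token = env.get(var, "").strip().strip("\"'")
    if not token:
        raise ValueError(f"{var} is not set")
    if not token.startswith("hf_"):  # assumption: user tokens start with "hf_"
        raise ValueError(f"{var} does not look like a Hugging Face token")
    return token


# Demonstration with an explicit mapping instead of the real environment.
print(read_token({"HF_TOKEN": ' "hf_example" \n'}))
# prints: hf_example
```

In the script above, `login(token=read_token())` could then replace the literal token. `HF_TOKEN` is also the variable that huggingface_hub reads on its own, so once it is set correctly an explicit `login` call may not be needed at all.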
