Fixing the 401 Unauthorized (token) error when loading a gated Hugging Face dataset
Loading a Hugging Face dataset raises an error:
import pandas as pd
df = pd.read_json("hf://datasets/udell-lab/NLP4LP/data/test.jsonl", lines=True)
print(df)
PS C:\Users\pengkangzhen\PythonProjects\llm-ecr> & C:/Users/pengkangzhen/.conda/envs/py3.12_ml/python.exe c:/Users/pengkangzhen/PythonProjects/llm-ecr/test.py
Traceback (most recent call last):
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\utils\_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/datasets/udell-lab/NLP4LP/resolve/main/data/test.jsonl
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\pengkangzhen\PythonProjects\llm-ecr\test.py", line 3, in <module>
df = pd.read_json("hf://datasets/udell-lab/NLP4LP/data/test.jsonl", lines=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\pandas\io\json\_json.py", line 791, in read_json
json_reader = JsonReader(
^^^^^^^^^^^
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\pandas\io\json\_json.py", line 905, in __init__
self.data = self._preprocess_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\pandas\io\json\_json.py", line 917, in _preprocess_data
data = data.read()
^^^^^^^^^^^
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\hf_file_system.py", line 1012, in read
return f.read()
^^^^^^^^
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\hf_file_system.py", line 1076, in read
hf_raise_for_status(self.response)
File "C:\Users\pengkangzhen\.conda\envs\py3.12_ml\Lib\site-packages\huggingface_hub\utils\_http.py", line 426, in hf_raise_for_status
raise _format(GatedRepoError, message, response) from e
huggingface_hub.errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-67e277e4-0075ac4f326acd5618277c45;a1701fd4-4a35-4028-a188-c4c78e9b3b7b)
Cannot access gated repo for url https://huggingface.co/datasets/udell-lab/NLP4LP/resolve/main/data/test.jsonl.
Access to dataset udell-lab/NLP4LP is restricted. You must have access to it and be authenticated to access it. Please log in.
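The traceback shows what pandas actually requests. The `hf://datasets/...` path is resolved to a `resolve/main` download URL on the Hub, and the Hub answers 401 because the dataset is gated and the request carries no token. The hypothetical helper below (not part of pandas or huggingface_hub) reproduces that mapping for the plain `hf://datasets/<owner>/<name>/<file>` form. It assumes the default `main` revision and does not handle revision suffixes.

```python
def hf_path_to_url(
    hf_path: str,
    endpoint: str = "https://huggingface.co",
    revision: str = "main",
) -> str:
    """Map an hf://datasets/<owner>/<name>/<file> path to a Hub download URL.

    Hypothetical helper for illustration only: it handles just the plain
    dataset form and assumes the file is fetched from `revision`.
    """
    prefix = "hf://"
    if not hf_path.startswith(prefix):
        raise ValueError(f"not an hf:// path: {hf_path!r}")
    parts = hf_path[len(prefix):].split("/")
    # Expected layout: ["datasets", owner, name, *file_path_segments]
    if len(parts) < 4 or parts[0] != "datasets":
        raise ValueError(f"expected hf://datasets/<owner>/<name>/<file>, got {hf_path!r}")
    repo_id = "/".join(parts[1:3])
    file_path = "/".join(parts[3:])
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}/resolve/{revision}/{file_path}"


if __name__ == "__main__":
    path = "hf://datasets/udell-lab/NLP4LP/data/test.jsonl"
    print(hf_path_to_url(path))
    # The same file requested through the hf-mirror.com endpoint instead
    print(hf_path_to_url(path, endpoint="https://hf-mirror.com"))
```

The first printed URL is exactly the one in the 401 error above.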
You need to log in first:
huggingface-cli login
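`huggingface-cli login` prompts for a token and stores it locally. Where an interactive prompt is not practical, huggingface_hub also picks up a token from the `HF_TOKEN` environment variable, which avoids hard-coding the token in source. The sketch below reads that variable, strips stray whitespace (a common problem with pasted tokens) and sanity-checks the `hf_` prefix that user access tokens carry. The function names and both checks are assumptions of this sketch, not validation done by huggingface_hub.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token from HF_TOKEN, stripped of surrounding whitespace.

    Hypothetical helper: the prefix check is a sanity check assumed for this
    sketch, not something huggingface_hub itself enforces here.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run `huggingface-cli login` or export HF_TOKEN")
    if not token.startswith("hf_"):
        raise ValueError("HF_TOKEN does not start with 'hf_'; check that the whole token was copied")
    return token


def login_from_env() -> None:
    """Log in to the Hub with the token from HF_TOKEN (hypothetical wrapper)."""
    # Imported lazily so get_hf_token() works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=get_hf_token())
```

In PowerShell (the shell used above) the variable can be set with `$env:HF_TOKEN = "hf_..."`. Since huggingface_hub reads `HF_TOKEN` on its own, setting the variable is often enough even without an explicit `login()` call.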
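import os

# Use the mirror hosted in mainland China. HF_ENDPOINT is read when
# huggingface_hub is first imported, so it must be set before the imports below.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import login
import pandas as pd

# Log in with the (trimmed) access token
login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

# Try to read the dataset
try:
    df = pd.read_json("hf://datasets/udell-lab/NLP4LP/data/test.jsonl", lines=True)
    print("Permissions configured correctly, dataset loaded successfully!")
    print(df)
except Exception as e:
    print(f"Permission configuration error, failed to load: {e}")
Output:
Permissions configured correctly, dataset loaded successfully!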
description ... optimus_code
0 Mrs. Watson wants to invest in the real-estate... ... # Code automatically generated from OptiMUS\n\...
1 A breakfast joint makes two different sandwich... ... # Code automatically generated from OptiMUS\n\...
2 A cleaning company located in Edmonton wants t... ... # Code automatically generated from OptiMUS\n\...
3 There is 1000 mg of gold available that is nee... ... # Code automatically generated from OptiMUS\n\...
4 A store employs senior citizens who earn $500 ... ... # Code automatically generated from OptiMUS\n\...
.. ... ... ...
264 Both chemical A and chemical B need to be adde... ... # Code automatically generated from OptiMUS\n\...
265 A senior home has snacks of spinach and soybea... ... # Code automatically generated from OptiMUS\n\...
266 A keyboard manufacturer makes mechanical and s... ... # Code automatically generated from OptiMUS\n\...
267 A tourism company can buy sedans or buses to a... ... # Code automatically generated from OptiMUS\n\...
268 A dessert shop is popular for their only two d... ... # Code automatically generated from OptiMUS\n\...
[269 rows x 5 columns]
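Logging in covers only half of what the error message asks for: the account that owns the token must also have been granted access to the gated dataset on its Hub page. The catch-all `except Exception` above also reports every failure, including network errors, as a permission problem. A slightly more precise variant, sketched below, passes the token per call through pandas' `storage_options` (pandas forwards these to fsspec, which hands `token` to huggingface_hub's `HfFileSystem`) and catches `GatedRepoError`, the exception class seen in the traceback, on its own. The function name is hypothetical.

```python
from typing import Optional


def read_gated_jsonl(hf_path: str, token: Optional[str] = None):
    """Read a JSON Lines file from a possibly gated Hub repo into a DataFrame.

    Hypothetical helper. If token is None, huggingface_hub's own lookup applies
    (HF_TOKEN first, then the token saved by `huggingface-cli login`).
    """
    # Lazy imports keep this module importable without pandas/huggingface_hub.
    import pandas as pd
    from huggingface_hub.errors import GatedRepoError

    # pandas passes storage_options to fsspec, which gives "token" to HfFileSystem.
    storage_options = {"token": token} if token else None
    try:
        return pd.read_json(hf_path, lines=True, storage_options=storage_options)
    except GatedRepoError as err:
        # Gated repo: the request was not authenticated, or this account lacks access.
        raise PermissionError(
            f"No access to {hf_path}: request access on the dataset page and "
            "make sure the token belongs to the account that was granted access"
        ) from err
```

Usage: `df = read_gated_jsonl("hf://datasets/udell-lab/NLP4LP/data/test.jsonl")`. Other failures, such as connection errors, propagate unchanged instead of being reported as permission problems.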