
[bug] LLM fine-tuning bug: OSError: Failed to load tokenizer. | LoRA

news 2025/10/17 6:34:07

Contents

- Error output
- Bug analysis
- Solution

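LLaMA-Factory downloads models from the Hugging Face Hub (huggingface.co) by default. If you are in mainland China, or you use the ModelScope community (魔搭社区) as your model hub, you have to change the configuration or set environment variables manually so that models are pulled from modelscope.cn instead.

Error output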

root@dsw-1400022-546bdb8d54-27nsv:/mnt/workspace/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
2025-10-16 13:00:45.897740: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-10-16 13:00:45.937885: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-10-16 13:00:46.853637: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2025-10-16 13:00:49,181] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
[2025-10-16 13:00:51,610] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
[INFO|2025-10-16 13:00:52] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 1, device: cuda:0, distributed training: False, compute dtype: torch.bfloat16
[INFO|tokenization_auto.py:898] 2025-10-16 13:01:02,405 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 704, in connect
    self.sock = sock = self._new_conn()
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f63fc1bfcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /LLM-Research/Meta-Llama-3-8B-Instruct/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f63fc1bfcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1546, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1463, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 286, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 309, in _request_wrapper
    response = http_backoff(method=method, url=url, **params, retry_on_exceptions=(), retry_on_status_codes=(429,))
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 310, in http_backoff
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 96, in send
    return super().send(request, *args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /LLM-Research/Meta-Llama-3-8B-Instruct/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f63fc1bfcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 173af757-1a47-42b9-9662-480a9cec02c5)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/transformers/utils/hub.py", line 479, in cached_files
    hf_hub_download(
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1010, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1117, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1661, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/model/loader.py", line 78, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 1069, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1250, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/transformers/configuration_utils.py", line 649, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/transformers/configuration_utils.py", line 708, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.11/site-packages/transformers/utils/hub.py", line 321, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "/usr/local/lib/python3.11/site-packages/transformers/utils/hub.py", line 553, in cached_files
    raise OSError(
OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Check your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/cli.py", line 24, in main
    launcher.launch()
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/launcher.py", line 152, in launch
    run_exp()
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/train/tuner.py", line 72, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 48, in run_sft
    tokenizer_module = load_tokenizer(model_args)
  File "/mnt/workspace/LLaMA-Factory/src/llamafactory/model/loader.py", line 93, in load_tokenizer
    raise OSError("Failed to load tokenizer.") from e
OSError: Failed to load tokenizer.

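Bug analysis

By default, my setup downloads models from the Hugging Face Hub. I am in mainland China, though, and the machine cannot reach huggingface.co: the request for config.json fails with [Errno 101] Network is unreachable, the file is not in the local cache either, and LLaMA-Factory finally reports the failure as OSError: Failed to load tokenizer. The model has to come from the ModelScope community (魔搭社区) instead.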
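The final message, Failed to load tokenizer, hides the real problem, which sits at the bottom of a chain of "direct cause" exceptions. The following is a minimal sketch of how such a chain can be walked to its root in Python. The function `load_tokenizer_stub` is a hypothetical stand-in that only imitates the wrapping seen in the log; it is not LLaMA-Factory code.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ / __context__ links down to the innermost exception."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


def load_tokenizer_stub():
    """Hypothetical stand-in: re-raises a network error the way the log shows."""
    try:
        try:
            # Innermost error, as in the first traceback of the log.
            raise OSError(101, "Network is unreachable")
        except OSError as e:
            raise ConnectionError("Failed to establish a new connection") from e
    except Exception as e:
        raise OSError("Failed to load tokenizer.") from e


try:
    load_tokenizer_stub()
except OSError as err:
    cause = root_cause(err)
    print(f"reported:   {type(err).__name__}: {err}")
    print(f"root cause: {type(cause).__name__}: {cause}")
```

Reading a long traceback from the first exception downwards gives the same answer: the network error comes first, and everything after it is a consequence.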

Solution

Switch the download source to the ModelScope community, which is reachable from mainland China.
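Before switching, it can help to confirm which hub the machine can actually reach. The sketch below uses only the Python standard library and opens a TCP connection to port 443 of each host. The helper names `can_reach` and `reachable_hubs` are made up for this example, and a successful TCP connection is only a rough signal; it does not prove that a full download will work.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Network is unreachable", refused connections,
        # DNS failures and timeouts (all are OSError subclasses).
        return False


def reachable_hubs(hosts=("huggingface.co", "modelscope.cn")) -> dict:
    """Map each hub hostname to whether it is currently reachable."""
    return {host: can_reach(host) for host in hosts}
```

Calling `print(reachable_hubs())` on the training machine shows at a glance whether huggingface.co is blocked while modelscope.cn is available.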

In llama3_lora_sft.yaml, specify the ModelScope model path directly:

model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct

Set the environment variables:

export USE_MODELSCOPE_HUB=1
export MODELSCOPE_CACHE=/mnt/workspace/.cache/modelscope
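The traceback shows the ID LLM-Research/Meta-Llama-3-8B-Instruct being requested from huggingface.co, which suggests that the model ID alone does not select the hub; the environment variable does. The sketch below is a simplified model of that decision, not LLaMA-Factory's actual implementation. The function names and the set of accepted truthy values are assumptions made for illustration.

```python
import os

# Assumed set of values treated as "enabled"; the real tool may accept others.
TRUTHY = {"1", "true", "y"}


def env_flag(name: str, env=None) -> bool:
    """Return True if the variable is set to one of the assumed truthy values."""
    env = os.environ if env is None else env
    return env.get(name, "0").strip().lower() in TRUTHY


def model_source(model_name_or_path: str, env=None) -> str:
    """Guess where a model would be loaded from under this simplified model."""
    if os.path.isdir(model_name_or_path):
        return "local"          # an existing directory needs no download
    if env_flag("USE_MODELSCOPE_HUB", env):
        return "modelscope"     # hub switched by the environment variable
    return "huggingface"        # default hub


model_id = "LLM-Research/Meta-Llama-3-8B-Instruct"
print(model_source(model_id, {}))
print(model_source(model_id, {"USE_MODELSCOPE_HUB": "1"}))
```

Under this model, forgetting the export in a new shell session silently falls back to the default hub, which matches the error shown earlier.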

Start training:

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
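If the exports are easy to forget, the same launch can be wrapped in a small script that sets the two variables for the child process only. This is a sketch under the assumption that `llamafactory-cli` is on the PATH; the helper names `train_command`, `modelscope_env` and `launch` are hypothetical.

```python
import os
import subprocess


def train_command(config_path: str) -> list:
    """Build the same command line that is typed by hand above."""
    return ["llamafactory-cli", "train", config_path]


def modelscope_env(cache_dir: str, base_env=None) -> dict:
    """Copy the environment and add the two ModelScope variables."""
    env = dict(os.environ if base_env is None else base_env)
    env["USE_MODELSCOPE_HUB"] = "1"
    env["MODELSCOPE_CACHE"] = cache_dir
    return env


def launch(config_path: str, cache_dir: str) -> int:
    """Run training with the variables set for the child process only."""
    result = subprocess.run(
        train_command(config_path),
        env=modelscope_env(cache_dir),
        check=False,
    )
    return result.returncode
```

A call such as `launch("examples/train_lora/llama3_lora_sft.yaml", "/mnt/workspace/.cache/modelscope")` then corresponds to the export lines plus the command above, without changing the parent shell's environment.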

Training now starts successfully.
[Figure: screenshot of the successful training run]

