
KTransformers Installation Notes (including installing via Docker)

    • 0. Prerequisites
    • 1. Install dependencies
    • 2. Create and activate a virtual environment
    • 3. Install PyTorch 2.6
    • 4. Download the project source code
      • For two CPUs and 1 TB of RAM
    • 5. Download model weights and config files
    • 6. Install KTransformers with Docker

0. Prerequisites

CUDA 12.4
A reasonably recent CMake is recommended; also install git, g++, and gcc.

Add the following lines to ~/.bashrc, pointing CUDA_HOME at your CUDA install directory (e.g. /usr/local/cuda-12.4), then reload the shell configuration:

export CUDA_HOME=/usr/local/cuda-12.4
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
source ~/.bashrc

Check that it worked:

which nvcc

If the output looks like the following, the configuration succeeded:

/usr/local/cuda-12.4/bin/nvcc

Create a local container:

docker run -it -d --gpus all --privileged  -p 8083:8080  -p 8084:8084   -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/   -v /home/data_a/gqr/:/home/data_a  afa4f07f5e5e /bin/bash

The image already has CUDA, cuDNN, and conda installed.

1. Install dependencies

sudo apt-get update 
sudo apt-get install build-essential cmake ninja-build patchelf

2. Create and activate a virtual environment

conda create --name ktransformers python=3.11
conda activate ktransformers   # activate the environment
conda install -c conda-forge libstdcxx-ng # Anaconda provides a package called `libstdcxx-ng` that includes a newer version of `libstdc++`, which can be installed via `conda-forge`.

# inspect which GLIBCXX versions the conda libstdc++ provides
strings ~/anaconda3/envs/ktransformers/lib/libstdc++.so.6 | grep GLIBCXX
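In particular, the listing should include GLIBCXX_3.4.30, which the compiled cpuinfer extension needs later (see Error 2 below). To check for that symbol directly:

strings ~/anaconda3/envs/ktransformers/lib/libstdc++.so.6 | grep GLIBCXX_3.4.30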

3. Install PyTorch 2.6


pip install torch torchvision torchaudio

From inside mainland China, use a mirror:

pip3 install torch torchvision torchaudio --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install packaging ninja cpufeature numpy  --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple
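A quick sanity check that PyTorch installed correctly and can see the GPU (a generic check, nothing KTransformers-specific):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"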

Install flash-attention:

pip install flash_attn

Prefer this installation route: if the pip build succeeds, it confirms that nvcc is working correctly, which gives more confidence for the remaining build steps.

Alternatively, download a .whl matching your CUDA and torch versions (the official docs say this works; I have not tried it myself):

https://github.com/Dao-AILab/flash-attention/releases

Or:

pip install flash-attn --find-links https://github.com/Dao-AILab/flash-attention/releases

To verify that flash-attn installed correctly, run:

import flash_attn
print(flash_attn.__version__)
# flash_attn sometimes installs successfully even though its CUDA extensions failed to build; test the core CUDA extensions too:
from flash_attn.layers.rotary import RotaryEmbedding
from flash_attn.bert_padding import pad_input, unpad_input

If these run without errors, the installation succeeded.

apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libgflags-dev zlib1g-dev libfmt-dev patchelf
apt install libnuma-dev
pip install packaging ninja cpufeature numpy openai

4. Download the project source code

git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive

Build commands for different hardware setups:

For two CPUs and 1 TB of RAM:

# For multi-concurrency with two CPUs and 1 TB RAM:
apt install libnuma-dev
export USE_NUMA=1
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
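Incidentally, you can confirm the machine really exposes two NUMA nodes before enabling USE_NUMA (a generic check; numactl comes from the numactl package):

apt install numactl
numactl --hardware   # look for "available: 2 nodes" in the output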

After a long wait, the build completes successfully.

5. Download model weights and config files


Download the DeepSeek-R1 Q4 model weights:

# install the ModelScope client
pip install modelscope
# download the model weight files
modelscope download --model lmstudio-community/DeepSeek-R1-GGUF  --include DeepSeek-R1-Q4_K_M*  --local_dir ./dir


Download the DeepSeek-R1 config files (everything except the weights):

modelscope download --model deepseek-ai/DeepSeek-R1 --exclude *.safetensors  --local_dir ./config
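A quick check that the exclusion worked: the config directory should contain config.json and the tokenizer files, but no *.safetensors weights:

ls ./config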


Errors encountered:

Error 1:
The build stage fails with:

ERROR: Failed to build installable wheels for some pyproject.toml based projects (ktransformers)

The official solution is to install the missing dev packages:

sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
Additionally, add the line set(CMAKE_CUDA_STANDARD 17) to ktransformers/csrc/balance_serve/CMakeLists.txt; that should let the build get through.

After rebuilding, the compilation succeeds.

Error 2:
Running the following command:

python ktransformers/server/main.py \
  --port 10002 \
  --model_path /home/data_c/KT_data/config \
  --gguf_path /home/data_c/KT_data/dir \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve \
  --force_think # useful for R1

fails with the following error:

File "/kiwi/helly/ktransformers/ktransformers/ktransformers/local_chat.py", line 110, in local_chat
optimize_and_load_gguf(model, optimize_rule_path, gguf_path, config)
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
inject(module, optimize_config, model_config, gguf_loader)
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
module_cls=getattr(import(import_module_name, fromlist=[""]), import_class_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/models.py", line 22, in
from ktransformers.operators.dynamic_attention import DynamicScaledDotProductAttention
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/dynamic_attention.py", line 19, in
from ktransformers.operators.cpuinfer import CPUInfer, CPUInferKVCache
File "/kiwi/helly/ktransformers/ktransformers/ktransformers/operators/cpuinfer.py", line 25, in
import cpuinfer_ext
ImportError: /root/miniconda3/envs/ktransformers/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /kiwi/helly/ktransformers/ktransformers/cpuinfer_ext.cpython-311-x86_64-linux-gnu.so)

Solution:
Run the following first, then start the service:

# run this first
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6


# then start the service
python ktransformers/server/main.py \
  --port 10002 \
  --model_path /home/data_c/KT_data/config \
  --gguf_path /home/data_c/KT_data/dir \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --backend_type balance_serve \
  --force_think # useful for R1
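Why this works: the compiled cpuinfer_ext extension requires GLIBCXX_3.4.30, which the system libstdc++ on Ubuntu 22.04 provides but the conda environment's copy may not. You can confirm before preloading (the path below is the Ubuntu default):

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4.30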

Other installation commands:

conda create --name ktransformers python=3.11
conda activate ktransformers
conda install -c conda-forge libstdcxx-ng
sudo apt install libtbb-dev libssl-dev libcurl4-openssl-dev libaio1 libaio-dev libfmt-dev libgflags-dev zlib1g-dev patchelf
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
sudo env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh

Error 3:
After the service starts, the following errors appear while loading the model weights (quoted from a GitHub issue report):

Use docker and upgrade to 0.2.3 version
ktransformers 0.2.3.post1+cu121torch23fancy

ktransformers --gguf_path /models/GGUF --model_path /models --cpu_infer 190 --optimize_config_path /models/DeepSeek-V3-Chat-multi-gpu.yaml --port 10002 --max_new_tokens 8192 --host 0.0.0.0 --cache_lens 32768 --total_context 32768 --force_think --no-use_cuda_graph --model_name DeepSeek-R1

Get warnings below but still can work.
No idea what's wrong.

loading blk.3.attn_q_a_norm.weight to cuda:6
loading blk.3.attn_kv_a_norm.weight to cuda:6
loading blk.3.attn_kv_b.weight to cuda:6
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
set_mempolicy: Operation not permitted
set_mempolicy: Operation not permitted

Cause: when deploying with Docker, the container lacks the required permissions. Add the following flag when creating the container:

--privileged

That is:

docker run -it -d --gpus all --privileged  -p 8083:8080  -p 8084:8084   -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/   -v /home/data_a/gqr/:/home/data_a  afa4f07f5e5e /bin/bash
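If --privileged is broader than you want, a narrower grant may be enough: mbind and set_mempolicy are NUMA memory-policy syscalls that Docker's default security profile blocks, and adding CAP_SYS_NICE is a commonly suggested fix (an assumption; I have only verified --privileged here):

docker run -it -d --gpus all --cap-add=SYS_NICE -p 8083:8080 -p 8084:8084 -p 8055:22 --name KT_ubuntu2204_cuda124_cudnn8700 -v /home/data_c/KT_data/:/home/data_c/KT_data/ -v /home/data_a/gqr/:/home/data_a afa4f07f5e5e /bin/bash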


An alternative rebuild with NUMA enabled:
apt install libnuma-dev
export USE_NUMA=1
env USE_BALANCE_SERVE=1 PYTHONPATH="$(which python)" PATH="$(dirname $(which python)):$PATH" bash ./install.sh

6. Install KTransformers with Docker

This section installs version 0.2.1.

Pull the official image:

docker pull approachingai/ktransformers:0.2.1

Create the container:

docker run  --gpus all -p 8077:8082  --privileged    -v /home/data_c/KT_data/:/models --name ktransformers -itd approachingai/ktransformers:0.2.1

Enter the container:

docker exec -it ktransformers /bin/bash

Start a chat with the model:

python -m ktransformers.local_chat  --gguf_path /models/DeepSeek-V3 --model_path /models/deepseek-ai/DeepSeek-V3 --cpu_infer 33

If the command above fails with:

Illegal instruction (core dumped)

The fix is to rebuild KTransformers inside the container:

bash install.sh
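"Illegal instruction" typically means the prebuilt binaries use CPU instructions (e.g. AVX-512) that the host CPU lacks; rebuilding compiles for the local CPU. To see which AVX extensions your CPU reports (a generic check, not KTransformers-specific):

lscpu | tr ' ' '\n' | grep -i avx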

Once it starts, the chat prompt appears.

The project also provides an OpenAI-compatible API.

Run the following command:

python ktransformers/server/main.py \
  --port 8082 \
  --model_path /models/deepseek-ai/DeepSeek-V3 \
  --gguf_path /models/DeepSeek-V3 \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --max_batch_size 4 \
  --force_think True \
  --model_name DeepSeek-R1


Query it with:

curl -X POST http://localhost:8082/v1/chat/completions \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "hello"}
    ],
    "model": "DeepSeek-R1",
    "temperature": 0.3,
    "top_p": 1.0,
    "stream": true
  }'
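Since the openai package was installed earlier, the same endpoint can also be driven from Python. A minimal sketch; the api_key value is a placeholder, on the assumption that the local server does not validate it:

from openai import OpenAI

# point the client at the local KTransformers server
client = OpenAI(base_url="http://localhost:8082/v1", api_key="not-needed")  # key assumed unused

stream = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.3,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)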

Server parameters:

usage: kvcache.ai [-h] [--host HOST] [--port PORT] [--ssl_keyfile SSL_KEYFILE] [--ssl_certfile SSL_CERTFILE] [--web WEB] [--model_name MODEL_NAME] [--model_dir MODEL_DIR] [--model_path MODEL_PATH] [--device DEVICE]
                  [--gguf_path GGUF_PATH] [--optimize_config_path OPTIMIZE_CONFIG_PATH] [--cpu_infer CPU_INFER] [--type TYPE] [--paged PAGED] [--total_context TOTAL_CONTEXT] [--max_batch_size MAX_BATCH_SIZE]
                  [--max_chunk_size MAX_CHUNK_SIZE] [--max_new_tokens MAX_NEW_TOKENS] [--json_mode JSON_MODE] [--healing HEALING] [--ban_strings BAN_STRINGS] [--gpu_split GPU_SPLIT] [--length LENGTH] [--rope_scale ROPE_SCALE]
                  [--rope_alpha ROPE_ALPHA] [--no_flash_attn NO_FLASH_ATTN] [--low_mem LOW_MEM] [--experts_per_token EXPERTS_PER_TOKEN] [--load_q4 LOAD_Q4] [--fast_safetensors FAST_SAFETENSORS] [--draft_model_dir DRAFT_MODEL_DIR]
                  [--no_draft_scale NO_DRAFT_SCALE] [--modes MODES] [--mode MODE] [--username USERNAME] [--botname BOTNAME] [--system_prompt SYSTEM_PROMPT] [--temperature TEMPERATURE] [--smoothing_factor SMOOTHING_FACTOR]
                  [--dynamic_temperature DYNAMIC_TEMPERATURE] [--top_k TOP_K] [--top_p TOP_P] [--top_a TOP_A] [--skew SKEW] [--typical TYPICAL] [--repetition_penalty REPETITION_PENALTY] [--frequency_penalty FREQUENCY_PENALTY]
                  [--presence_penalty PRESENCE_PENALTY] [--max_response_tokens MAX_RESPONSE_TOKENS] [--response_chunk RESPONSE_CHUNK] [--no_code_formatting NO_CODE_FORMATTING] [--cache_8bit CACHE_8BIT] [--cache_q4 CACHE_Q4]
                  [--ngram_decoding NGRAM_DECODING] [--print_timings PRINT_TIMINGS] [--amnesia AMNESIA] [--batch_size BATCH_SIZE] [--cache_lens CACHE_LENS] [--log_dir LOG_DIR] [--log_file LOG_FILE] [--log_level LOG_LEVEL]
                  [--backup_count BACKUP_COUNT] [--db_type DB_TYPE] [--db_host DB_HOST] [--db_port DB_PORT] [--db_name DB_NAME] [--db_pool_size DB_POOL_SIZE] [--db_database DB_DATABASE] [--user_secret_key USER_SECRET_KEY]
                  [--user_algorithm USER_ALGORITHM] [--force_think FORCE_THINK] [--web_cross_domain WEB_CROSS_DOMAIN] [--file_upload_dir FILE_UPLOAD_DIR] [--assistant_store_dir ASSISTANT_STORE_DIR] [--prompt_file PROMPT_FILE]


Error:

When running the 0.2.1 API inside the Docker container, the following error occurs; one workaround is to pull the 0.2.0 image and build the project there.

During handling of the above exception, another exception occurred:

Exception Group Traceback (most recent call last):
| File "/usr/python310/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/usr/python310/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
| return await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/applications.py", line 112, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
| await self.app(scope, receive, _send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
| await route.handle(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
| await self.app(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| raise exc
| File "/usr/python310/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
| await app(scope, receive, sender)
| File "/usr/python310/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
| await response(scope, receive, send)
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 261, in __call__
| async with anyio.create_task_group() as task_group:
| File "/usr/python310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 264, in wrap
| await func()
| File "/usr/python310/lib/python3.10/site-packages/starlette/responses.py", line 245, in stream_response
| async for chunk in self.body_iterator:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 80, in check_client_link
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 93, in to_stream_reply
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 87, in add_done
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/schemas/assistants/streaming.py", line 107, in filter_chat_chunk
| async for event in async_events:
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/api/openai/endpoints/chat.py", line 34, in inner
| async for token in interface.inference(input_message,id):
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/ktransformers.py", line 181, in inference
| async for v in super().inference(local_messages, thread_id):
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/transformers.py", line 340, in inference
| for t in self.prefill(input_ids, self.check_is_new(thread_id)):
| File "/usr/python310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
| response = gen.send(None)
| File "/mnt/data/ktransformers-0.2.1/ktransformers/server/backend/interfaces/ktransformers.py", line 130, in prefill
| self.cache.reset()
| File "/mnt/data/ktransformers-0.2.1/ktransformers/models/custom_cache.py", line 175, in reset
| self.value_cache[layer_idx].zero_()
| AttributeError: 'NoneType' object has no attribute 'zero_'


After the fix (in reset() in ktransformers/models/custom_cache.py):

    def reset(self):
        """Resets the cache values while preserving the objects"""
        for layer_idx in range(len(self.key_cache)):
            # In-place ops prevent breaking the static address
            if self.key_cache[layer_idx] is not None:
                self.key_cache[layer_idx].zero_()
            if self.value_cache[layer_idx] is not None:
                self.value_cache[layer_idx].zero_()

Alternatively, this should be fixed by the latest release, 0.2.1.post1.
