
Running llama2-7b or DeepSeek-r1:1.5b on a CPU (with AVX support) using PaddleNLP (about 80% complete)

Original: 🚣‍♂️ Running the llama2-7b model on a CPU (with AVX support) using PaddleNLP 🚣 (PaddleNLP documentation)

Running the llama2-7b model on a CPU (with AVX support) using PaddleNLP 🚣

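PaddleNLP has been deeply adapted and optimized for the llama family of models on CPUs that support AVX instructions. This document describes the workflow for high-performance inference of llama-family models with PaddleNLP on such CPUs.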

Hardware check:

Chip type                          GCC version   cmake version
Intel(R) Xeon(R) Platinum 8463B    9.4.0         >=3.18

Note: to check whether your machine supports AVX instructions, run the following command and see whether it prints anything:

lscpu | grep -o -P '(?<!\w)(avx\w*)'

# Expected output:
avx
avx2
**avx512f**
avx512dq
avx512ifma
avx512cd
**avx512bw**
avx512vl
avx_vnni
**avx512_bf16**
avx512vbmi
avx512_vbmi2
avx512_vnni
avx512_bitalg
avx512_vpopcntdq
**avx512_fp16**
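
The same check can be done from Python. The sketch below is only an illustration: it assumes a Linux system where the CPU flags are listed in /proc/cpuinfo, and the helper names `avx_flags` and `read_cpu_text` are made up for this example. It reuses the regular expression from the grep command above.

```python
import re

# Same pattern as the lscpu/grep command: "avx" plus any suffix,
# not preceded by another word character.
AVX_PATTERN = re.compile(r"(?<!\w)(avx\w*)")


def avx_flags(cpu_text):
    """Return the sorted, de-duplicated AVX feature flags found in cpu_text."""
    return sorted(set(AVX_PATTERN.findall(cpu_text)))


def read_cpu_text(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; return '' if the file is unavailable.

    Assumption: Linux, where /proc/cpuinfo lists the CPU flags.
    """
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    flags = avx_flags(read_cpu_text())
    print("\n".join(flags) if flags else "no AVX flags found")
```

An empty result corresponds to the grep command printing nothing, which means the CPU does not report AVX support.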

Environment setup:

1 Install numactl

apt-get update
apt-get install numactl

2 Install paddle

2.1 Build from source:
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle && mkdir build && cd build

cmake .. -DPY_VERSION=3.8 -DWITH_GPU=OFF

make -j128
pip install -U python/dist/paddlepaddle-0.0.0-cp38-cp38-linux_x86_64.whl
2.2 Install with pip:
python -m pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
2.3 Verify the installation:
python -c "import paddle; paddle.version.show()"
python -c "import paddle; paddle.utils.run_check()"

3 Clone the PaddleNLP repository and install dependencies

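# PaddleNLP is a natural language processing and large language model (LLM) development library built on paddlepaddle (『飞桨』). It hosts large models implemented on that framework, including the llama family. To make full use of PaddleNLP you need to clone the whole repository.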
pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

4 Install third-party libraries and paddlenlp_ops

# The PaddleNLP repository ships dedicated fused operators so that users get the lowest possible inference cost
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/csrc/cpu
sh setup.sh

5 If a third-party library fails to install

# If oneccl fails to install, try reinstalling with a gcc version between 8.2 and 9.4
cd csrc/cpu/xFasterTransformer/3rdparty/
sh prepare_oneccl.sh

# If xFasterTransformer fails to install, try reinstalling with gcc 9.2 or later
cd csrc/cpu/xFasterTransformer/build/
make -j24

# For more commands and environment variables, see csrc/cpu/setup.sh

High-performance CPU inference

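PaddleNLP also provides high-performance CPU inference based on intel/xFasterTransformer. It currently supports FP16, BF16, and INT8 inference, as well as a mixed mode in which Prefill runs in FP16 and Decode runs in INT8.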

Reference for high-performance inference on non-HBM machines:

1 Determine OMP_NUM_THREADS
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}')
2 Dynamic-graph inference
cd ../../llm/
# Reference command for high-performance AVX dynamic-graph inference
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
3 Static-graph inference
# step 1: export the static graph
python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
# step 2: static-graph inference
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path ./inference --inference_model --dtype "float32" --mode "static" --device "cpu" --avx_mode
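
The non-HBM commands above always combine the same pieces: a thread count taken from the "Core(s) per socket" line of lscpu, an optional numactl binding, and the predictor flags. The following is a rough sketch of assembling them in Python. The function names are hypothetical; the flags are copied from the dynamic-graph command above, and nothing here is part of PaddleNLP itself.

```python
import re


def cores_per_socket(lscpu_text):
    """Extract the integer after 'Core(s) per socket:' from lscpu output."""
    match = re.search(r"^[ \t]*Core\(s\) per socket:[ \t]*(\d+)",
                      lscpu_text, re.MULTILINE)
    if match is None:
        raise ValueError("'Core(s) per socket' not found in lscpu output")
    return int(match.group(1))


def predictor_command(model, avx_type="fp16_int8", use_numactl=True):
    """Assemble the dynamic-graph predictor command as an argv list."""
    # Bind to NUMA node 0 for both CPU and memory, as in the command above.
    cmd = ["numactl", "-N", "0", "-m", "0"] if use_numactl else []
    cmd += [
        "python", "./predict/predictor.py",
        "--model_name_or_path", model,
        "--inference_model",
        "--dtype", "float32",
        "--avx_mode",
        "--avx_type", avx_type,
        "--device", "cpu",
    ]
    return cmd
```

To launch it, the list can be handed to `subprocess.run` with `env=dict(os.environ, OMP_NUM_THREADS=str(n))`, where `n` comes from `cores_per_socket`. Parsing to an integer also drops the leading whitespace that the awk pipeline leaves in the value.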

Reference for high-performance inference on HBM machines:

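1 Confirm the hardware and OMP_NUM_THREADS
# In theory, HBM machines achieve a 1.3x to 1.9x speedup in next-token latency over non-HBM machines
# Confirm that the machine has HBM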
lscpu
# For example, node2 and node3 below indicate HBM support
$NUMA node0 CPU(s):                  0-31,64-95
$NUMA node1 CPU(s):                  32-63,96-127
$NUMA node2 CPU(s):
$NUMA node3 CPU(s):

# Determine OMP_NUM_THREADS
lscpu | grep "Socket(s)" | awk -F ':' '{print $2}'
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}')
2 Dynamic-graph inference
cd ../../llm/
# Reference command for high-performance AVX dynamic-graph inference
FIRST_TOKEN_WEIGHT_LOCATION=0 NEXT_TOKEN_WEIGHT_LOCATION=2 OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
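Note: FIRST_TOKEN_WEIGHT_LOCATION and NEXT_TOKEN_WEIGHT_LOCATION place the first_token weights on numa0 and the next_token weights on numa2 (the HBM cache node).
3 Static-graph inference
# Reference commands for high-performance static-graph inference
# step 1: export the static graph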
python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
# step 2: static-graph inference
FIRST_TOKEN_WEIGHT_LOCATION=0 NEXT_TOKEN_WEIGHT_LOCATION=2 OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path ./inference --inference_model --dtype "float32" --mode "static" --device "cpu" --avx_mode
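
In the lscpu listing above, the HBM nodes are the NUMA nodes whose CPU list is empty. A small sketch of picking them out automatically follows. It rests on that reading of the listing, and the function name is invented for illustration; a CPU-less node could in principle be some other memory-only device, so treat the result as a candidate for NEXT_TOKEN_WEIGHT_LOCATION rather than a guarantee.

```python
import re

# Matches lines such as "NUMA node2 CPU(s):   0-31,64-95" and captures the
# node id and the (possibly empty) CPU list. The summary line
# "NUMA node(s): 4" is intentionally not matched.
NODE_LINE = re.compile(r"^[ \t]*NUMA node(\d+) CPU\(s\):[ \t]*(\S*)[ \t]*$",
                       re.MULTILINE)


def cpuless_numa_nodes(lscpu_text):
    """Return the ids of NUMA nodes that have no CPUs attached."""
    nodes = []
    for match in NODE_LINE.finditer(lscpu_text):
        node_id, cpu_list = int(match.group(1)), match.group(2)
        if not cpu_list:
            nodes.append(node_id)
    return nodes
```

With the listing shown in step 1 this yields nodes 2 and 3, and the first of them matches the NEXT_TOKEN_WEIGHT_LOCATION=2 used in the commands above.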

Hands-on walkthrough

Installation

Install the system packages

sudo apt update
sudo apt install numactl

Check whether the CPU supports AVX

lscpu | grep -o -P '(?<!\w)(avx\w*)'

Install PaddlePaddle

pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

Verify the PaddlePaddle installation

python -c "import paddle; paddle.version.show()"
python -c "import paddle; paddle.utils.run_check()"

Install the PaddleNLP library

pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

Download the PaddleNLP source and install the accelerated operators

git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/csrc/cpu
sh setup.sh

Build failure

Successfully installed intel-cmplr-lib-ur-2024.2.1 intel-openmp-2024.2.1 mkl-include-2024.0.0 mkl-static-2024.0.0 tbb-2021.13.1
CMake Error at CMakeLists.txt:129 (find_package):
  By not providing "FindoneCCL.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "oneCCL", but
  CMake did not find one.

  Could not find a package configuration file provided by "oneCCL" with any
  of the following names:

    oneCCLConfig.cmake
    oneccl-config.cmake

  Add the installation prefix of "oneCCL" to CMAKE_PREFIX_PATH or set
  "oneCCL_DIR" to a directory containing one of the above files.  If "oneCCL"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found.  Stop.

Go into the oneccl subdirectory and try rebuilding:

(py312) skywalk@DESKTOP-9C5AU01:~/github/PaddleNLP/csrc/cpu$ cd xFasterTransformer/3rdparty/
(py312) skywalk@DESKTOP-9C5AU01:~/github/PaddleNLP/csrc/cpu/xFasterTransformer/3rdparty$ sh prepare_oneccl.sh

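It still failed. The documentation suggests a gcc version between 8 and 9, while mine is 13.3, which is rather new, so I shelved this for the time being.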
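
A quick way to catch this before starting a long build is to compare the compiler version against the two windows quoted earlier (gcc 8.2 to 9.4 for oneccl, gcc 9.2 or later for xFasterTransformer). The sketch below is illustrative only: the function names are made up, the bounds are copied from this post rather than from any official compatibility matrix, and only major.minor is compared.

```python
import re

# Version windows quoted in this post (assumed, not verified upstream).
ONECCL_RANGE = ((8, 2), (9, 4))
XFT_MINIMUM = (9, 2)


def parse_version(text):
    """Turn the first 'X.Y' or 'X.Y.Z' found in text into a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def gcc_suits_build(version):
    """Check a version tuple against both windows, using major.minor only."""
    major_minor = tuple(version[:2])
    low, high = ONECCL_RANGE
    return low <= major_minor <= high and major_minor >= XFT_MINIMUM
```

The version string can come from `gcc -dumpfullversion`. For the 13.3 compiler mentioned above, `gcc_suits_build(parse_version("13.3.0"))` is false, which matches the failure seen here.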

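Current status: the build fails on my own machine, fails in the Xinghe community (星河社区) because the GitHub connection there is too slow, and fails on Kaggle as well.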

Installing the accelerated operators again

First add Intel's apt repository for Ubuntu:

# Download the base toolkit (Intel signing key and oneAPI repository)
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update

# Install the full development kit (includes oneCCL)
sudo apt install intel-oneapi-ccl intel-oneapi-ccl-devel intel-oneapi-runtime-dnnl

Then install again:

cd PaddleNLP/csrc/cpu && oneCCL_DIR=/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL sh setup.sh

Inference

From the PaddleNLP/llm directory, run:

python ./predict/predictor.py --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"

Summary

There were more pitfalls than expected, and it is still not working end to end.

Debugging

Error: This system does not support NUMA policy

OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
numactl: This system does not support NUMA policy

So drop numactl and run the same command without the numactl prefix.
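
One way to make that decision automatic is to attempt a trivial command under the same binding first and only keep the prefix if it succeeds. This is a sketch under two assumptions: that numactl exits with a non-zero status when it prints the message above, and that a `true` command is available on the system. The function name and the injectable `run` parameter are illustrative.

```python
import subprocess

# The binding used throughout this post: CPU node 0, memory node 0.
NUMA_PREFIX = ["numactl", "-N", "0", "-m", "0"]


def numactl_prefix(run=subprocess.run):
    """Return the numactl prefix if a trial binding works, else an empty list."""
    try:
        probe = run(NUMA_PREFIX + ["true"], capture_output=True)
    except OSError:
        # numactl is not installed or cannot be executed.
        return []
    return list(NUMA_PREFIX) if probe.returncode == 0 else []
```

The returned list can be placed in front of the `python ./predict/predictor.py ...` arguments, so the same script works on machines with and without NUMA support.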

Error: ModuleNotFoundError: No module named 'paddlenlp_ops'

from paddlenlp_ops import (
ModuleNotFoundError: No module named 'paddlenlp_ops'

So it really cannot run without building paddlenlp_ops.
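
Since the predictor only fails once it reaches that import, it can save time to check up front which of the required modules are present. A minimal sketch follows; the module list reflects what this walkthrough installs, and the function name is hypothetical.

```python
import importlib.util

# Top-level modules this walkthrough relies on.
REQUIRED = ("paddle", "paddlenlp", "paddlenlp_ops")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing: " + ", ".join(missing) if missing else "all modules found")
```

If `paddlenlp_ops` shows up in the result, the build in csrc/cpu did not complete and the AVX inference path will not start.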

Error building paddlenlp_ops on Kaggle

cd xFasterTransformer/3rdparty/

!cd PaddleNLP/csrc/cpu/xFasterTransformer/3rdparty && sh prepare_oneccl.sh

One last try before giving up. Building oneccl on its own succeeded, but building paddlenlp afterwards still failed:

-- MKL directory already exists. Skipping installation.
CMake Error at CMakeLists.txt:129 (find_package):
  (same oneCCL error as in the first build attempt: neither
  oneCCLConfig.cmake nor oneccl-config.cmake could be found)

-- Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found.  Stop.

On Kaggle I no longer knew what else to try... giving up.

Build error on the local machine

-- MKL directory already exists. Skipping installation.
CMake Error at CMakeLists.txt:129 (find_package):
  (same oneCCL error as in the first build attempt: neither
  oneCCLConfig.cmake nor oneccl-config.cmake could be found)

-- Configuring incomplete, errors occurred!

Try installing it directly with pip:

pip install oneccl

Same error as before.

Try installing this:

sudo apt install libdnnl3

Trying a new approach

# Download the base toolkit (Intel signing key and oneAPI repository)
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update

# Install the full development kit (includes oneCCL)
sudo apt install intel-oneapi-ccl intel-oneapi-ccl-devel

The download is very slow on my machine, and not fast on Kaggle either:

12% [4 intel-oneapi-mpi-2021.14 7797 kB/45.6 MB 17%]                                             23.0 kB/s 1h 14min 51s

On Kaggle the packages have finished installing, so the ops can now be built:

!cd PaddleNLP/csrc/cpu && oneCCL_DIR=/opt/intel/oneapi/ccl/latest/ sh setup.sh

The build printed warnings like these:

warnings.warn(warning_message)
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See Why you shouldn't invoke setup.py directly for details.
        ********************************************************************************

!!
  self.initialize_options()
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

Kaggle finally failed with: /usr/bin/ld: cannot find -l:libxfastertransformer.so: No such file or directory

/usr/bin/ld: cannot find /kaggle/working/PaddleNLP/csrc/cpu/build/paddlenlp_ops/lib.linux-x86_64-cpython-310/avx_weight_only.o: No such file or directory
/usr/bin/ld: cannot find /kaggle/working/PaddleNLP/csrc/cpu/build/paddlenlp_ops/lib.linux-x86_64-cpython-310/stop_generation_multi_ends.o: No such file or directory
/usr/bin/ld: cannot find -l:libxfastertransformer.so: No such file or directory
/usr/bin/ld: cannot find -l:libxft_comm_helper.so: No such file or directory
collect2: error: ld returned 1 exit status
error: command '/usr/bin/x86_64-linux-gnu-g++' failed with exit code 1

The actual cause turned out to be here:

-- Using src='https://github.com/google/sentencepiece/releases/download/v0.1.99/sentencepiece-0.1.99.tar.gz'
/kaggle/working/PaddleNLP/csrc/cpu/xFasterTransformer/src/comm_helper/comm_helper.cpp:17:10: fatal error: oneapi/ccl.hpp: No such file or directory
   17 | #include "oneapi/ccl.hpp"
      |          ^~~~~~~~~~~~~~~~

In other words, the oneAPI (oneCCL) header could not be found.

Found the cause: the path I had set earlier was wrong.

# Standard Intel oneAPI path (Linux)
export oneCCL_DIR=/opt/intel/oneapi/ccl/latest/lib/cmake/ccl

# Custom install path
export oneCCL_DIR=/your/custom/path/lib/cmake/ccl

# Pass the variable to CMake at configure time
cmake -DoneCCL_DIR=$oneCCL_DIR ..

This is the command to use:

!cd PaddleNLP/csrc/cpu && oneCCL_DIR=/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL sh setup.sh
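
The notes above mention two different directory names (lib/cmake/ccl and lib/cmake/oneCCL), so rather than guessing it may be easier to search for the files CMake is actually asking for. The sketch below walks an install prefix and reports every directory that contains `oneCCLConfig.cmake` or `oneccl-config.cmake`, the two names listed in the CMake error. The default prefix is the one used in this post, and the function name is made up for illustration.

```python
import os

# The two file names CMake lists in its "Could not find" message.
CONFIG_NAMES = ("oneCCLConfig.cmake", "oneccl-config.cmake")


def find_oneccl_dirs(prefix="/opt/intel/oneapi/ccl"):
    """Return every directory under prefix holding a oneCCL CMake config file.

    Symlinked directories are not followed, so a versioned directory is
    reported rather than a 'latest' symlink pointing at it.
    """
    hits = []
    for root, _dirs, files in os.walk(prefix):
        if any(name in files for name in CONFIG_NAMES):
            hits.append(root)
    return sorted(hits)


if __name__ == "__main__":
    for path in find_oneccl_dirs():
        print(path)
```

Any directory it prints is a candidate value for `oneCCL_DIR`. An empty result suggests the oneCCL development package is not installed under that prefix.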

It still failed; this package needs to be installed as well:

sudo apt install intel-oneapi-runtime-dnnl

Kaggle error: Your notebook tried to allocate more memory than is available. It has restarted. (gave up)

Nothing can be done about this one; the build simply exceeds the memory limit.

Giving up on Kaggle.

Local build error: status_string: "Failure when receiving data from the peer"

-- Using src='https://github.com/oneapi-src/oneDNN/releases/download/v0.21/mklml_lnx_2019.0.5.20190502.tgz'
Cloning into 'oneccl'...
CMake Error at /home/skywalk/github/PaddleNLP/csrc/cpu/xFasterTransformer/build/xdnn_lib-prefix/src/xdnn_lib-stamp/download-xdnn_lib.cmake:170 (message):
  Each download failed!

    error: downloading 'https://github.com/intel/xFasterTransformer/releases/download/IntrinsicGemm/xdnn_v1.5.2.tar.gz' failed
          status_code: 56
          status_string: "Failure when receiving data from the peer"
          log:
          --- LOG BEGIN ---
          Host github.com:443 was resolved.

  IPv6: (none)

  IPv4: 20.205.243.166

    Trying 20.205.243.166:443...

  Connected to github.com (20.205.243.166) port 443

  ALPN: curl offers h2,http/1.1

  [5 bytes data]

  TLSv1.3 (OUT), TLS handshake, Client hello (1):

  [512 bytes data]

  [5 bytes data]

  TLSv1.3 (IN), TLS handshake, Server hello (2):

CMake Error at /home/skywalk/github/PaddleNLP/csrc/cpu/xFasterTransformer/build/examples/cpp/cmdline-prefix/src/cmdline-stamp/download-cmdline.cmake:170 (message):
  Each download failed!

    error: downloading 'https://github.com/tanakh/cmdline/archive/refs/heads/master.zip' failed
          status_code: 56
          status_string: "Failure when receiving data from the peer"

GitHub was probably just having connection problems.

Shelved for now.

Some other libraries that may be needed:

sudo apt install libdnnl-dev
sudo apt install intel-oneapi-mkl
sudo apt install libmkl-vml-avx libmkl-dev intel-oneapi-runtime-mkl

Installing intel-mkl (the math library) brings up this prompt:
┌─────────────────────────────────────┤ Intel Math Kernel Library (Intel MKL) ├─────────────────────────────────────┐
 │                                                                                                                   │
 │ Intel MKL's Single Dynamic Library (SDL) is installed on your machine. This shared object can be used as an       │
 │ alternative to both libblas.so.3 and liblapack.so.3, so that packages built against BLAS/LAPACK can directly use  │
 │ MKL without rebuild.                                                                                              │
 │                                                                                                                   │
 │ However, MKL is non-free software, and in particular its source code is not publicly available. By using MKL as   │
 │ the default BLAS/LAPACK implementation, you might be violating the licensing terms of copyleft software that      │
 │ would become dynamically linked against it. Please verify that the licensing terms of the program(s) that you     │
 │ intend to use with MKL are compatible with the MKL licensing terms. For the case of software under the GNU        │
 │ General Public License, you may want to read this FAQ:                                                            │
 │                                                                                                                   │
 │     https://www.gnu.org/licenses/gpl-faq.html#GPLIncompatibleLibs                                                 │
 │                                                                                                                   │
 │                                                                                                                   │
 │ If you don't know what MKL is, or unwilling to set it as default, just choose the preset value or simply type     │
 │ Enter.                                                                                                            │
 │                                                                                                                   │
 │ Use libmkl_rt.so as the default alternative to BLAS/LAPACK?                                                       │
 │                                                                                                                   │
 │                                  <Yes>                                     <No>                                   │
 │                                                                                                                   │
So does this library require a separate license?
 
