
Customizing the PyTorch Communication Backend in Practice

Background

For a research project I need to replace PyTorch's communication backend and customize the algorithms used in distributed training, such as all_reduce, including the underlying communication protocol.

PyTorch's default communication backends are 'gloo' and 'nccl'; 'mpi' is also supported but requires building from source. In addition, the PyTorch 2.7 source tree contains an experimental 'ucc' backend, whose code can serve as a reference for this kind of customization.

See: Distributed communication package - torch.distributed — PyTorch 2.7 documentation
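As a quick sanity check, which built-in backends a given installation actually supports can be queried from Python. A minimal sketch, assuming the is_*_available() helpers below exist in your torch version (they do in recent 2.x releases):

import torch.distributed as dist

# Query which built-in communication backends this torch build supports.
print("gloo:", dist.is_gloo_available())
print("nccl:", dist.is_nccl_available())
print("mpi: ", dist.is_mpi_available())
print("ucc: ", dist.is_ucc_available())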

Local environment

$ lsb_release -a   # Ubuntu 23.04

$ python3 --version  # Python 3.11.4

$ python3 -c "import torch; print(torch.__version__)"  # 2.7.0+cu126

Hands-on

As a newcomer to Python and PyTorch, I fumbled around for quite a while before finally finding the right entry points:

1. "Customize Process Group Backends Using Cpp Extensions" — PyTorch Tutorials 2.7.0+cu126 documentation. This is the Chinese translation of the tutorial hosted at pytorch.ac.cn (the URL appears in the code comments below), and it is exactly what I was looking for.

2. https://github.com/H-Huang/torch_collective_extension — an example of extending torch collective communication. The project demonstrates two approaches: one based on custom_backend (the recommended way) and one based on custom_process_group (the older way, which is what most material found online describes). The custom_backend code is the same as in (1) but more complete: it provides example implementations for all of PyTorch's collective communication interfaces.

The walkthrough below is based mainly on the code from (1).

Code

Create a directory with the source files under your home directory. In this example the directory is named 'custom_backend' and contains four files:

  1. dummy.hpp  --  C++ header
  2. dummy.cpp  --  C++ source
  3. setup.py  --  Python build and configuration script
  4. example.py  --  verification / test code

In what follows, the home directory is assumed to be '/home/~usrname'.

1. dummy.hpp

// file name: dummy.hpp
// Code adapted from the PyTorch tutorial example:
// https://pytorch.ac.cn/tutorials/intermediate/process_group_cpp_extension_tutorial.html

#pragma once

// Adjust the include path for your environment; see setup.py
// #include <torch/python.h>
#include <torch/csrc/api/include/torch/python.h>
#include <torch/csrc/distributed/c10d/Backend.hpp>
#include <torch/csrc/distributed/c10d/Work.hpp>
#include <torch/csrc/distributed/c10d/Store.hpp>
#include <torch/csrc/distributed/c10d/Types.hpp>
#include <torch/csrc/distributed/c10d/Utils.hpp>
#include <pybind11/chrono.h>

namespace c10d {

class BackendDummy : public Backend {
  public:
    BackendDummy(int rank, int size);

    c10::intrusive_ptr<Work> allgather(
        std::vector<std::vector<at::Tensor>>& outputTensors,
        std::vector<at::Tensor>& inputTensors,
        const AllgatherOptions& opts = AllgatherOptions()) override;

    c10::intrusive_ptr<Work> allreduce(
        std::vector<at::Tensor>& tensors,
        const AllreduceOptions& opts = AllreduceOptions()) override;

    // The collective communication APIs without a custom implementation
    // will error out if invoked by application code.

    static c10::intrusive_ptr<Backend> createBackendDummy(
        const c10::intrusive_ptr<::c10d::Store>& store,
        int rank,
        int size,
        const std::chrono::duration<float>& timeout);

    static void BackendDummyConstructor() __attribute__((constructor)) {
        py::object module = py::module::import("torch.distributed");
        py::object register_backend =
            module.attr("Backend").attr("register_backend");
        // torch.distributed.Backend.register_backend will add `dummy` as a
        // new valid backend. Note: this is where the backend name ("dummy") is chosen!
        register_backend("dummy", py::cpp_function(createBackendDummy));
    }
};

class WorkDummy : public Work {
    friend class BackendDummy;

  public:
    WorkDummy(
      OpType opType,
      c10::intrusive_ptr<c10::ivalue::Future> future) // future of the output
      : Work(
          -1, // rank, only used by recvAnySource, irrelevant in this demo
          opType),
      future_(std::move(future)) {}

    bool isCompleted() override;
    bool isSuccess() const override;
    bool wait(std::chrono::milliseconds timeout = kUnsetTimeout) override;
    virtual c10::intrusive_ptr<c10::ivalue::Future> getFuture() override;

  private:
    c10::intrusive_ptr<c10::ivalue::Future> future_;
};

} // namespace c10d
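Because the registration above runs inside a function marked __attribute__((constructor)), simply importing the compiled extension module is enough to register the new backend name. Once the extension has been built and installed (see below), a minimal sketch to confirm this, assuming torch.distributed's Backend class exposes a backend_list attribute (it does in the 2.x releases I have seen):

# Importing the extension triggers the constructor, which calls
# Backend.register_backend("dummy", ...).
import torch                 # import torch first so its shared libraries are already loaded
import dummy_collectives     # registration happens here, at module load time
import torch.distributed as dist

print("dummy" in dist.Backend.backend_list)  # expected: True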

2. dummy.cpp

// file name: dummy.cpp
// Code adapted from the PyTorch tutorial example:
// https://pytorch.ac.cn/tutorials/intermediate/process_group_cpp_extension_tutorial.html

#include "dummy.hpp"
#include <iostream>

namespace c10d {

bool WorkDummy::isCompleted() {
  return true;
}

bool WorkDummy::isSuccess() const {
  return true;
}

bool WorkDummy::wait(std::chrono::milliseconds /* unused */) {
  return true;
}

c10::intrusive_ptr<c10::ivalue::Future> WorkDummy::getFuture() {
  return future_;
}

// If necessary, pass store/rank/size to the ctor and exchange connection
// information here
BackendDummy::BackendDummy(int rank, int size)
    : Backend(rank, size) {}

// This is a dummy allgather that sets all output tensors to zero
// Modify the implementation to conduct real communication asynchronously
c10::intrusive_ptr<Work> BackendDummy::allgather(
        std::vector<std::vector<at::Tensor>>& outputTensors,
        std::vector<at::Tensor>& inputTensors,
        const AllgatherOptions& /* unused */) {
    for (auto& outputTensorVec : outputTensors) {
        for (auto& outputTensor : outputTensorVec) {
            outputTensor.zero_();
        }
    }

    auto future = c10::make_intrusive<c10::ivalue::Future>(
        c10::ListType::create(c10::ListType::create(c10::TensorType::get())));
    future->markCompleted(c10::IValue(outputTensors));
    return c10::make_intrusive<WorkDummy>(OpType::ALLGATHER, std::move(future));
}

// This is a dummy allreduce that sets all output tensors to zero
// Modify the implementation to conduct real communication asynchronously
c10::intrusive_ptr<Work> BackendDummy::allreduce(
        std::vector<at::Tensor>& tensors,
        const AllreduceOptions& opts) {
    for (auto& tensor : tensors) {
        tensor.zero_();
    }

    auto future = c10::make_intrusive<c10::ivalue::Future>(
        c10::ListType::create(c10::TensorType::get()));
    future->markCompleted(c10::IValue(tensors));
    // Note: the original tutorial has a mistake here; the op type should be ALLREDUCE.
    return c10::make_intrusive<WorkDummy>(OpType::ALLREDUCE, std::move(future));
}

c10::intrusive_ptr<Backend> BackendDummy::createBackendDummy(
        const c10::intrusive_ptr<::c10d::Store>& /* unused */,
        int rank,
        int size,
        const std::chrono::duration<float>& /* unused */) {
    return c10::make_intrusive<BackendDummy>(rank, size);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("createBackendDummy", &BackendDummy::createBackendDummy);
}

} // namespace c10d

} // namespace c10d

3. setup.py

# file name: setup.py
# Code adapted from the PyTorch tutorial example:
# https://pytorch.ac.cn/tutorials/intermediate/process_group_cpp_extension_tutorial.html
# torch install location on this machine: ~/.local/lib/python3.11/site-packages/torch

import os
import sys
import torch
from setuptools import setup
from torch.utils import cpp_extension

# Adjust the paths for your environment
# sources = ["src/dummy.cpp"]
# include_dirs = [f"{os.path.dirname(os.path.abspath(__file__))}/include/"]

# Replace the path below with the actual local path
sources = ["dummy.cpp"]
include_dirs = ["/home/~~~/.local/lib/python3.11/site-packages/torch/include"]

if torch.cuda.is_available():
    module = cpp_extension.CUDAExtension(
        name = "dummy_collectives",
        sources = sources,
        include_dirs = include_dirs,
    )
else:
    module = cpp_extension.CppExtension(
        name = "dummy_collectives",
        sources = sources,
        include_dirs = include_dirs,
    )

# For machines without CUDA support:
# module = cpp_extension.CppExtension(
#     name = "dummy_collectives",
#     sources = sources,
#     include_dirs = include_dirs,
# )

setup(
    name = "Dummy-Collectives",
    version = "0.0.1",
    ext_modules = [module],
    cmdclass={'build_ext': cpp_extension.BuildExtension}
)
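As an aside, hard-coding the torch include directory can be avoided by asking torch itself for its include paths. A small sketch of that alternative, assuming torch.utils.cpp_extension.include_paths() is available (it is in the versions I have used):

from torch.utils import cpp_extension

# Derive the include directories from the installed torch package
# instead of hard-coding a site-packages path.
include_dirs = cpp_extension.include_paths()
print(include_dirs)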

4. example.py

Note: this file differs significantly from the original tutorial source.

# The initialization differs from the original tutorial code; note the changes.
# Run from a terminal with:
# torchrun --nnodes=1 --nproc-per-node=2 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 example.py

import os
import torch
import dummy_collectives
import torch.distributed as dist

rank = int(os.getenv('RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

dist.init_process_group(backend="dummy", rank=rank, world_size=world_size)

# cpu allreduce
x = torch.ones(6)
dist.all_reduce(x)
print(f"cpu allreduce: {x}")

# gpu allreduce: the computation happens on CUDA, but the inter-node
# communication goes through the customizable backend
if torch.cuda.is_available():
    y = x.cuda()
    dist.all_reduce(y)
    print(f"cuda allreduce: {y}")

    try:
        dist.broadcast(y, 0)
    except RuntimeError:
        print("got RuntimeError when calling broadcast")

torch.distributed.destroy_process_group()

Build & install

Remember to replace the path arguments in all of the commands below with your own HOME path.

$ python3 setup.py build

Success! A 'build' directory is created in the current directory, along with the .so and .egg-info files.

$ python3 setup.py install

This fails with the following error:

running install
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/lib/python3/dist-packages/setuptools/command/easy_install.py:146: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the installation directory:

    [Errno 13] Permission denied: '/usr/local/lib/python3.11/dist-packages/test-easy-install-68533.write-test'

The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was:

    /usr/local/lib/python3.11/dist-packages/

Perhaps your account does not have write access to this directory?  If the installation directory is a system-owned directory, you may need to sign in as the administrator or "root" account.  If you do not have administrative access to this machine, you may wish to choose a different installation directory, preferably one that is listed in your PYTHONPATH environment variable.

For information on other options, you may wish to consult the documentation at:

  https://setuptools.pypa.io/en/latest/deprecated/easy_install.html

Please make the appropriate changes for your system and try again.

This looks like a permissions problem: the target install directory '/usr/local/lib/python3.11/dist-packages/' is not writable by ordinary users. The installation should instead go under the user's HOME directory, i.e.:

'/home/~usrname/.local/lib/python3.11/site-packages'

Following the hint in the error output, specify the install directory on the command line:

$ python3 setup.py install --install-dir=/home/~usrname/.local/lib/python3.11/site-packages

This also fails:

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: option --install-dir not recognized

It seems the subcommand passed to setup.py is wrong:

$ python3 setup.py --help

Output:

Common commands: (see '--help-commands' for more)

  setup.py build      will build the package underneath 'build/'
  setup.py install    will install the package

Global options: ...

This was puzzling: according to the help text, only two commands, build and install, are listed.

Following the hint in reference (1), I changed the command to 'develop' and ran:

$ python3 setup.py develop --install-dir=/home/~usrname/.local/lib/python3.11/site-packages

Success. Output:

running develop
/usr/lib/python3/dist-packages/setuptools/command/easy_install.py:146: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
creating Dummy_Collectives.egg-info
writing Dummy_Collectives.egg-info/PKG-INFO
writing dependency_links to Dummy_Collectives.egg-info/dependency_links.txt
writing top-level names to Dummy_Collectives.egg-info/top_level.txt
writing manifest file 'Dummy_Collectives.egg-info/SOURCES.txt'
/home/~usrname/.local/lib/python3.11/site-packages/torch/utils/cpp_extension.py:576: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'Dummy_Collectives.egg-info/SOURCES.txt'
writing manifest file 'Dummy_Collectives.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-cpython-311/dummy_collectives.cpython-311-x86_64-linux-gnu.so ->
Creating /home/~usrname/.local/lib/python3.11/site-packages/Dummy-Collectives.egg-link (link to .)
Dummy-Collectives 0.0.1 is already the active version in easy-install.pth

Installed /home/~usrname/src/pytorch/custom_backend
Processing dependencies for Dummy-Collectives==0.0.1
Finished processing dependencies for Dummy-Collectives==0.0.1

Verification

$ pip3 show dummy_collectives

Name: Dummy-Collectives
Version: 0.0.1
Summary:
Home-page:
Author:
Author-email:
License:
Location: /home/~usrname/src/pytorch/custom_backend
Editable project location: /home/~usrname/src/pytorch/custom_backend
Requires:
Required-by:

The command succeeds, which shows the dummy_collectives package has been installed. (The dummy backend itself is registered when the module is imported, as set up in dummy.hpp.)

$ python3 -c "import dummy_collectives"

This fails:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: libc10.so: cannot open shared object file: No such file or directory

Locate 'libc10.so' with the find command:

$ find ~ -name libc10.so

Add the directory from the search result to the 'LD_LIBRARY_PATH' environment variable. On my machine the file is at:

/home/~usrname/.local/lib/python3.11/site-packages/torch/lib/libc10.so

$ export LD_LIBRARY_PATH=…

After updating the environment variable, run again:

$ python3 -c "import dummy_collectives"

It still fails:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/~usrname/.local/lib/python3.11/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkGetErrorLog_12_6, version libnvJitLink.so.12

This time the missing library is 'libnvJitLink.so.12'. Locate it with find again; on my machine the path is:

/home/~usrname/.local/lib/python3.11/site-packages/nvidia/nvjitlink/lib/libnvJitLink.so.12

Add this directory to 'LD_LIBRARY_PATH' as well.

Important: later verification showed that running 'import torch' before 'import dummy_collectives' removes the need to set 'LD_LIBRARY_PATH' at all, presumably because 'import torch' already loads the relevant .so files. The correct command is:

$ python3 -c "import torch; import dummy_collectives"

Run the test again:

$ python3 -c "import dummy_collectives"

This time there is no error, only a warning:

/home/~usrname/.local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:350: UserWarning: Device capability of dummy unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
  warnings.warn(

I tried to silence this warning by adding a 'devices="cpu"'-style argument to the 'register_backend()' call in the '.hpp' file, but the extension then failed to compile, probably due to the torch version. I set this aside for the moment and continued with the verification.
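For reference, the Python-level API does accept a device list. A minimal sketch of how registration might look if done from Python rather than from the C++ constructor, assuming register_backend accepts a devices argument (it does in recent releases) and assuming the __attribute__((constructor)) registration in dummy.hpp is removed so that the backend is not registered twice:

import torch
import torch.distributed as dist
import dummy_collectives  # hypothetical variant built without the constructor-time registration

# Register the extension's factory function and declare the supported device types,
# which should address the "Device capability of dummy unspecified" warning.
dist.Backend.register_backend(
    "dummy",
    dummy_collectives.createBackendDummy,
    devices=["cpu", "cuda"],
)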

$ torchrun --nnodes=1 --nproc-per-node=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 example.py

Success! Output:

/home/~usrname/.local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:350: UserWarning: Device capability of dummy unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
  warnings.warn(
cpu allreduce: tensor([0., 0., 0., 0., 0., 0.])
cuda allreduce: tensor([0., 0., 0., 0., 0., 0.], device='cuda:0')
got RuntimeError when calling broadcast

Tip: multiple processes can be launched for a distributed run by adjusting the '--nproc-per-node=x' argument in the command above.

Key finding:

If 'import torch' is executed before 'import dummy_collectives', the 'LD_LIBRARY_PATH' configuration described earlier is not needed.

Next steps: customize the PyTorch backend's all_reduce operation on top of UDP and the RoCE-v2 protocol.

