
xformers

xformers is Meta's library for optimizing Transformer attention: it noticeably speeds up the attention computation and reduces GPU memory use. Most articles I have read only cover installation and never show how to actually use it, so this post covers both.

Installation

xformers requires PyTorch 2.7.0 or newer. Install it from a Linux shell or a Python environment's terminal:

pip install torch==2.7.1+cu118  --index-url https://download.pytorch.org/whl/cu118

pip will download the torch build that matches your GPU's CUDA version; you must install the GPU build of torch! The installed version string carries a +cu118 suffix. Run pip list to confirm the installation succeeded.
Once torch is installed, install the xformers build for the matching CUDA version:

# [linux only] cuda 11.8 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
# [linux & win] cuda 12.6 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu126
# [linux & win] cuda 12.8 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu128
# [linux only] (EXPERIMENTAL) rocm 6.3 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/rocm6.3
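
A quick sanity check after installation (a minimal sketch; the exact version strings will differ on your machine): confirm that the GPU build of torch is active and that xformers imports cleanly. xformers also ships a python -m xformers.info command that prints build info and the available attention operators.

import torch
import xformers

print(torch.__version__)           # should end in +cu118 / +cu126 / +cu128 for a GPU build
print(torch.cuda.is_available())   # must be True, otherwise the xformers kernels cannot run
print(xformers.__version__)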

Usage

We use the xformers.ops.memory_efficient_attention function, which takes the place of the usual attention computation. Its original source is shown below for reference; you do not need to study it in detail:

def memory_efficient_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attn_bias: Optional[Union[torch.Tensor, AttentionBias]] = None,
    p: float = 0.0,
    scale: Optional[float] = None,
    *,
    op: Optional[AttentionOp] = None,
) -> torch.Tensor:
    """Implements the memory-efficient attention mechanism following
    `"Self-Attention Does Not Need O(n^2) Memory" <http://arxiv.org/abs/2112.05682>`_.

    :Inputs shape:

    - Input tensors must be in format ``[B, M, H, K]``, where B is the batch size, M \
        the sequence length, H the number of heads, and K the embeding size per head
    - If inputs have dimension 3, it is assumed that the dimensions are ``[B, M, K]`` and ``H=1``
    - Inputs can be non-contiguous - we only require the last dimension's stride to be 1

    :Equivalent pytorch code:

    .. code-block:: python

        scale = 1 / query.shape[-1] ** 0.5
        query = query * scale
        attn = query @ key.transpose(-2, -1)
        if attn_bias is not None:
            attn = attn + attn_bias
        attn = attn.softmax(-1)
        attn = F.dropout(attn, p)
        return attn @ value

    :Examples:

    .. code-block:: python

        import xformers.ops as xops

        # Compute regular attention
        y = xops.memory_efficient_attention(q, k, v)

        # With a dropout of 0.2
        y = xops.memory_efficient_attention(q, k, v, p=0.2)

        # Causal attention
        y = xops.memory_efficient_attention(
            q, k, v,
            attn_bias=xops.LowerTriangularMask()
        )

    :Supported hardware:
        NVIDIA GPUs with compute capability above 6.0 (P100+), datatype ``f16``, ``bf16`` and ``f32``.

    :Note:
        This operator may be nondeterministic.

    Raises:
        NotImplementedError: if there is no operator available to compute the MHA
        ValueError: if inputs are invalid

    :parameter query: Tensor of shape ``[B, Mq, H, K]``
    :parameter key: Tensor of shape ``[B, Mkv, H, K]``
    :parameter value: Tensor of shape ``[B, Mkv, H, Kv]``
    :parameter attn_bias: Bias to apply to the attention matrix - defaults to no masking. \
        For common biases implemented efficiently in xFormers, see :attr:`xformers.ops.fmha.attn_bias.AttentionBias`. \
        This can also be a :attr:`torch.Tensor` for an arbitrary mask (slower).
    :parameter p: Dropout probability. Disabled if set to ``0.0``
    :parameter scale: Scaling factor for ``Q @ K.transpose()``. If set to ``None``, the default \
        scale (q.shape[-1]**-0.5) will be used.
    :parameter op: The operators to use - see :attr:`xformers.ops.AttentionOpBase`. \
        If set to ``None`` (recommended), xFormers \
        will dispatch to the best available operator, depending on the inputs \
        and options.
    :return: multi-head attention Tensor with shape ``[B, Mq, H, Kv]``
    """
    return _memory_efficient_attention(
        Inputs(query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=scale),
        op=op,
    )
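
To make the shape convention concrete, here is a small check (a sketch, assuming a CUDA device and fp16 inputs; the sizes and names such as out_ref are mine) that compares memory_efficient_attention against the "equivalent pytorch code" from the docstring, using the documented [B, M, H, K] layout:

import torch
from xformers.ops import memory_efficient_attention

B, M, H, K = 2, 16, 4, 64
q = torch.rand(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.rand(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.rand(B, M, H, K, device="cuda", dtype=torch.float16)

out_xf = memory_efficient_attention(q, k, v)

# reference: move heads next to batch, run plain softmax attention, move heads back
qr, kr, vr = (t.permute(0, 2, 1, 3) for t in (q, k, v))   # [B, H, M, K]
attn = (qr * K ** -0.5) @ kr.transpose(-2, -1)            # [B, H, M, M]
out_ref = (attn.softmax(-1) @ vr).permute(0, 2, 1, 3)     # back to [B, M, H, K]

print(torch.allclose(out_xf, out_ref, atol=1e-2, rtol=1e-2))  # expected: True (fp16 tolerance)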

That is the memory_efficient_attention source; below is the standard attention computation we normally write:

class Attention(nn.Module):
    def __init__(
        self,
        dim: int,
        num_heads: int = 8,
        qkv_bias: bool = False,
        proj_bias: bool = True,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
    ) -> None:
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim**-0.5

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim, bias=proj_bias)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x: Tensor) -> Tensor:
        B, N, C = x.shape
        # [B, N, 3, num_heads, head_dim] -> [3, B, num_heads, N, head_dim]
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0] * self.scale, qkv[1], qkv[2]

        attn = q @ k.transpose(-2, -1)
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x

As you can see, the part below is the core attention computation. It materializes the full N x N attention matrix, which is exactly the O(n^2) memory cost that memory_efficient_attention avoids:

        attn = q @ k.transpose(-2, -1)
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)

So we replace it with memory_efficient_attention. Note that, per the docstring above, the inputs must be in [B, M, H, K] layout (heads in the third dimension), and the function applies the 1/sqrt(K) scaling itself:

    def forward(self, x: Tensor) -> Tensor:
        x = self.layer(x)
        B, N, C = x.shape
        # [B, N, 3, num_heads, head_dim] -> [3, B, N, num_heads, head_dim]:
        # each head gets its own slice of the channel dimension, and q/k/v keep the
        # [B, M, H, K] layout that memory_efficient_attention expects
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 1, 3, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        # the replacement: no manual softmax/dropout/matmul, and no manual q * self.scale,
        # since memory_efficient_attention applies the default 1/sqrt(head_dim) scaling itself
        x = memory_efficient_attention(q, k, v, attn_bias=None)
        x = x.reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
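
If you need causal masking, the docstring above shows that xformers provides efficient built-in bias classes. A minimal sketch, to be swapped in for the attn_bias=None call above:

from xformers.ops import memory_efficient_attention, LowerTriangularMask

# causal (autoregressive) variant of the call above; attn_bias=None keeps full attention
x = memory_efficient_attention(q, k, v, attn_bias=LowerTriangularMask())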

Because xformers runs on the GPU, both the tensors and the model must be moved onto a GPU device! Below is a complete example. We are not using a full network here; we simply instantiate the attention module as if it were the model. If your model uses this attention internally, moving the whole model to the GPU is enough:

from torch import nn, Tensor
import torch
from xformers.ops import memory_efficient_attention


class Attention(nn.Module):
    def __init__(
        self,
        dim: int,
        num_heads: int = 8,
        qkv_bias: bool = False,
        proj_bias: bool = True,
        attn_drop: float = 0.0,
        proj_drop: float = 0.0,
    ) -> None:
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim**-0.5

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim, bias=proj_bias)
        self.proj_drop = nn.Dropout(proj_drop)
        self.layer = nn.LayerNorm(normalized_shape=768)

    def forward(self, x: Tensor) -> Tensor:
        x = self.layer(x)
        B, N, C = x.shape
        # [B, N, 3, num_heads, head_dim] -> [3, B, N, num_heads, head_dim]
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 1, 3, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # each [B, N, num_heads, head_dim]
        # memory_efficient_attention applies the default 1/sqrt(head_dim) scaling internally
        x = memory_efficient_attention(q, k, v, attn_bias=None)
        x = x.reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


# get the GPU device
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# assume the samples are already embedded: batch size 3, 196 patches, dim = 768; move them onto the GPU
tensor1=torch.rand(3,196,768).to(device)
# instantiate the model and move it onto the GPU
att=Attention(dim=768).to(device)
result=att(tensor1)
print(result)

Running this prints the resulting attention output tensor; the code above can be copied and used as-is.
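To see the claimed speed and memory benefit for yourself, here is a rough timing sketch (not a rigorous benchmark; it assumes a CUDA device, fp16 inputs, and arbitrary sizes B, M, H, K of my choosing) that compares plain softmax attention with memory_efficient_attention on the same [B, M, H, K] inputs:

import time
import torch
from xformers.ops import memory_efficient_attention

B, M, H, K = 4, 4096, 12, 64
q = torch.rand(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.rand_like(q)
v = torch.rand_like(q)

def plain_attention(q, k, v):
    qr, kr, vr = (t.permute(0, 2, 1, 3) for t in (q, k, v))   # [B, H, M, K]
    attn = (qr * K ** -0.5) @ kr.transpose(-2, -1)            # [B, H, M, M] -- the O(M^2) buffer
    return (attn.softmax(-1) @ vr).permute(0, 2, 1, 3)

for name, fn in [("plain", plain_attention), ("xformers", memory_efficient_attention)]:
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    t0 = time.time()
    fn(q, k, v)
    torch.cuda.synchronize()
    print(name, f"{time.time() - t0:.4f}s",
          f"peak {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")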
