AWQ quantization of Qwen3 with autoawq

AWQ quantization lowered accuracy by 6 points and reduced inference latency from 0.447 s to 0.4 s.
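To put the latency figures in relative terms, the short calculation below uses only the two timings quoted above. The variable names are illustrative and not part of the original setup.

```python
# Latencies quoted in the text, in seconds per request.
baseline_s = 0.447   # before AWQ quantization
quantized_s = 0.4    # after AWQ quantization

# Absolute and relative savings per request.
saved_s = baseline_s - quantized_s
relative_reduction = saved_s / baseline_s

print(f"saved per request: {saved_s * 1000:.0f} ms")
print(f"relative reduction: {relative_reduction:.1%}")
```

The saving is roughly a tenth of the original latency, which is worth weighing against the 6-point accuracy drop.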

In the llamafactory environment, install autoawq:

pip install autoawq
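Before running the quantization script, it can help to confirm that the package is visible from the active environment. The sketch below is a minimal check using only the standard library; the helper names `awq_available` and `installed_version` are hypothetical.

```python
import importlib.util
from importlib import metadata


def awq_available() -> bool:
    """Return True if the `awq` module (the import name used by autoawq) can be found."""
    return importlib.util.find_spec("awq") is not None


def installed_version(dist_name: str = "autoawq"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("awq importable:", awq_available())
print("autoawq version:", installed_version())
```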

Quantization code:

def qu_awq():
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer
    import json

    model_path = "model_path"
    quant_path = "awq_model_path"
    calib_data = "_quantize.json"
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

    # Load model
    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, device_map="auto", safetensors=True)

    # The pattern of data
    """ # Example
    msg=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "Tell me who you are."},
        {"role": "assistant", "content": "I am a large language model named Qwen..."}
    ]
    data = []
    for msg in dataset:
        text = tokenizer.apply_chat_template(msg, tokenize=False, add_generation_prompt=False)
        data.append(text.strip())
    return data
    """

    # !!!!!!!!!      Customize the code here for calib_data processing    !!!!!!!!!!!!!!
    def data_gen():
        data = []
        with open(calib_data, "r", encoding="utf-8") as file:
            for line in file:
                msg = json.loads(line)["messages"]
                text = tokenizer.apply_chat_template(msg, tokenize=False, add_generation_prompt=False)
                data.append(text.strip())
        return data

    # !!!!!!!!!      Customize the code here for calib_data processing    !!!!!!!!!!!!!!
    with open(calib_data, 'r', encoding='utf-8') as f:
        json_data = json.load(f)
        json_data = [each["text"] for each in json_data]

    # Quantize
    model.quantize(
        tokenizer,
        quant_config=quant_config,
        calib_data=json_data,
        n_parallel_calib_samples=1,
        max_calib_samples=256,
        max_calib_seq_len=1024,
    )

    # Save quantized model
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
    print(f'Model is quantized and saved at "{quant_path}"')


qu_awq()
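The script above reads `_quantize.json` with `json.load` and takes the `"text"` field of each entry, so the file is expected to be a JSON array of objects that each carry a `"text"` string. The sketch below writes a tiny file in that shape and reads it back the same way. The helper names and sample texts are placeholders, not the author's data.

```python
import json
import os
import tempfile


def write_calib_file(texts, path):
    """Write calibration texts as a JSON array of {"text": ...} objects."""
    records = [{"text": t.strip()} for t in texts if t.strip()]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return len(records)


def load_calib_texts(path):
    """Read the file back the same way the quantization script does."""
    with open(path, "r", encoding="utf-8") as f:
        return [each["text"] for each in json.load(f)]


# Placeholder texts for illustration only.
sample_texts = [
    "Tell me who you are.",
    "   ",  # whitespace-only entries are dropped by write_calib_file
    "Summarize the AWQ quantization procedure in one sentence.",
]

calib_path = os.path.join(tempfile.mkdtemp(), "_quantize.json")
written = write_calib_file(sample_texts, calib_path)
calib_texts = load_calib_texts(calib_path)
print(written, calib_texts)
```

If the calibration data is instead stored as one JSON object per line with a `"messages"` field, the unused `data_gen` helper in the script is the matching reader.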

Inference:

#!/bin/bash
# XFORMERS is 10 ms faster than FLASH_ATTN
#export VLLM_ATTENTION_BACKEND=XFORMERS  # use on older machines
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
source /opt/conda/etc/profile.d/conda.sh
conda activate /opt/conda/envs/vllm085
Model_path="/llm/models/general_knowledge_agent_router/general_knowledge_agent_202250820_v21_01_awq5"
#Model_path="/llm/models/Qwen3-4B-Instruct-2507"

CUDA_VISIBLE_DEVICES=0 nohup python -m vllm.entrypoints.openai.api_server \
    --model ${Model_path} \
    --served-model-name 'qwen3_4b' \
    --host 0.0.0.0 \
    --port 9005 \
    --max-model-len 9000 \
    --trust-remote-code \
    --device cuda \
    --tensor-parallel-size 1 \
    --swap-space 0 \
    --quantization awq \
    --dtype float16 \
    --gpu-memory-utilization 0.7 \
    --max-num-seqs 1 > eval_qwen3_quant.log 2>&1 &
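Once the server is up, a single request can be timed against its OpenAI-compatible chat endpoint. The sketch below is a minimal client using only the standard library. The port and model name come from the launch script, while `localhost`, the helper names, `max_tokens`, and `temperature` are assumptions made for illustration.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:9005"  # port from the launch script; host is an assumption
MODEL_NAME = "qwen3_4b"             # matches --served-model-name


def build_payload(prompt, max_tokens=64):
    """Build a chat-completions request body for a single user message."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def timed_chat(prompt, base_url=BASE_URL, timeout=60):
    """Send one request and return (reply_text, elapsed_seconds)."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.load(resp)
    elapsed = time.perf_counter() - start
    return result["choices"][0]["message"]["content"], elapsed
```

With the server running, `reply, seconds = timed_chat("Tell me who you are.")` returns the reply and the wall-clock time of that call. How the 0.447 s and 0.4 s figures were measured is not stated, so timings from this client are not necessarily comparable to them.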

