当前位置：首页 > news >正文

实战 - 使用 AutoAWQ 进行量化

news 2025/9/11 19:16:26

文章目录

- 一、准备
- - 1、安装 autoawq
  - 2、模型准备
- 二、量化
- - - `config.json` 文件变化
- 三、加载量化后模型
- - - 量化后的输出
    - 原始输出
    - 对比
- 四、查看模型的精度
- - 1、查看模型卡
  - 2、查看 config.json 中的 `torch_dtype`
  - 3、打印模型信息
  - 4、model.dtype 未必是模型精度

一、准备

1、安装 autoawq

pip install autoawq 
pip install transformers==4.47.1

使用的较低版本的 transformers，如果执行下面代码有问题，可以检查 transformers 版本。

目前我的测试 Python 环境为 3.9

2、模型准备

这里以 mistralai/Mistral-7B-Instruct-v0.2 为例

如果下载有问题，可以前往模型界面查看是否需要申请权限：https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

后面代码会自动下载模型，你也可以提前下载模型：

huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2

如果网络受限，可以设置镜像地址到环境变量：

export HF_ENDPOINT='https://hf-mirror.com'

二、量化

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-Instruct-v0.2'
quant_path = 'mistral-instruct-v0.2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# 查看模型类型
model.dtype
# torch.float32 - 32-bit（FP32） 

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')

quant_config 也可以写成：

from transformers import AwqConfig, AutoConfig
quantization_config = AwqConfig(
    bits=quant_config["w_bit"],
    group_size=quant_config["q_group_size"],
    zero_point=quant_config["zero_point"],
    version=quant_config["version"].lower(),
).to_dict()

model.model.config.quantization_config = quantization_config

`config.json` 文件变化

config.json 文件会变成下方的样子：

相比原来的文件，多出 quantization_config 内容，其中 "quant_method": "awq"

{
  "_name_or_path": "/home/wx/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.2/snapshots/3ad372fc79158a2148299e3318516c786aeded6c",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "quantization_config": {
    "bits": 4,
    "group_size": 128,
    "modules_to_not_convert": null,
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  },
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.47.1",
  "use_cache": false,
  "vocab_size": 32000
}

原始 config.json

{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.0",
  "use_cache": true,
  "vocab_size": 32000
}

三、加载量化后模型


from transformers import AutoModelForCausalLM, AutoTokenizer
quant_dir = '/home/wx/mistral-instruct-v0.2-awq'  
model = AutoModelForCausalLM.from_pretrained(quant_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_dir, trust_remote_code=True)


prompt = "Tell me about blackhole."
prompt_template=f'''{prompt}'''

tokens = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()

generated_ids = model.generate(
                                tokens, 
                                do_sample=True,
                                temperature=0.7,
                                top_p=0.95,
                                top_k=40,
                                max_new_tokens=512
                              )
decoded = tokenizer.decode(generated_ids[0])
print(decoded)

量化后的输出

GPU 占用：4550MiB

<s> Tell me about blackhole.

A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape. It's called a "black" hole because it appears black due to the absence of light emanating from it.

Black holes are formed when a massive star collapses in on itself after it has exhausted its nuclear fuel. The collapse causes the star to shrink down to an incredibly small size, creating an incredibly dense object. This object is so dense that its gravity warps space and time around it, forming an event horizon, which is the point of no return. Once anything crosses this event horizon, it's pulled into the black hole and cannot escape.

Black holes come in different sizes, with the smallest being about the size of a star and the largest being billions of times larger than the sun. The largest black hole that has been discovered is located at the center of the galaxy, and it's estimated to be about 40 billion times the mass of the sun.

Despite their intimidating name, black holes are not necessarily a threat to us. The closest known black hole to Earth is about 1,600 light-years away, which is far enough that we don't need to worry about being sucked in. However, they are fascinating objects that continue to captivate scientists and the general public alike.</s>

原始输出

mistralai/Mistral-7B-Instruct-v0.2 , GPU 占用：21988MiB

<s> Tell me about blackhole. I've heard that it is some sort of astronomical thing, but I don't really understand what it is or how it works.

A black hole is an extremely dense object in space that has such strong gravitational pull that nothing, not even light, can escape from it once it gets too close. Black holes are formed when a massive star collapses in on itself after it has exhausted its nuclear fuel and can no longer produce the pressure needed to counteract the force of gravity.

The boundary around a black hole from which nothing can escape is called the event horizon. This is not a physical boundary that you can see, but rather a theoretical construct based on the laws of physics. Once an object crosses the event horizon, it is considered to be inside the black hole itself.

Black holes are not completely black, as they do emit some form of radiation, but they appear black because they absorb all the light that falls on them. This is due to the fact that the intense gravitational pull causes the surface of the black hole to be at a temperature so hot that it emits very little visible light.

Black holes can vary in size, from small ones that are only a few times the mass of the sun, to supermassive black holes that can be millions or even billions of times the mass of the sun. The supermassive black holes are thought to be at the center of most, if not all, galaxies, including our own Milky Way.

Despite their fearsome reputation, black holes are not a threat to us here on Earth, as they are typically located billions of light-years away. However, they are fascinating objects of study for astronomers and physicists, who continue to learn new things about them and their role in the universe.</s>

对比

	原始	4bit 量化后
占用磁盘大小	14G	3.9G
GPU 占用	21988MiB	4550MiB （4.8倍）

四、查看模型的精度

对于一个模型，我们想知道原始的精度是多少，可以用下面几种方式：

1、查看模型卡

如：https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
右边的 Safetensors 信息
在这里插入图片描述

2、查看 config.json 中的 `torch_dtype`

"torch_dtype": "bfloat16",

3、打印模型信息

from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-Instruct-v0.2'

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)

for name, param in model.named_parameters():
    print(f"{name}: {param.dtype}")
    break  # 只打印第一个权重的数据类型

4、model.dtype 未必是模型精度

上述模型，model.dtype 打印的结果为 torch.float32，表示模型当前是以 32-bit 浮点数（FP32）精度加载的。
config.json 中的 "torch_dtype": "bfloat16"表示模型设计时支持或推荐使用 bfloat16 精度，但实际加载时可能由于环境或代码设置未启用 bfloat16。

2025-03-08（六）