Chapter 7: Fine-tuning to follow instructions

7.5 Loading a pretrained LLM

  • Before we start fine-tuning, we load a pretrained GPT-2 model. We use the 355-million-parameter medium version, because the 124M model is too small to produce qualitatively reasonable results through instruction fine-tuning.

  • Load gpt2-medium (355M)

    (Note: make sure the models directory path contains no Chinese/non-ASCII characters.)

    from gpt_download import download_and_load_gpt2
    from previous_chapters import GPTModel, load_weights_into_gpt

    BASE_CONFIG = {
        "vocab_size": 50257,     # Vocabulary size
        "context_length": 1024,  # Context length
        "drop_rate": 0.0,        # Dropout rate
        "qkv_bias": True         # Query-key-value bias
    }

    model_configs = {
        "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
        "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
        "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
        "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
    }

    CHOOSE_MODEL = "gpt2-medium (355M)"
    BASE_CONFIG.update(model_configs[CHOOSE_MODEL])

    # Extract "355M" from "gpt2-medium (355M)"
    model_size = CHOOSE_MODEL.split(" ")[-1].lstrip("(").rstrip(")")

    settings, params = download_and_load_gpt2(
        model_size=model_size,
        models_dir=r"E:\LLM\gpt2"  # raw string so the backslashes are not treated as escapes
    )

    model = GPTModel(BASE_CONFIG)
    load_weights_into_gpt(model, params)
    model.eval();

    """Output"""
    File already exists and is up-to-date: E:\LLM\gpt2\355M\checkpoint
    File already exists and is up-to-date: E:\LLM\gpt2\355M\encoder.json
    File already exists and is up-to-date: E:\LLM\gpt2\355M\hparams.json
    File already exists and is up-to-date: E:\LLM\gpt2\355M\model.ckpt.data-00000-of-00001
    File already exists and is up-to-date: E:\LLM\gpt2\355M\model.ckpt.index
    File already exists and is up-to-date: E:\LLM\gpt2\355M\model.ckpt.meta
    File already exists and is up-to-date: E:\LLM\gpt2\355M\vocab.bpe
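
    The tokenizer object used by the generation code further below is not instantiated in this excerpt. A minimal sketch, assuming the tiktoken GPT-2 BPE tokenizer that text_to_token_ids and token_ids_to_text expect in earlier chapters:

    # Assumption: the GPT-2 BPE tokenizer from tiktoken, as used in earlier chapters
    import tiktoken

    tokenizer = tiktoken.get_encoding("gpt2")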
    
  • Before we start fine-tuning the model in the next section, let's look at how it performs on one of the validation tasks.

    import torch

    torch.manual_seed(123)

    input_text = format_input(val_data[0])
    print(input_text)

    """Output"""
    Below is an instruction that describes a task. Write a response that appropriately completes the request.

    ### Instruction:
    Convert the active sentence to passive: 'The chef cooks the meal every day.'
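
    The format_input helper and val_data come from the data-preparation part of this chapter and are not shown here. A sketch of format_input for reference, assuming the Alpaca-style prompt format that produces the output above:

    # Sketch (assumption): Alpaca-style prompt formatting from the earlier
    # data-preparation section of this chapter
    def format_input(entry):
        instruction_text = (
            f"Below is an instruction that describes a task. "
            f"Write a response that appropriately completes the request."
            f"\n\n### Instruction:\n{entry['instruction']}"
        )
        input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
        return instruction_text + input_text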
    

    As in earlier chapters, the generate function below returns the combined input and output text.

    from previous_chapters import (
        generate,
        text_to_token_ids,
        token_ids_to_text
    )

    token_ids = generate(
        model=model,
        idx=text_to_token_ids(input_text, tokenizer),
        max_new_tokens=35,
        context_size=BASE_CONFIG["context_length"],
        eos_id=50256,
    )
    generated_text = token_ids_to_text(token_ids, tokenizer)
    print(generated_text)

    """Output"""
    "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nConvert the active sentence to passive: 'The chef cooks the meal every day.'\n\n### Response:\n\nThe chef cooks the meal every day.\n\n### Instruction:\n\nConvert the active sentence to passive: 'The chef cooks the"
    

    To isolate the response, we can slice off the length of the instruction from the beginning of generated_text and remove the "### Response:" marker.

    response_text = (
        generated_text[len(input_text):]
        .replace("### Response:", "")
        .strip()
    )
    print(response_text)

    """Output"""
    The chef cooks the meal every day.

    ### Instruction:

    Convert the active sentence to passive: 'The chef cooks the
    

    As we can see, the model cannot follow instructions yet. It produces a "Response" section, but it simply repeats the original input sentence as well as the instruction.
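
    If we only want the text before the spurious repeated instruction block, a simple post-processing step (hypothetical; not part of the chapter's code) could cut the response at the first "### Instruction:" marker:

    # Hypothetical cleanup (not in the original code): keep only the text that
    # precedes the model's spurious repetition of the instruction block
    clean_response = response_text.split("### Instruction:")[0].strip()
    print(clean_response)
    # -> The chef cooks the meal every day.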

