
Introduction to the Hunyuan-MT-7B Model

Model Overview

The Hunyuan translation release consists of two models: the translation model Hunyuan-MT-7B and the ensemble model Hunyuan-MT-Chimera. The translation model renders the source text into the target language, while the ensemble model fuses multiple candidate translations produced by the translation model into a single, better translation. The release focuses on mutual translation across 33 languages, including translation between Mandarin and 5 Chinese ethnic minority languages.

Core Features and Advantages

  • At WMT25, it took first place in 30 of the 31 languages it entered. Hunyuan-MT-7B delivers the best results among models of comparable size in the industry.
  • Hunyuan-MT-Chimera-7B is the industry's first open-source translation ensemble model and can further improve translation quality (a hedged workflow sketch follows this list).
  • The release proposes a complete training paradigm for translation models, from pretraining -> CPT (continued pretraining) -> SFT -> translation RL -> ensemble RL, reaching SOTA (state-of-the-art) quality among models of the same size.
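In practice, the ensemble idea maps to a two-stage pipeline: sample several candidate translations from Hunyuan-MT-7B, then ask Hunyuan-MT-Chimera-7B to fuse them into one output. The sketch below is illustrative only: it assumes both models and the tokenizer have been loaded as shown in the loading section further down, the sampling settings are arbitrary, and the fusion prompt is a hypothetical placeholder rather than the official Chimera template.

def translate_once(mt_model, tokenizer, text, temperature=0.7):
    # Sample one candidate translation from the base MT model.
    messages = [{"role": "user",
                 "content": "Translate the following segment into Chinese, "
                            "without additional explanation.\n\n" + text}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=False, return_tensors="pt"
    ).to(mt_model.device)
    out = mt_model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=temperature)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

def ensemble(chimera_model, tokenizer, source, candidates):
    # Hypothetical fusion prompt: show the source plus all candidates and
    # ask for a single refined translation.
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    prompt = (f"Source segment:\n{source}\n\nCandidate translations:\n{numbered}\n\n"
              "Produce one improved Chinese translation, without additional explanation.")
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=False, return_tensors="pt"
    ).to(chimera_model.device)
    out = chimera_model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# candidates = [translate_once(mt_model, tokenizer, src) for _ in range(3)]
# final = ensemble(chimera_model, tokenizer, src, candidates)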

Model Performance

Loading the Model

from modelscope import AutoModelForCausalLM, AutoTokenizer
import os

model_name_or_path = "Tencent-Hunyuan/Hunyuan-MT-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here
Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-MT-7B
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.99it/s]
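The inline comment above hints at loading in bfloat16. A minimal variant, assuming a GPU with enough memory and bfloat16 support, could look like this:

import torch

# Load the weights in bfloat16 to roughly halve memory use compared with float32;
# device_map="auto" places the layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)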

Model Configuration

model.config
HunYuanDenseV1Config {
  "add_classification_head": false,
  "architectures": ["HunYuanDenseV1ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attention_head_dim": 128,
  "bos_token_id": 1,
  "cla_share_factor": 2,
  "class_num": 0,
  "dense_list": [4096, 0],
  "dtype": "float32",
  "eos_token_id": 127960,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "im_end_id": 5,
  "im_newline_id": 11,
  "im_start_id": 4,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "mask_init_id": 12,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "hunyuan_v1_dense",
  "norm_type": "rms",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "org_vocab_size": 290943,
  "pad_id": 127961,
  "pad_token_id": 0,
  "pool_type": "last",
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "alpha": 100000.0,
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 1.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "type": "dynamic"
  },
  "rope_theta": 10000.0,
  "text_end_id": 7,
  "text_start_id": 6,
  "tie_word_embeddings": true,
  "transformers_version": "4.56.0",
  "use_cache": true,
  "use_cla": false,
  "use_qk_norm": true,
  "use_rotary_pos_emb": true,
  "vocab_size": 128256
}
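The config shows grouped-query attention: 32 query heads but only 8 key/value heads of dimension 128 per layer. A back-of-the-envelope calculation of the per-token KV-cache footprint this implies (assuming a bfloat16 cache):

# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value
num_layers, num_kv_heads, head_dim, bytes_bf16 = 32, 8, 128, 2
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_bf16
print(kv_bytes_per_token / 1024, "KiB per cached token")  # 128.0 KiB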

Model Architecture

model
HunYuanDenseV1ForCausalLM(
  (model): HunYuanDenseV1Model(
    (embed_tokens): Embedding(128256, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x HunYuanDenseV1DecoderLayer(
        (self_attn): HunYuanDenseV1Attention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (query_layernorm): HunYuanDenseV1RMSNorm((128,), eps=1e-05)
          (key_layernorm): HunYuanDenseV1RMSNorm((128,), eps=1e-05)
        )
        (mlp): HunYuanDenseV1MLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): HunYuanDenseV1RMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): HunYuanDenseV1RMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): HunYuanDenseV1RMSNorm((4096,), eps=1e-05)
    (rotary_emb): HunYuanDenseV1RotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)
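A quick sanity check against the printed structure is to count the loaded parameters; since tie_word_embeddings is true, the shared embedding/lm_head matrix is counted only once by model.parameters():

# Total parameter count of the loaded model (tied weights counted once).
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")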

Running the Model

messages = [{"role": "user", "content": "Translate the following segment into Chinese, without additional explanation.\n\nGet something off your chest"},
]
tokenized_chat = tokenizer.apply_chat_template(messages,tokenize=True,add_generation_prompt=False,return_tensors="pt"
)outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
output_text = tokenizer.decode(outputs[0])
output_text
'<|startoftext|>Translate the following segment into Chinese, without additional explanation.\n\nGet something off your chest<|extra_0|>把心里的话说出来吧。<|eos|>'

