A Pipelined Verilog Implementation of the SM4 Cipher (with Testbench)
I. How the Pipelined Verilog Implementation of SM4 Works
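SM4 is a block cipher standard published by China's State Cryptography Administration. It encrypts 128-bit blocks under a 128-bit key using a 32-round nonlinear iterative structure. A pipelined Verilog implementation splits the algorithm into stages, each with its own hardware operating in parallel, which raises throughput considerably.
The key step is to unroll the 32 rounds into 32 consecutive hardware stages. On every clock cycle the data advances from one stage to the next, so the design works on many blocks at different rounds at the same time and, once the pipeline is full, can accept one block and produce one ciphertext per clock cycle.
The design has two parts: a key expansion module and an encryption module. The key expansion module computes the 32 round keys in advance and stores them in registers; the encryption module then uses them in its 32 round stages. Because the round keys only become valid once expansion has finished, a key is loaded first and then reused for every block that follows, which keeps the key schedule out of the encryption data path.
Pipeline control relies on the status signals busy, din_valid, and dout_valid. busy low means that neither key expansion nor encryption is in progress; din_valid marks an input block, and dout_valid marks a valid output block. This simple handshake keeps data moving through the pipeline correctly.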
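The valid-tracking scheme can be tried in isolation before reading the full design. The sketch below is not part of the SM4 code: `pipe_demo`, its `STAGES` parameter, and the add-one stage function are hypothetical stand-ins for the round logic. It shifts a valid bit alongside the data so that the output valid follows the input valid by a fixed number of clock edges.

```verilog
// Hypothetical illustration only: each "round" simply adds 1 to the data.
module pipe_demo #(parameter STAGES = 4) (
    input        clk,
    input        rst_n,
    input        din_valid,
    input  [7:0] din,
    output [7:0] dout,
    output       dout_valid
);
    reg [7:0]      data [0:STAGES];   // data[0] is the input register
    reg [STAGES:0] valid;             // one valid bit per register
    integer i;

    assign dout       = data[STAGES];
    assign dout_valid = valid[STAGES];

    always @(posedge clk or negedge rst_n) begin
        if (~rst_n) begin
            valid <= 0;
            for (i = 0; i <= STAGES; i = i + 1) data[i] <= 8'd0;
        end else begin
            valid <= {valid[STAGES-1:0], din_valid};
            if (din_valid) data[0] <= din;
            for (i = 1; i <= STAGES; i = i + 1) data[i] <= data[i-1] + 8'd1;
        end
    end
endmodule

module pipe_demo_tb;
    reg        clk = 0, rst_n = 0, din_valid = 0;
    reg  [7:0] din = 0;
    wire [7:0] dout;
    wire       dout_valid;
    integer    edges;

    pipe_demo #(.STAGES(4)) dut (
        .clk(clk), .rst_n(rst_n), .din_valid(din_valid),
        .din(din), .dout(dout), .dout_valid(dout_valid)
    );

    always #5 clk = ~clk;

    initial begin
        #12 rst_n = 1;
        @(negedge clk) begin din = 8'd10; din_valid = 1; end
        @(negedge clk) begin din_valid = 0; edges = 1; end  // one edge has captured the input
        while (!dout_valid) begin
            @(negedge clk) edges = edges + 1;
        end
        $display("dout_valid after %0d rising edges, dout = %0d", edges, dout);
        $finish;
    end
endmodule
```

With STAGES = 4 the testbench should report 5 rising edges and dout = 14 (10 plus one increment per stage). The same structure with 32 stages plus the input register gives 33 edges.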
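II. Verilog Code Walkthrough
1. The sm4_top module
sm4_top is the top-level module. It instantiates and connects the key expansion and encryption modules, and exposes the clock, reset, master key load, plaintext input, and ciphertext output ports. The hierarchy keeps key handling and data encryption separate.
The module reports its state through busy, which is the OR of the busy outputs of the key expansion module and the encryption module. An external controller can use it to tell whether the core is idle and to time its inputs.
All inputs and outputs are synchronous and sampled on the rising clock edge. load_mkey loads the master key, din_valid starts the encryption of a block, and dout_valid marks valid output data.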
module sm4_top (
    input          clk,
    input          rst_n,
    input  [127:0] mkey,
    input          load_mkey,
    input  [127:0] plaintext,
    input          din_valid,
    output [127:0] ciphertext,
    output         dout_valid,
    output         busy
);
    wire [1023:0] rk_flatten;        // 32 round keys, 32 bits each
    wire          encrypt_busy;
    wire          key_expand_busy;

    // The core is busy while either sub-module is busy
    assign busy = encrypt_busy | key_expand_busy;

    encrypt u_en (
        .clk       (clk),
        .rst_n     (rst_n),
        .din_valid (din_valid),
        .plaintext (plaintext),
        .ciphertext(ciphertext),
        .dout_valid(dout_valid),
        .busy      (encrypt_busy),
        .rk_flatten(rk_flatten)
    );

    key_expand u_ke (
        .clk       (clk),
        .rst_n     (rst_n),
        .load_mkey (load_mkey),
        .mkey      (mkey),
        .rk_flatten(rk_flatten),
        .busy      (key_expand_busy)
    );
endmodule
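2. The key_expand module
key_expand derives 32 round keys of 32 bits each from the 128-bit master key. It first XORs the master key with the fixed constant FK and then runs 32 iterations, each using a different fixed parameter cki and producing one round key.
Internally, the intermediate key state (K0 to K3) and the round keys are held in shift registers. round_counter counts the iterations, and busy stays high while the round keys are being generated. The finished round keys are passed to the encryption module on the rk_flatten bus.
The core of the key schedule is the key_expand_round submodule, which implements the nonlinear transform of SM4 key expansion. It consists of an S-box substitution followed by a linear transform. The structure matches the encryption round function, but the linear transform uses different rotation amounts.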
module key_expand (
    input           clk,
    input           rst_n,
    input           load_mkey,
    input  [127:0]  mkey,
    output [1023:0] rk_flatten,
    output reg      busy
);
    localparam FK = 128'ha3b1bac656aa3350677d9197b27022dc;

    reg  [4:0]  round_counter;
    reg  [31:0] K0, K1, K2, K3;
    reg  [31:0] round_keys [0:31];
    wire [31:0] cki;
    wire [31:0] round_dout;
    wire        start_expand = load_mkey & ~busy;
    integer i;

    // Round key 0 occupies the top 32 bits of the bus
    assign rk_flatten = {
        round_keys[0],  round_keys[1],  round_keys[2],  round_keys[3],
        round_keys[4],  round_keys[5],  round_keys[6],  round_keys[7],
        round_keys[8],  round_keys[9],  round_keys[10], round_keys[11],
        round_keys[12], round_keys[13], round_keys[14], round_keys[15],
        round_keys[16], round_keys[17], round_keys[18], round_keys[19],
        round_keys[20], round_keys[21], round_keys[22], round_keys[23],
        round_keys[24], round_keys[25], round_keys[26], round_keys[27],
        round_keys[28], round_keys[29], round_keys[30], round_keys[31]
    };

    // Key state: load mkey ^ FK, then shift in one new word per cycle
    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) {K0, K1, K2, K3} <= 128'h0;
        else begin
            if (start_expand) {K0, K1, K2, K3} <= mkey ^ FK;
            else if (busy)    {K0, K1, K2, K3} <= {K1, K2, K3, round_dout};
        end
    end

    // Round key shift register: after 32 shifts the first key sits in round_keys[0]
    always @(posedge clk) begin
        if (busy) begin
            round_keys[31] <= round_dout;
            for (i = 30; i >= 0; i = i - 1) begin
                round_keys[i] <= round_keys[i+1];
            end
        end
    end

    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) round_counter <= 5'd0;
        else begin
            if (busy)              round_counter <= round_counter + 5'd1;
            else if (start_expand) round_counter <= 5'd0;
        end
    end

    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) busy <= 1'b0;
        else begin
            if (start_expand)                busy <= 1'b1;
            else if (round_counter == 5'd31) busy <= 1'b0;
        end
    end

    key_expand_cki   u_ck    (.round(round_counter), .cki(cki));
    key_expand_round u_round (.din({K0, K1, K2, K3}), .cki(cki), .dout(round_dout));
endmodule
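The start/busy/counter handshake can be checked on its own. The following sketch copies only that control pattern into a hypothetical module, `busy_window_demo` (the names `start`, `cnt`, and `go` are illustrative and not taken from the design), and counts how long busy stays high after a one-cycle start pulse.

```verilog
// Hypothetical stand-alone copy of the control pattern, for timing checks only.
module busy_window_demo (
    input      clk,
    input      rst_n,
    input      start,
    output reg busy
);
    reg [4:0] cnt;
    wire      go = start & ~busy;   // ignore start pulses while busy

    always @(posedge clk or negedge rst_n) begin
        if (~rst_n) begin
            cnt  <= 5'd0;
            busy <= 1'b0;
        end else begin
            if (busy)    cnt <= cnt + 5'd1;
            else if (go) cnt <= 5'd0;

            if (go)                busy <= 1'b1;
            else if (cnt == 5'd31) busy <= 1'b0;
        end
    end
endmodule

module busy_window_tb;
    reg     clk = 0, rst_n = 0, start = 0;
    wire    busy;
    integer high_cycles = 0;

    busy_window_demo dut (.clk(clk), .rst_n(rst_n), .start(start), .busy(busy));

    always #5 clk = ~clk;

    // Count the rising edges at which busy is sampled high.
    always @(posedge clk) if (busy) high_cycles = high_cycles + 1;

    initial begin
        #12 rst_n = 1;
        @(negedge clk) start = 1;
        @(negedge clk) start = 0;
        wait (busy == 0);
        @(negedge clk);
        $display("busy was high for %0d clock cycles", high_cycles);
        $finish;
    end
endmodule
```

The count should come out as 32, one cycle per generated round key, which is also how long a controller has to wait after load_mkey before the round keys are complete.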
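3. The encrypt module
encrypt implements the 32-round SM4 encryption pipeline. It takes plaintext and the round keys on rk_flatten and outputs ciphertext. Inside, 32 encrypt_round instances form the pipeline.
The round_ctrl shift register tracks how far each block has advanced. The X0_3 array holds the intermediate state: X0_3[0] registers the plaintext and X0_3[i] holds the state after round i. After round 32 the four words are emitted in reverse order to form the ciphertext. dout_valid is round_ctrl[32], so it goes high on the 33rd rising clock edge, counting the edge that captures din_valid as the first.
The control logic loads a new block into the pipeline whenever din_valid is high. busy combines din_valid with the round_ctrl bits, so it is high whenever any block is entering or in flight. Since every stage is registered, blocks can be fed back to back on consecutive cycles, which maximizes throughput.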
module encrypt (
    input           clk,
    input           rst_n,
    input           din_valid,
    input  [127:0]  plaintext,
    output [127:0]  ciphertext,
    output          dout_valid,
    output          busy,
    input  [1023:0] rk_flatten
);
    reg  [32:0]  round_ctrl;          // one valid bit per pipeline register
    reg  [127:0] X0_3 [0:32];         // X0_3[0]: input register, X0_3[i]: state after round i
    wire [31:0]  round_keys [0:31];
    wire [127:0] round_out  [0:31];
    integer i;
    genvar  j;

    assign dout_valid = round_ctrl[32];
    assign busy       = din_valid | (|round_ctrl);
    // Final step: output the four state words in reverse order
    assign ciphertext = {X0_3[32][31:0], X0_3[32][63:32], X0_3[32][95:64], X0_3[32][127:96]};

    assign {
        round_keys[0],  round_keys[1],  round_keys[2],  round_keys[3],
        round_keys[4],  round_keys[5],  round_keys[6],  round_keys[7],
        round_keys[8],  round_keys[9],  round_keys[10], round_keys[11],
        round_keys[12], round_keys[13], round_keys[14], round_keys[15],
        round_keys[16], round_keys[17], round_keys[18], round_keys[19],
        round_keys[20], round_keys[21], round_keys[22], round_keys[23],
        round_keys[24], round_keys[25], round_keys[26], round_keys[27],
        round_keys[28], round_keys[29], round_keys[30], round_keys[31]
    } = rk_flatten;

    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) begin
            for (i = 0; i < 33; i = i + 1) begin
                X0_3[i] <= 128'h0;
            end
        end else begin
            if (din_valid) begin
                X0_3[0] <= plaintext;
            end
            for (i = 1; i < 33; i = i + 1) begin
                X0_3[i] <= round_out[i-1];
            end
        end
    end

    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) begin
            round_ctrl <= 33'd0;
        end else begin
            // Shift left by one; the original {round_ctrl[32:0], din_valid} was 34 bits
            // wide and relied on implicit truncation to give the same result
            round_ctrl <= {round_ctrl[31:0], din_valid};
        end
    end

    generate
        for (j = 0; j < 32; j = j + 1) begin : enr_instances
            encrypt_round u_er (
                .din (X0_3[j]),
                .rki (round_keys[j]),
                .dout(round_out[j])
            );
        end
    endgenerate
endmodule
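The long concatenation that unpacks rk_flatten can also be written as a generate loop. This is a sketch of an alternative, not the article's code: `rk_unflatten_demo` is a hypothetical stand-alone module that assumes the same packing order as above (round key 0 in the top 32 bits) and checks the indexing on a bus whose word k holds the value k.

```verilog
// Hypothetical demo: unpack a 1024-bit bus into 32 words with an indexed part-select.
module rk_unflatten_demo;
    reg  [1023:0] rk_flatten;
    wire [31:0]   round_keys [0:31];
    reg  [31:0]   w;
    integer       k;
    integer       errors;
    genvar        j;

    generate
        for (j = 0; j < 32; j = j + 1) begin : unpack
            // Word j starts at bit 32*(31-j); "+: 32" selects 32 bits upward from there
            assign round_keys[j] = rk_flatten[32*(31-j) +: 32];
        end
    endgenerate

    initial begin
        errors     = 0;
        rk_flatten = 1024'd0;
        // Shift words in from the right so that word 0 ends up in the top 32 bits,
        // which reproduces the order {rk[0], rk[1], ..., rk[31]}.
        for (k = 0; k < 32; k = k + 1) begin
            w          = k;
            rk_flatten = {rk_flatten[991:0], w};
        end
        #1;
        for (k = 0; k < 32; k = k + 1)
            if (round_keys[k] !== k) errors = errors + 1;
        $display("mismatches: %0d", errors);
    end
endmodule
```

The loop form is shorter and makes the bit position of each round key explicit, at the cost of being a little less obvious than the literal concatenation.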
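4. The encrypt_round module
encrypt_round implements one application of the SM4 round function. It takes a 128-bit state (four 32-bit words) and a 32-bit round key and produces the next 128-bit state. The last three input words are XORed together with the round key, the result passes through four parallel S-boxes (one per byte), and the substituted word goes through the linear transform L, an XOR of the word with its left rotations by 2, 10, 18, and 24 bits. The result is XORed with the first input word to form the new last word, while the other three words shift up by one position.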
module encrypt_round (
    input  [127:0] din,
    input  [31:0]  rki,
    output [127:0] dout
);
    wire [31:0] word_0, word_1, word_2, word_3;
    wire [31:0] transform_din;
    wire [31:0] transform_dout;
    wire [7:0]  sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3;
    wire [7:0]  sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3;
    wire [31:0] sbox_wout = {sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3};

    assign {word_0, word_1, word_2, word_3} = din;
    assign transform_din = word_1 ^ word_2 ^ word_3 ^ rki;
    assign {sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3} = transform_din;

    // L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24)
    assign transform_dout = ((sbox_wout ^ {sbox_wout[29:0], sbox_wout[31:30]})
                          ^ ({sbox_wout[21:0], sbox_wout[31:22]}
                          ^  {sbox_wout[13:0], sbox_wout[31:14]}))
                          ^  {sbox_wout[7:0],  sbox_wout[31:8]};

    assign dout = {word_1, word_2, word_3, transform_dout ^ word_0};

    sm4_sbox sm4_sbox0 (.s_in(sbox_bin0), .s_out(sbox_bout0));
    sm4_sbox sm4_sbox1 (.s_in(sbox_bin1), .s_out(sbox_bout1));
    sm4_sbox sm4_sbox2 (.s_in(sbox_bin2), .s_out(sbox_bout2));
    sm4_sbox sm4_sbox3 (.s_in(sbox_bin3), .s_out(sbox_bout3));
endmodule
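The rotations in the linear transform are written out as bit slices in the module, which is easy to get wrong by one position. As a cross-check, the sketch below expresses the same transform with a rotate function. The module and function names (`sm4_linear_demo`, `rotl`, `L_enc`, `L_key`) are hypothetical; the rotation amounts are those of the SM4 encryption transform L (2, 10, 18, 24) and the key schedule transform L' (13, 23).

```verilog
// Hypothetical demo of the two SM4 linear transforms written with a rotate helper.
module sm4_linear_demo;
    // Rotate a 32-bit word left by n bits (0 < n < 32).
    function [31:0] rotl;
        input [31:0] x;
        input integer n;
        begin
            rotl = (x << n) | (x >> (32 - n));
        end
    endfunction

    // L, used in the encryption round.
    function [31:0] L_enc;
        input [31:0] b;
        begin
            L_enc = b ^ rotl(b, 2) ^ rotl(b, 10) ^ rotl(b, 18) ^ rotl(b, 24);
        end
    endfunction

    // L', used in the key schedule.
    function [31:0] L_key;
        input [31:0] b;
        begin
            L_key = b ^ rotl(b, 13) ^ rotl(b, 23);
        end
    endfunction

    initial begin
        $display("L (00000001) = %h", L_enc(32'h00000001));  // 01040405
        $display("L'(00000001) = %h", L_key(32'h00000001));  // 00802001
    end
endmodule
```

Feeding a single set bit makes each rotation visible as one bit in the result, so a wrong slice boundary in the hand-written version would show up immediately when the two are compared.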
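5. Other modules
key_expand_cki supplies the 32 round constants cki needed by the key schedule, implemented as a lookup table. key_expand_round implements the key schedule round function; its structure matches encrypt_round, but the linear transform differs (rotations by 13 and 23 bits). sm4_sbox implements the SM4 S-box as a 256-byte lookup table, which provides the nonlinear part of the cipher.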
module key_expand_cki (
    input      [4:0]  round,
    output reg [31:0] cki
);
    // Combinational lookup table; blocking assignments are used because this is not a clocked block
    always @(*)
        case (round)
            5'h00: cki = 32'h00070e15;
            5'h01: cki = 32'h1c232a31;
            5'h02: cki = 32'h383f464d;
            5'h03: cki = 32'h545b6269;
            5'h04: cki = 32'h70777e85;
            5'h05: cki = 32'h8c939aa1;
            5'h06: cki = 32'ha8afb6bd;
            5'h07: cki = 32'hc4cbd2d9;
            5'h08: cki = 32'he0e7eef5;
            5'h09: cki = 32'hfc030a11;
            5'h0a: cki = 32'h181f262d;
            5'h0b: cki = 32'h343b4249;
            5'h0c: cki = 32'h50575e65;
            5'h0d: cki = 32'h6c737a81;
            5'h0e: cki = 32'h888f969d;
            5'h0f: cki = 32'ha4abb2b9;
            5'h10: cki = 32'hc0c7ced5;
            5'h11: cki = 32'hdce3eaf1;
            5'h12: cki = 32'hf8ff060d;
            5'h13: cki = 32'h141b2229;
            5'h14: cki = 32'h30373e45;
            5'h15: cki = 32'h4c535a61;
            5'h16: cki = 32'h686f767d;
            5'h17: cki = 32'h848b9299;
            5'h18: cki = 32'ha0a7aeb5;
            5'h19: cki = 32'hbcc3cad1;
            5'h1a: cki = 32'hd8dfe6ed;
            5'h1b: cki = 32'hf4fb0209;
            5'h1c: cki = 32'h10171e25;
            5'h1d: cki = 32'h2c333a41;
            5'h1e: cki = 32'h484f565d;
            5'h1f: cki = 32'h646b7279;
            default: cki = 32'h0;
        endcase
endmodule
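The 32 constants do not have to be typed in by hand: in SM4, byte j of constant i equals (4i + j) × 7 mod 256. The sketch below regenerates a few table entries from that rule so the lookup table can be spot-checked. The module name `ck_formula_demo` and the function name `ck` are hypothetical.

```verilog
// Hypothetical demo: rebuild SM4 CK constants from the generating rule.
module ck_formula_demo;
    function [31:0] ck;
        input integer i;
        integer   j;
        reg [7:0] b;
        begin
            ck = 32'd0;
            for (j = 0; j < 4; j = j + 1) begin
                b  = (4 * i + j) * 7;   // truncation to 8 bits is the mod-256 step
                ck = {ck[23:0], b};     // byte 0 ends up in the top 8 bits
            end
        end
    endfunction

    initial begin
        $display("CK[0]  = %h", ck(0));   // 00070e15
        $display("CK[1]  = %h", ck(1));   // 1c232a31
        $display("CK[31] = %h", ck(31));  // 646b7279
    end
endmodule
```

A testbench could loop over all 32 indices and compare the function output with the table, which catches a mistyped digit without affecting the synthesized lookup logic.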
module key_expand_round (
    input  [127:0] din,
    input  [31:0]  cki,
    output [31:0]  dout
);
    wire [31:0] word_0, word_1, word_2, word_3;
    wire [31:0] transform_din;
    wire [31:0] transform_dout;
    wire [7:0]  sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3;
    wire [7:0]  sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3;
    wire [31:0] sbox_wout = {sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3};

    assign {word_0, word_1, word_2, word_3} = din;
    assign transform_din = word_1 ^ word_2 ^ word_3 ^ cki;
    assign {sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3} = transform_din;

    // L'(B) = B ^ (B <<< 13) ^ (B <<< 23)
    assign transform_dout = (sbox_wout ^ {sbox_wout[18:0], sbox_wout[31:19]})
                          ^ {sbox_wout[8:0], sbox_wout[31:9]};

    assign dout = transform_dout ^ word_0;

    sm4_sbox sm4_sbox0 (.s_in(sbox_bin0), .s_out(sbox_bout0));
    sm4_sbox sm4_sbox1 (.s_in(sbox_bin1), .s_out(sbox_bout1));
    sm4_sbox sm4_sbox2 (.s_in(sbox_bin2), .s_out(sbox_bout2));
    sm4_sbox sm4_sbox3 (.s_in(sbox_bin3), .s_out(sbox_bout3));
endmodule
module sm4_sbox (
    input  [7:0] s_in,
    output [7:0] s_out
);
    reg [7:0] sbox [0:255];

    initial begin
        sbox[000]=8'hd6; sbox[001]=8'h90; sbox[002]=8'he9; sbox[003]=8'hfe; sbox[004]=8'hcc; sbox[005]=8'he1; sbox[006]=8'h3d; sbox[007]=8'hb7;
        sbox[008]=8'h16; sbox[009]=8'hb6; sbox[010]=8'h14; sbox[011]=8'hc2; sbox[012]=8'h28; sbox[013]=8'hfb; sbox[014]=8'h2c; sbox[015]=8'h05;
        sbox[016]=8'h2b; sbox[017]=8'h67; sbox[018]=8'h9a; sbox[019]=8'h76; sbox[020]=8'h2a; sbox[021]=8'hbe; sbox[022]=8'h04; sbox[023]=8'hc3;
        sbox[024]=8'haa; sbox[025]=8'h44; sbox[026]=8'h13; sbox[027]=8'h26; sbox[028]=8'h49; sbox[029]=8'h86; sbox[030]=8'h06; sbox[031]=8'h99;
        sbox[032]=8'h9c; sbox[033]=8'h42; sbox[034]=8'h50; sbox[035]=8'hf4; sbox[036]=8'h91; sbox[037]=8'hef; sbox[038]=8'h98; sbox[039]=8'h7a;
        sbox[040]=8'h33; sbox[041]=8'h54; sbox[042]=8'h0b; sbox[043]=8'h43; sbox[044]=8'hed; sbox[045]=8'hcf; sbox[046]=8'hac; sbox[047]=8'h62;
        sbox[048]=8'he4; sbox[049]=8'hb3; sbox[050]=8'h1c; sbox[051]=8'ha9; sbox[052]=8'hc9; sbox[053]=8'h08; sbox[054]=8'he8; sbox[055]=8'h95;
        sbox[056]=8'h80; sbox[057]=8'hdf; sbox[058]=8'h94; sbox[059]=8'hfa; sbox[060]=8'h75; sbox[061]=8'h8f; sbox[062]=8'h3f; sbox[063]=8'ha6;
        sbox[064]=8'h47; sbox[065]=8'h07; sbox[066]=8'ha7; sbox[067]=8'hfc; sbox[068]=8'hf3; sbox[069]=8'h73; sbox[070]=8'h17; sbox[071]=8'hba;
        sbox[072]=8'h83; sbox[073]=8'h59; sbox[074]=8'h3c; sbox[075]=8'h19; sbox[076]=8'he6; sbox[077]=8'h85; sbox[078]=8'h4f; sbox[079]=8'ha8;
        sbox[080]=8'h68; sbox[081]=8'h6b; sbox[082]=8'h81; sbox[083]=8'hb2; sbox[084]=8'h71; sbox[085]=8'h64; sbox[086]=8'hda; sbox[087]=8'h8b;
        sbox[088]=8'hf8; sbox[089]=8'heb; sbox[090]=8'h0f; sbox[091]=8'h4b; sbox[092]=8'h70; sbox[093]=8'h56; sbox[094]=8'h9d; sbox[095]=8'h35;
        sbox[096]=8'h1e; sbox[097]=8'h24; sbox[098]=8'h0e; sbox[099]=8'h5e; sbox[100]=8'h63; sbox[101]=8'h58; sbox[102]=8'hd1; sbox[103]=8'ha2;
        sbox[104]=8'h25; sbox[105]=8'h22; sbox[106]=8'h7c; sbox[107]=8'h3b; sbox[108]=8'h01; sbox[109]=8'h21; sbox[110]=8'h78; sbox[111]=8'h87;
        sbox[112]=8'hd4; sbox[113]=8'h00; sbox[114]=8'h46; sbox[115]=8'h57; sbox[116]=8'h9f; sbox[117]=8'hd3; sbox[118]=8'h27; sbox[119]=8'h52;
        sbox[120]=8'h4c; sbox[121]=8'h36; sbox[122]=8'h02; sbox[123]=8'he7; sbox[124]=8'ha0; sbox[125]=8'hc4; sbox[126]=8'hc8; sbox[127]=8'h9e;
        sbox[128]=8'hea; sbox[129]=8'hbf; sbox[130]=8'h8a; sbox[131]=8'hd2; sbox[132]=8'h40; sbox[133]=8'hc7; sbox[134]=8'h38; sbox[135]=8'hb5;
        sbox[136]=8'ha3; sbox[137]=8'hf7; sbox[138]=8'hf2; sbox[139]=8'hce; sbox[140]=8'hf9; sbox[141]=8'h61; sbox[142]=8'h15; sbox[143]=8'ha1;
        sbox[144]=8'he0; sbox[145]=8'hae; sbox[146]=8'h5d; sbox[147]=8'ha4; sbox[148]=8'h9b; sbox[149]=8'h34; sbox[150]=8'h1a; sbox[151]=8'h55;
        sbox[152]=8'had; sbox[153]=8'h93; sbox[154]=8'h32; sbox[155]=8'h30; sbox[156]=8'hf5; sbox[157]=8'h8c; sbox[158]=8'hb1; sbox[159]=8'he3;
        sbox[160]=8'h1d; sbox[161]=8'hf6; sbox[162]=8'he2; sbox[163]=8'h2e; sbox[164]=8'h82; sbox[165]=8'h66; sbox[166]=8'hca; sbox[167]=8'h60;
        sbox[168]=8'hc0; sbox[169]=8'h29; sbox[170]=8'h23; sbox[171]=8'hab; sbox[172]=8'h0d; sbox[173]=8'h53; sbox[174]=8'h4e; sbox[175]=8'h6f;
        sbox[176]=8'hd5; sbox[177]=8'hdb; sbox[178]=8'h37; sbox[179]=8'h45; sbox[180]=8'hde; sbox[181]=8'hfd; sbox[182]=8'h8e; sbox[183]=8'h2f;
        sbox[184]=8'h03; sbox[185]=8'hff; sbox[186]=8'h6a; sbox[187]=8'h72; sbox[188]=8'h6d; sbox[189]=8'h6c; sbox[190]=8'h5b; sbox[191]=8'h51;
        sbox[192]=8'h8d; sbox[193]=8'h1b; sbox[194]=8'haf; sbox[195]=8'h92; sbox[196]=8'hbb; sbox[197]=8'hdd; sbox[198]=8'hbc; sbox[199]=8'h7f;
        sbox[200]=8'h11; sbox[201]=8'hd9; sbox[202]=8'h5c; sbox[203]=8'h41; sbox[204]=8'h1f; sbox[205]=8'h10; sbox[206]=8'h5a; sbox[207]=8'hd8;
        sbox[208]=8'h0a; sbox[209]=8'hc1; sbox[210]=8'h31; sbox[211]=8'h88; sbox[212]=8'ha5; sbox[213]=8'hcd; sbox[214]=8'h7b; sbox[215]=8'hbd;
        sbox[216]=8'h2d; sbox[217]=8'h74; sbox[218]=8'hd0; sbox[219]=8'h12; sbox[220]=8'hb8; sbox[221]=8'he5; sbox[222]=8'hb4; sbox[223]=8'hb0;
        sbox[224]=8'h89; sbox[225]=8'h69; sbox[226]=8'h97; sbox[227]=8'h4a; sbox[228]=8'h0c; sbox[229]=8'h96; sbox[230]=8'h77; sbox[231]=8'h7e;
        sbox[232]=8'h65; sbox[233]=8'hb9; sbox[234]=8'hf1; sbox[235]=8'h09; sbox[236]=8'hc5; sbox[237]=8'h6e; sbox[238]=8'hc6; sbox[239]=8'h84;
        sbox[240]=8'h18; sbox[241]=8'hf0; sbox[242]=8'h7d; sbox[243]=8'hec; sbox[244]=8'h3a; sbox[245]=8'hdc; sbox[246]=8'h4d; sbox[247]=8'h20;
        sbox[248]=8'h79; sbox[249]=8'hee; sbox[250]=8'h5f; sbox[251]=8'h3e; sbox[252]=8'hd7; sbox[253]=8'hcb; sbox[254]=8'h39; sbox[255]=8'h48;
    end

    assign s_out = sbox[s_in];
endmodule
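A 256-entry table typed by hand is an easy place for a transcription error. Since the SM4 S-box is a permutation of the byte values 0 to 255, one cheap sanity check is that no output value appears twice. The testbench below is a sketch of such a check; `sm4_sbox_perm_tb` is a hypothetical name, and the file is assumed to be compiled together with the sm4_sbox module above. It does not prove the table is correct, only that it is a permutation.

```verilog
// Hypothetical check; assumes sm4_sbox from this article is compiled alongside it.
module sm4_sbox_perm_tb;
    reg  [7:0] s_in;
    wire [7:0] s_out;
    reg        seen [0:255];   // seen[v] is set once value v has appeared as an output
    integer    k;
    integer    dup;

    sm4_sbox dut (.s_in(s_in), .s_out(s_out));

    initial begin
        dup = 0;
        for (k = 0; k < 256; k = k + 1) seen[k] = 1'b0;
        #1;                                   // let the table's initial block finish first
        for (k = 0; k < 256; k = k + 1) begin
            s_in = k;
            #1;                               // allow the combinational output to settle
            if (seen[s_out]) dup = dup + 1;
            seen[s_out] = 1'b1;
        end
        if (dup == 0) $display("S-box is a permutation of 0..255");
        else          $display("S-box has %0d repeated outputs", dup);
        $finish;
    end
endmodule
```

A single mistyped entry will usually collide with an existing value and be reported as a repeat, which makes this a useful first check before running full known-answer tests.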
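III. Experimental Results
Icarus Verilog (iverilog) was used for quick functional verification with 10 plaintext/ciphertext pairs; the testbench and test results follow. All test cases passed, with the actual output matching the expected ciphertext exactly. The testbench compares each result automatically and prints a pass/fail message, which confirms the design is functionally correct. A VCD waveform file is also generated for later analysis.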
`timescale 1ns/1ps
module sm4_top_tb;
    reg          clk = 0;
    reg          rst_n = 0;
    reg  [127:0] mkey = 0;
    reg          load_mkey = 0;
    reg  [127:0] plaintext = 0;
    reg          din_valid = 0;
    wire [127:0] ciphertext;
    wire         dout_valid;
    wire         busy;

    sm4_top uut (
        .clk       (clk),
        .rst_n     (rst_n),
        .mkey      (mkey),
        .load_mkey (load_mkey),
        .plaintext (plaintext),
        .din_valid (din_valid),
        .ciphertext(ciphertext),
        .dout_valid(dout_valid),
        .busy      (busy)
    );

    always #5 clk = ~clk;

    reg [127:0] test_plaintexts      [0:9];
    reg [127:0] expected_ciphertexts [0:9];

    initial begin
        test_plaintexts[0] = 128'h0123456789abcdeffedcba9876543210;
        test_plaintexts[1] = 128'h19dfd145a155ba9582618728cec3129b;
        test_plaintexts[2] = 128'h5ea6ab0e8c952e165b5cb8770cc68454;
        test_plaintexts[3] = 128'h217da38edffa0a313bae2de200c2f0a4;
        test_plaintexts[4] = 128'h9b90f75138905a2455536f8e8c7c48bb;
        test_plaintexts[5] = 128'h2b393de18384c3908814a72bd9082802;
        test_plaintexts[6] = 128'h20b68d21653ae1e63e1f4186a8b38971;
        test_plaintexts[7] = 128'h50bbcc6daca27a2beaeed62752fefcab;
        test_plaintexts[8] = 128'hba030f96f7d880675c0888e2c286aa07;
        test_plaintexts[9] = 128'h744312ac78eab65501985ef67532d86b;
        expected_ciphertexts[0] = 128'h681edf34d206965e86b3e94f536e4246;
        expected_ciphertexts[1] = 128'h4f4bb97495eda50ee3d4773f8a70961b;
        expected_ciphertexts[2] = 128'h0c18de048cf8ad1a136b32426539fbd8;
        expected_ciphertexts[3] = 128'hba6b80da7ab003b8ec1a65b6e44e50aa;
        expected_ciphertexts[4] = 128'hf606e5dacd97c4bb6cdb5c51a210a4e2;
        expected_ciphertexts[5] = 128'h86637413cc9695b38d7fddd2c8b3682b;
        expected_ciphertexts[6] = 128'hbfbf1d47b7956bc2564d79b59d08cdbc;
        expected_ciphertexts[7] = 128'hb86d60aa7ad3047aee3a75348e011e49;
        expected_ciphertexts[8] = 128'h7f75667ecf8f1079337d70643c0e74a5;
        expected_ciphertexts[9] = 128'h0e3a9246a1b0ad477b5d0c33ff72ca40;
    end

    integer i = 0;

    initial begin
        #15 rst_n = 1;
        // Drive the key on a falling edge; the original asserted load_mkey at t=15,
        // exactly on a rising clock edge, which races with the clock
        @(negedge clk);
        mkey = 128'h0123456789abcdeffedcba9876543210;
        load_mkey = 1;
        @(negedge clk);
        load_mkey = 0;
        wait (busy == 0);            // key expansion finished
        @(negedge clk);
        // One isolated block
        #20 plaintext = test_plaintexts[0]; din_valid = 1;
        #10 din_valid = 0;
        // Four back-to-back blocks
        #20 plaintext = test_plaintexts[1]; din_valid = 1;
        #10 plaintext = test_plaintexts[2];
        #10 plaintext = test_plaintexts[3];
        #10 plaintext = test_plaintexts[4];
        #10 din_valid = 0;
        // Five back-to-back blocks
        #30 plaintext = test_plaintexts[5]; din_valid = 1;
        #10 plaintext = test_plaintexts[6];
        #10 plaintext = test_plaintexts[7];
        #10 plaintext = test_plaintexts[8];
        #10 plaintext = test_plaintexts[9];
        #10 din_valid = 0;
        wait (busy == 0);            // pipeline drained
        #100 $finish;
    end

    // Compare each valid output with the next expected ciphertext
    always @(posedge clk) begin
        if (dout_valid) begin
            if (ciphertext === expected_ciphertexts[i]) begin
                $display("Test %0d: Passed, Expected %h, Actual %h", i, expected_ciphertexts[i], ciphertext);
            end else begin
                $display("Test %0d: Failed, Expected %h, Actual %h", i, expected_ciphertexts[i], ciphertext);
            end
            i = i + 1;
        end
    end

    initial begin
        $dumpfile("sm4_top.vcd");
        $dumpvars(0, sm4_top_tb);
    end
endmodule
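The simulation waveforms, viewed in GTKWave and ModelSim, show the pipeline at work. When din_valid is asserted, the plaintext enters the pipeline; dout_valid goes high about 33 clock cycles later (one input register plus 32 round stages, as implemented in the encrypt module above) and the valid ciphertext appears at the output. busy reflects the system state accurately, and the control signals behave as expected when key expansion and encryption do not overlap.
Synthesis was run in Vivado targeting the XC7A35T-1CSG324C, with the following results: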
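IV. Summary
This article presented a pipelined Verilog implementation of the SM4 block cipher. SM4 is a Chinese national standard cipher with a 32-round nonlinear iterative structure, and the design unrolls all 32 rounds into a pipeline to achieve high-performance hardware encryption. The system has two main modules: the key expansion module computes the 32 round keys in advance, and the encryption module processes data through a 32-stage pipeline. The Verilog code is organized hierarchically into the top level, key expansion, the encryption round function, and the S-box, with status signals coordinating the pipeline. Simulation shows that the design is functionally correct and passes every test vector, starting with the standard SM4 example vector.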