[FPGA Development Tools] Using the AXI4-Stream Interface in HLS
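(1) Introduction to the AXI4-Stream interface
I learned the interface from the following articles:
AXI4-Stream Interface Explained (1): What Is the AXI4-Stream Interface? (CSDN blog)
AXI4-Stream Interface Explained (2): Source-Code Analysis and Simulation of an AXI4-Stream Interface IP (CSDN blog)
AXI4-Stream Interface Explained (3): Using the AXI4-Stream Data FIFO IP (CSDN blog)
(2) HLS program
The program receives data on an AXI4-Stream input and outputs the accumulated sum of each packet, emitting a single result beat when TLAST arrives.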
#include <ap_int.h>
#include <hls_stream.h>
#include <ap_axi_sdata.h>

typedef ap_axiu<32,0,0,0> axis32_t;

void axis_accum(hls::stream<axis32_t> &s_axis, hls::stream<axis32_t> &m_axis) {
#pragma HLS INTERFACE axis port=s_axis
#pragma HLS INTERFACE axis port=m_axis
#pragma HLS INTERFACE ap_ctrl_none port=return
    // Internal accumulator
    ap_uint<32> acc = 0;
    // Free-running: keep reading the input stream
    while (1) {
#pragma HLS PIPELINE II=1
        // Blocking read, governed by the s_axis TVALID/TREADY handshake
        axis32_t in = s_axis.read();
        ap_uint<32> din = (ap_uint<32>)in.data;
        acc += din;
        // On the last beat of the packet, produce one output beat
        if (in.last) {
            axis32_t out;
            out.data = (ap_uint<32>)acc;
            out.last = 1;
            out.keep = 0xF; // 32-bit data = 4 bytes, so all four byte lanes are marked valid
            out.strb = 0xF;
            m_axis.write(out); // blocking write, waits for downstream TREADY
            // Reset the accumulator for the next packet
            acc = 0;
        }
        // If this is not the last beat, nothing is output (one result per packet)
    }
}
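To check the expected behavior without the HLS toolchain, the accumulation rule described above can be sketched as a plain C++ reference model. This is only an illustration: `Beat`, `accum_reference` and `make_packet` are hypothetical helper names that do not appear in the HLS design, and the model ignores timing and the TKEEP/TSTRB side channels.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One AXI4-Stream beat as seen by this model: 32-bit TDATA plus the TLAST flag.
struct Beat {
    uint32_t data;
    bool last;
};

// Software reference model (hypothetical helper, not part of the HLS design):
// sums TDATA over each packet and emits one result per TLAST.
std::vector<uint32_t> accum_reference(const std::vector<Beat> &beats) {
    std::vector<uint32_t> sums;
    uint32_t acc = 0;              // wraps modulo 2^32, like a 32-bit unsigned accumulator
    for (const Beat &b : beats) {
        acc += b.data;
        if (b.last) {
            sums.push_back(acc);   // one output beat at the end of the packet
            acc = 0;               // reset for the next packet
        }
    }
    return sums;
}

// Builds a packet carrying first..last_value, with TLAST set on the final beat.
std::vector<Beat> make_packet(uint32_t first, uint32_t last_value) {
    assert(first <= last_value && last_value < UINT32_MAX);
    std::vector<Beat> packet;
    for (uint32_t v = first; v <= last_value; ++v)
        packet.push_back({v, v == last_value});
    return packet;
}
```

Feeding `make_packet(0, 8)` into `accum_reference` gives a single result per packet, which can be compared against the value observed on m_axis_TDATA in simulation.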
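① Header files
- ap_int.h: provides arbitrary-precision integer types: ap_uint (unsigned) and ap_int (signed).
- hls_stream.h: provides the hls::stream type, which can be mapped to an AXI-Stream port at synthesis. Its read() / write() member functions are blocking, which makes it convenient to write algorithms in a streaming style in HLS code.
- ap_axi_sdata.h: provides the data structures and types for the AXI4-Stream transfer protocol. The most commonly used one is the template struct ap_axis<DATA_WIDTH, USER_WIDTH, ID_WIDTH, DEST_WIDTH> (signed data); the program above uses its unsigned counterpart, ap_axiu.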
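As a rough picture of what one stream element carries, here is a simplified plain C++ stand-in for a 32-bit beat. It is not the real definition from ap_axi_sdata.h: `AxisBeat32` and `full_byte_mask` are hypothetical names, and the user/id/dest fields are left out because they have width 0 in this design. The helper shows why TKEEP and TSTRB are 4 bits wide for 32-bit data: each carries one bit per data byte.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for one AXI4-Stream beat (NOT the vendor header).
struct AxisBeat32 {
    uint32_t data;  // TDATA, 32 bits
    uint8_t  keep;  // TKEEP, one bit per data byte (lower 4 bits used)
    uint8_t  strb;  // TSTRB, one bit per data byte (lower 4 bits used)
    bool     last;  // TLAST, marks the final beat of a packet
};

// All-ones byte mask for a data bus that is data_bits wide.
// Restricted to whole bytes and to at most 31 byte lanes to keep the shift well defined.
uint32_t full_byte_mask(unsigned data_bits) {
    assert(data_bits >= 8 && data_bits % 8 == 0 && data_bits / 8 < 32);
    return (1u << (data_bits / 8)) - 1u;
}

// Builds a beat in which every byte lane is marked valid.
AxisBeat32 make_full_beat(uint32_t data, bool last) {
    const uint8_t mask = static_cast<uint8_t>(full_byte_mask(32));
    return AxisBeat32{data, mask, mask, last};
}
```

For a 32-bit bus the mask evaluates to 0xF, matching the 4-bit TKEEP/TSTRB ports on the generated IP.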
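② Interfaces
- #pragma HLS INTERFACE axis port=…: maps the function argument hls::stream<axis32_t> &s_axis to an AXI-Stream slave or master port (the direction is determined by whether the function calls read() or write() on it). The tool generates the corresponding TVALID/TREADY/TDATA/… signals as top-level ports of the module.
- #pragma HLS INTERFACE ap_ctrl_none port=return: tells the tool that the module does not use the standard HLS block-level control interface (ap_start/ap_done/ap_idle/ap_ready).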
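③ s_axis.read()
- HLS generates s_axis_TREADY during synthesis. read() returns only when s_axis_TVALID==1 and this module's TREADY==1 (a successful handshake).
- If upstream has no valid data (TVALID=0), read() waits (the function pauses at this point) and the statements after it are not executed.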
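The handshake rule behind read() can be illustrated with a small plain C++ sketch that takes per-cycle TVALID and TREADY traces and reports the cycles in which a beat is actually transferred. `transfer_cycles` is a hypothetical helper written for this illustration, not something HLS provides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the cycle indices in which a beat is transferred, i.e. the cycles
// where TVALID and TREADY are both high on the same clock edge.
std::vector<std::size_t> transfer_cycles(const std::vector<bool> &tvalid,
                                         const std::vector<bool> &tready) {
    assert(tvalid.size() == tready.size());  // one sample of each signal per cycle
    std::vector<std::size_t> cycles;
    for (std::size_t i = 0; i < tvalid.size(); ++i) {
        if (tvalid[i] && tready[i]) {
            cycles.push_back(i);             // handshake succeeded in cycle i
        }
    }
    return cycles;
}
```

In cycles where only one of the two signals is high, no data moves: a high TVALID alone means the source is waiting, and a high TREADY alone means the sink is waiting, which is the situation in which read() blocks.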
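④ m_axis.write(out)
- write() is blocking: it asserts m_axis_TVALID=1 and waits for downstream m_axis_TREADY==1. If downstream is temporarily not ready, write() blocks (backpressure), which eventually makes this module stop accepting input (s_axis_TREADY may be deasserted), so the whole pipeline stops reading upstream data.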
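⑤ #pragma HLS PIPELINE II=1
- The goal is to pipeline the while loop so that a new iteration starts (and issues a new read()) on every clock. However, blocking operations (waiting inside read() / write()) introduce stalls/bubbles in hardware, so the effective interval between iterations can be greater than 1. In other words, II=1 is the ideal target, and the real throughput is limited by the handshake and by backpressure.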
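How backpressure eats into the II=1 target can be seen in a toy cycle-level model written in plain C++. It is a sketch under strong assumptions, not a cycle-accurate model of the generated RTL: upstream always has valid data, a pending result blocks further reads until downstream is ready, and downstream is treated as ready in any cycle beyond the end of the `out_ready` trace. The names `StreamBeat`, `RunStats` and `run_accum_model` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct StreamBeat {
    uint32_t data;
    bool last;
};

struct RunStats {
    std::size_t cycles;            // total cycles until all results are delivered
    std::size_t stall_cycles;      // cycles lost waiting for downstream TREADY
    std::vector<uint32_t> outputs; // one sum per packet
};

RunStats run_accum_model(const std::vector<StreamBeat> &beats,
                         const std::vector<bool> &out_ready) {
    assert(beats.empty() || beats.back().last);  // model expects complete packets
    RunStats stats{0, 0, {}};
    std::size_t next = 0;      // index of the next input beat
    uint32_t acc = 0;
    bool pending = false;      // a result is waiting to be written
    uint32_t pending_sum = 0;
    while (next < beats.size() || pending) {
        const bool ready =
            stats.cycles < out_ready.size() ? out_ready[stats.cycles] : true;
        if (pending && !ready) {   // blocked write: the whole loop stalls
            ++stats.stall_cycles;
            ++stats.cycles;
            continue;
        }
        if (pending) {             // downstream ready: result leaves this cycle
            stats.outputs.push_back(pending_sum);
            pending = false;
        }
        if (next < beats.size()) { // one new input beat per non-stalled cycle
            const StreamBeat &b = beats[next++];
            acc += b.data;
            if (b.last) {
                pending_sum = acc;
                pending = true;
                acc = 0;
            }
        }
        ++stats.cycles;
    }
    return stats;
}
```

With downstream always ready, the model accepts one beat per cycle; every cycle in which downstream holds TREADY low while a result is pending shows up as a stall cycle and lengthens the run by exactly that amount.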
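(3) Generated IP core
(4) Simulation
① Testbench
The values 0 through 8 are fed in through the AXI4-Stream interface, with TLAST asserted on the final beat, so the expected sum for each packet is 0+1+…+8 = 36, provided the IP accepts every beat.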
module a();
reg clk;
reg rst; // drives ap_rst_n (active low)
initial begin
    rst = 'd0;
    clk = 'd0;
    #200;
    rst = 'd1;
end
reg         s_axis_TVALID;
wire        s_axis_TREADY;
reg  [31:0] s_axis_TDATA;
wire        s_axis_TLAST;
wire [3:0]  s_axis_TKEEP;
wire [3:0]  s_axis_TSTRB;
reg  [31:0] cnt;
reg  [31:0] state;
wire        m_axis_TREADY;
wire [31:0] result;
wire        m_axis_TVALID;
always #5 clk = ~clk;
assign m_axis_TREADY = 'd1;
assign s_axis_TKEEP  = 'b1111;
assign s_axis_TSTRB  = 'b1111;
assign s_axis_TLAST  = (cnt == 'd9);

// Note: the stimulus does not look at s_axis_TREADY; it assumes the IP accepts every beat.
always @(posedge clk)
begin
    if (!rst) begin
        s_axis_TVALID <= 'd0;
        cnt           <= 'd0;
        s_axis_TDATA  <= 'd0;
        state         <= 'd0;
    end
    else begin
        case (state)
            'd0: begin // first packet
                s_axis_TVALID <= 'd1;
                s_axis_TDATA  <= cnt;
                cnt           <= cnt + 'd1;
                if (cnt == 'd9) begin
                    s_axis_TVALID <= 'd0;
                    cnt           <= 'd0;
                end
                if (m_axis_TVALID) begin
                    state <= 'd1;
                end
            end
            'd1: begin // second packet
                s_axis_TVALID <= 'd1;
                s_axis_TDATA  <= cnt;
                cnt           <= cnt + 'd1;
                if (cnt == 'd9) begin
                    s_axis_TVALID <= 'd0;
                    cnt           <= 'd0;
                end
                if (m_axis_TVALID) begin
                    state <= 'd2;
                end
            end
        endcase
    end
end
axis_accum_0 hello_world (
    .ap_clk(clk),                    // input wire ap_clk
    .ap_rst_n(rst),                  // input wire ap_rst_n
    .s_axis_TVALID(s_axis_TVALID),   // input wire s_axis_TVALID
    .s_axis_TREADY(s_axis_TREADY),   // output wire s_axis_TREADY
    .s_axis_TDATA(s_axis_TDATA),     // input wire [31 : 0] s_axis_TDATA
    .s_axis_TLAST(s_axis_TLAST),     // input wire [0 : 0] s_axis_TLAST
    .s_axis_TKEEP(s_axis_TKEEP),     // input wire [3 : 0] s_axis_TKEEP
    .s_axis_TSTRB(s_axis_TSTRB),     // input wire [3 : 0] s_axis_TSTRB
    .m_axis_TVALID(m_axis_TVALID),   // output wire m_axis_TVALID
    .m_axis_TREADY(m_axis_TREADY),   // input wire m_axis_TREADY
    .m_axis_TDATA(result),           // output wire [31 : 0] m_axis_TDATA
    .m_axis_TLAST(),                 // output wire [0 : 0] m_axis_TLAST
    .m_axis_TKEEP(),                 // output wire [3 : 0] m_axis_TKEEP
    .m_axis_TSTRB()                  // output wire [3 : 0] m_axis_TSTRB
);
endmodule
② Simulation results