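Software version: Vivado 2021.1
Operating system: Windows 10 64-bit
Hardware platform: applicable to Xilinx A7/K7/Z7/ZU/KU series FPGAs
Test platform: MiLianKe MLK-H3-CZ08-7100 development board
Board purchase: https://milianke.tmall.com/
Log in to the MiLianKe FPGA community at http://www.uisrc.com for video courses and Q&A.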
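1 Overview
This design builds the FPGA project around the XDMA IP and uses the driver and test programs already built in "01 Compiling the PCIe Driver on Windows" as the demo.
The material is written as a general tutorial and applies to any Xilinx board that supports PCIe. MiLianKe also pairs the XDMA with its own FDMA controller IP, which keeps data exchange simple.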
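2 System Architecture
The key part of this demo is an IP we wrote called uixdmairq, which works with the driver to handle interrupts. uixdmairq exposes an AXI-Lite interface, and the host reads and writes its registers through addresses in the user space. The IP latches the interrupt bits that arrive on user_irq_req_i and forwards them to the XDMA IP. When the host driver services an interrupt, the interrupt handler writes the uixdmairq register to clear the interrupt that has just been handled.
The design also uses an AXI BRAM to demonstrate read and write access to the user space.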
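3 XDMA Overview
The Xilinx DMA Subsystem for PCI Express IP is a high-performance, configurable scatter-gather (SG) DMA for PCIe 2.0 and PCIe 3.0 with a user-selectable AXI4 or AXI4-Stream interface. The AXI4 interface is the usual choice because it lets the IP join the system bus interconnect; it suits large asynchronous transfers and is normally used together with DDR. The AXI4-Stream interface suits low-latency streaming.
XDMA is an SG DMA, not a block DMA. In SG mode the host organises the data to be transferred as a linked list and passes the address of the first list entry to the XDMA through a BAR. Starting from that address, the XDMA works through the transfers that the list describes, one after another.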
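AXI4 or AXI4-Stream: one of the two must be selected; it carries the data transfers.
AXI4-Lite Master: optional. Maps PCIe BAR addresses to AXI4-Lite register addresses and can be used to read and write user-logic registers.
AXI4-Lite Slave: optional. Exposes the XDMA internal registers to user logic, which can access them through this interface; it is not mapped to a BAR.
AXI4 Bypass: optional. Gives PCIe direct pass-through access to user logic and can be used for low-latency transfers.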
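To make the scatter-gather idea from this section concrete, here is a minimal software sketch of a DMA engine walking a linked list of transfers. The `sg_desc` structure and `sg_run` function are hypothetical names invented for illustration; this is not the real XDMA descriptor layout, and the real engine fetches its descriptors over PCIe rather than following C pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor: NOT the real XDMA descriptor format. */
struct sg_desc {
    const uint8_t *src;     /* source buffer of this fragment */
    uint8_t *dst;           /* destination buffer of this fragment */
    size_t len;             /* number of bytes to move */
    struct sg_desc *next;   /* next descriptor, or NULL at the end of the list */
};

/* Walk the list from its first descriptor and perform each transfer in order.
 * Returns the total number of bytes moved. */
static size_t sg_run(const struct sg_desc *first)
{
    size_t total = 0;
    const struct sg_desc *d;

    for (d = first; d != NULL; d = d->next) {
        assert(d->src != NULL && d->dst != NULL);
        memcpy(d->dst, d->src, d->len);
        total += d->len;
    }
    return total;
}
```

The host only hands over the first descriptor; everything else is discovered by following the chain, which is what lets one request cover buffers scattered across memory.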
4 Building the XDMA-Based PCIe FPGA Project
4.1 XDMA IP Configuration
1: Add the XDMA IP core
2: Configure the XDMA IP
Double-click the XDMA IP to configure it.
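Mode: the configuration mode; select Basic.
Lane Width: the number of PCIe lanes. The MZ7035FC has 8 lanes. Boards differ in how many lanes they support, and more lanes mean a faster link, so choose the lane count that matches the actual hardware.
Max Link Speed: select 5.0 GT/s, i.e. PCIe 2.0. UltraScale and UltraScale+ FPGAs can support PCIe 3.0 for higher speeds.
Reference Clock: 100 MHz.
DMA Interface Option: select AXI Memory Mapped, i.e. the AXI4 interface.
AXI Data Width: 128 bit, i.e. the AXI4 data bus is 128 bits wide.
AXI Clock: 250M, i.e. the AXI4 interface clock is 250 MHz.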
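PCIe ID configuration: the defaults are fine here. The default device type is Simple communication controllers.
PCIe BAR configuration: these settings matter. First enable PCIe to AXI Lite Master Interface so that the host can reach user-logic registers or other AXI4-Lite devices over PCIe. Set the mapped space to 1M; you can of course choose a different size to suit your design.
PCIe to AXI Translation: this setting matters as well. The host-side PCIe BAR address usually differs from the address on the user-logic side, and this setting translates BAR addresses to AXI addresses. For example, if the host-side BAR address is 0 and the translation in the IP is set to 0x44A00000, a host access to BAR address 0 becomes an access to AXI-Lite bus address 0x44A00000.
PCIe to DMA Interface: enable 64bit.
DMA Bypass: not used for now.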
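PCIe interrupt settings
User Interrupts: XDMA provides 16 interrupt lines to user logic; this setting selects how many of them are used.
Legacy Interrupt: XDMA supports legacy interrupts; we do not select them here.
MSI Capabilities: enable MSI interrupts with 4 interrupt message vectors.
Note: MSI and MSI-X interrupts are mutually exclusive, and selecting both raises an error. If MSI is selected, Legacy interrupts may also be selected. If MSI-X is selected, MSI must be deselected and Legacy must be set to None. This restriction applies to the IP on 7-series devices; on UltraScale devices all of them can be selected.
MSI-X Capabilities: not selected.
Miscellaneous: select Extended Tag Field.
Link Status Register: select Enable Slot Clock Configuration.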
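DMA-related settings
Number of DMA Read Channel (H2C) and Number of DMA Write Channel (C2H): the channel counts. For PCIe 2.0 the maximum that can be selected here is 2; the XDMA IP as a whole supports up to four independent H2C channels and four independent C2H channels. Independent channels are very useful in practice: as long as bandwidth allows, one PCIe link can carry several different transfer functions that do not interfere with each other. Here we select 2.
Number of Request IDs for Read (Write) channel: the maximum number of outstanding requests allowed per channel; keep the default.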
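As a rough sanity check on the link and AXI settings chosen in this section (x8 lanes, 5.0 GT/s, 128-bit AXI at 250 MHz), the sketch below compares the two raw data rates. The function names are hypothetical. The only outside fact used is that PCIe 2.0 uses 8b/10b encoding, so 8 of every 10 bits on the wire carry data; packet and protocol overhead is ignored, so real throughput will be lower.

```c
#include <assert.h>
#include <stdint.h>

/* Raw PCIe 2.0 data rate in MB/s, before packet and protocol overhead.
 * 5000 Mbit/s per lane * 8/10 (8b/10b encoding) = 4000 Mbit/s = 500 MB/s per lane. */
static uint32_t pcie_gen2_mbytes_per_s(uint32_t lanes)
{
    assert(lanes > 0);
    return lanes * 500u;
}

/* AXI4 data-bus rate in MB/s for a given bus width (bits) and clock (MHz). */
static uint32_t axi_mbytes_per_s(uint32_t width_bits, uint32_t clk_mhz)
{
    assert(width_bits % 8u == 0u);
    return (width_bits / 8u) * clk_mhz;
}
```

With these inputs both `pcie_gen2_mbytes_per_s(8)` and `axi_mbytes_per_s(128, 250)` come to 4000 MB/s, which suggests the 128-bit, 250 MHz AXI setting is sized to match the x8 Gen2 link.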
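4.2 Completing the Automatic Connections
After configuring the IP, click Run Block Automation. The dialog shows the settings made earlier; if anything differs from the intended configuration, change it by hand, then click OK to finish.
Vivado then makes the necessary connections automatically.
This completes the XDMA IP configuration. For the XDMA to work closely with the host, we still need to build the remaining function blocks.
4.3 Block-Design-Based XDMA Project
4.4 Adding the Interrupt Test Code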
`timescale 1ns / 1ps
/*******************************MILIANKE*******************************
*Company : MiLianKe Electronic Technology Co., Ltd.
*WebSite:https://www.milianke.com
*TechWeb:https://www.uisrc.com
*tmall-shop:https://milianke.tmall.com
*jd-shop:https://milianke.jd.com
*taobao-shop1: https://milianke.taobao.com
*Create Date: 2021/10/15
*File Name: pcie_top.v
*Description:
*Declaration:
*The reference demo provided by Milianke is only used for learning.
*We cannot ensure that the demo itself is free of bugs, so users
*should be responsible for the technical problems and consequences
*caused by the use of their own products.
*Copyright: Copyright (c) MiLianKe
*All rights reserved.
*Revision: 1.0
*Signal description
*1) _i input
*2) _o output
*3) _n active low
*4) _dg debug signal
*5) _r delay or register
*6) _s state machine
*********************************************************************/
module pcie_top
(
output [14:0]DDR3_addr,
output [2:0]DDR3_ba,
output DDR3_cas_n,
output [0:0]DDR3_ck_n,
output [0:0]DDR3_ck_p,
output [0:0]DDR3_cke,
output [0:0]DDR3_cs_n,
output [7:0]DDR3_dm,
inout [63:0]DDR3_dq,
inout [7:0]DDR3_dqs_n,
inout [7:0]DDR3_dqs_p,
output [0:0]DDR3_odt,
output DDR3_ras_n,
output DDR3_reset_n,
output DDR3_we_n,
input sysclk_p,
input sysclk_n,
input [7 :0]pcie_mgt_rxn,
input [7 :0]pcie_mgt_rxp,
output [7 :0]pcie_mgt_txn,
output [7 :0]pcie_mgt_txp,
input [0 :0]pcie_ref_clk_n,
input [0 :0]pcie_ref_clk_p,
input pcie_rst_n
);
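wire sysclk;
// Differential system clock input buffer
IBUFDS CLK_U(.I(sysclk_p),.IB(sysclk_n),.O(sysclk));

wire axi_aclk;
wire user_irq_en_o;                  // from uixdmairq: high while the host has interrupts enabled
reg [21:0] timer_cnt;
reg timer_r1, timer_r2;
reg [1:0] int_p;                     // index of the interrupt line currently requested
reg [3:0] user_irq_req_i;
wire inter = !timer_r2 && timer_r1;  // one-cycle pulse on each rising edge of timer_cnt[20]

// Free-running timer, held at zero while interrupts are disabled
always @(posedge axi_aclk) begin
    if (user_irq_en_o == 1'b0) begin
        timer_cnt <= 22'd0;
    end
    else begin
        timer_cnt <= timer_cnt + 1'b1;
    end
end

// Register timer_cnt[20] twice for edge detection
always @(posedge axi_aclk) begin
    if (user_irq_en_o == 1'b0) begin
        timer_r1 <= 1'd0;
        timer_r2 <= 1'd0;
    end
    else begin
        timer_r1 <= timer_cnt[20];
        timer_r2 <= timer_r1;
    end
end

// Step int_p on every timer pulse and assert only the request bit it selects
always @(posedge axi_aclk) begin
    if (user_irq_en_o == 1'b0) begin
        int_p          <= 2'd0;
        user_irq_req_i <= 4'd0;
    end
    else begin
        if (inter) int_p <= int_p + 1'b1;
        user_irq_req_i        <= 4'd0;
        user_irq_req_i[int_p] <= 1'b1;
    end
end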
pcie_system pcie_system_i(
.DDR3_addr(DDR3_addr),
.DDR3_ba(DDR3_ba),
.DDR3_cas_n(DDR3_cas_n),
.DDR3_ck_n(DDR3_ck_n),
.DDR3_ck_p(DDR3_ck_p),
.DDR3_cke(DDR3_cke),
.DDR3_cs_n(DDR3_cs_n),
.DDR3_dm(DDR3_dm),
.DDR3_dq(DDR3_dq),
.DDR3_dqs_n(DDR3_dqs_n),
.DDR3_dqs_p(DDR3_dqs_p),
.DDR3_odt(DDR3_odt),
.DDR3_ras_n(DDR3_ras_n),
.DDR3_reset_n(DDR3_reset_n),
.DDR3_we_n(DDR3_we_n),
.pcie_mgt_rxn(pcie_mgt_rxn),
.pcie_mgt_rxp(pcie_mgt_rxp),
.pcie_mgt_txn(pcie_mgt_txn),
.pcie_mgt_txp(pcie_mgt_txp),
.pcie_ref_clk_n(pcie_ref_clk_n),
.pcie_ref_clk_p(pcie_ref_clk_p),
.pcie_rst_n(pcie_rst_n),
.axi_aclk(axi_aclk),
.user_irq_en_o(user_irq_en_o),
.user_irq_req_i(user_irq_req_i),
.sysclk(sysclk)
);
endmodule
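The test logic in pcie_top raises a request on every rising edge of timer_cnt[20] and rotates through four interrupt lines. The arithmetic below estimates how often that happens. It assumes axi_aclk runs at the 250 MHz configured in section 4.1; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n of a free-running binary counter rises once every 2^(n+1) clock cycles. */
static uint64_t edge_period_cycles(unsigned bit)
{
    assert(bit < 63u);
    return (uint64_t)1 << (bit + 1u);
}

/* Convert a cycle count to milliseconds for a clock given in MHz. */
static double cycles_to_ms(uint64_t cycles, double clk_mhz)
{
    assert(clk_mhz > 0.0);
    return (double)cycles / (clk_mhz * 1000.0);  /* clk_mhz * 1000 cycles per ms */
}
```

For bit 20 this gives 2,097,152 cycles, or about 8.39 ms per interrupt at 250 MHz. Because int_p cycles through four lines, any single line sees a new request roughly every 33.6 ms.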
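4.5 Address Assignment
Assign the addresses:
The DDR attached to M_AXI is assigned an address starting at 0 (on Windows this must be 0); M_AXI is the port that carries the DMA operations. The BRAM and the uixdmairq interrupt control unit attached to M_AXI_LITE are mapped into the user BAR address space, which is the address set earlier in the XDMA IP. By default the address of the uixdmairq interrupt control unit must match the user BAR address configured in the XDMA, 0x44A0_0000 as shown in the figure below. The BRAM address space is at 0x44A1_0000. The meaning of these address spaces becomes clearer once the software is in use; beginners can simply follow the tutorial settings for now.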
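5 Hardware Installation
Make sure the TF card contains no program, or remove the TF card.
Download the program first (a bit file during debugging), then power on the PC. Only then will the board be recognised correctly and the later tests run normally.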
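6 Hardware Detection
Self-check steps if detection fails:
7 Application Tests
7.1 xdma_rw.exe Overview
Open a terminal (the program exits immediately if you double-click it), change to the application directory built in the previous section, and find xdma_rw.exe. This program operates all of the PCIe device nodes. Entering just xdma_rw.exe in the terminal prints a usage message that explains how to use it.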
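1: DEVNODE
control is the control channel, used to access the XDMA registers. Due to limited time, MiLianKe has not studied configuring the XDMA registers through the control channel in depth.
event_* is an interrupt event, where * is the interrupt number.
user is the user space; data goes through the AXI4-Lite interface.
h2c_* is host to card: the PC sends DMA data to the board, where * is the channel number. Normally only channel 0 is used; data goes through the full AXI4 interface.
c2h_* is card to host: the board sends data to the PC, where * is the channel number. Normally only channel 0 is used; data goes through the full AXI4 interface.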
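2: ADDR (key point)
The read/write address offset. For DMA channels, addresses start from 0, but when the target is PS DDR memory, reads and writes must start at an offset of at least 20 MB.
For user reads and writes, the offset is the AXI address of the AXI-Lite IP minus the PCIe to AXI Translation address configured in the XDMA IP. Because MiLianKe's XDMA design modifies how the driver responds to interrupts, PCIe to AXI Translation must equal the default uixdmairq address. Other AXI-Lite peripherals are then assigned addresses after it.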
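3: OPTION
-a: set the memory alignment
-b: open the file as binary
-f: read from or write to a file
-l: data length
-v: more verbose output
4: DATA
Decimal or hexadecimal values, separated by spaces,
for example:
17 34 51 68
0x11 0x22 0x33 0x44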
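The user-offset rule from the ADDR item can be written as a small helper. This is a sketch under stated assumptions: the translation base 0x44A00000 and the 1M BAR size are the values chosen in section 4.1, the BRAM is taken to sit at AXI address 0x44A1_0000, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define AXI_TRANSLATION_BASE 0x44A00000u  /* PCIe to AXI Translation set in the XDMA IP */
#define USER_BAR_SIZE        0x00100000u  /* 1M user BAR chosen in section 4.1 */

/* Offset to pass to the user device node for a peripheral at axi_addr. */
static uint32_t user_offset(uint32_t axi_addr)
{
    assert(axi_addr >= AXI_TRANSLATION_BASE);                  /* must lie in the window */
    assert(axi_addr - AXI_TRANSLATION_BASE < USER_BAR_SIZE);
    return axi_addr - AXI_TRANSLATION_BASE;
}
```

Under these assumptions uixdmairq at 0x44A0_0000 maps to offset 0x0 and the BRAM maps to offset 0x10000, which is the address used for the user channel in section 7.3.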
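7.2 DMA Bulk Data Test
DMA transfer is the mode we use most; it needs the h2c or c2h channel.
The current directory contains a file named datafile4k.bin. We test by transferring this file to the FPGA's DDR (or to BRAM on the MA703FA-35T) and then reading it back.
Enter the following command in the terminal:
xdma_rw.exe h2c_0 write 0x0000000 -b -f datafile4K.bin -l 4096
This uses the h2c_0 device to read datafile4k.bin in binary form and write 4096 bytes of it to memory address 0x0000000.
Then read the data back with the command
xdma_rw.exe c2h_0 read 0x0000000 -b -f datafile4K_recv.bin -l 4096
Next, use a tool such as WinHex to check whether the two files are identical. In our test they are, which shows that the transfer works correctly.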
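The comparison of the sent and received files can also be scripted instead of checked by eye. The helper below is a minimal sketch using only the C standard library; `files_equal` is a hypothetical name and is not part of xdma_rw.exe.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
 * and -1 if either file cannot be opened. */
static int files_equal(const char *path_a, const char *path_b)
{
    FILE *fa, *fb;
    int ca, cb;

    assert(path_a != NULL && path_b != NULL);
    fa = fopen(path_a, "rb");
    fb = fopen(path_b, "rb");
    if (fa == NULL || fb == NULL) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }
    /* Read both files byte by byte until they differ or both reach EOF. */
    do {
        ca = fgetc(fa);
        cb = fgetc(fb);
    } while (ca == cb && ca != EOF);
    fclose(fa);
    fclose(fb);
    return ca == cb;   /* equal only if both hit EOF at the same position */
}
```

Calling `files_equal("datafile4K.bin", "datafile4K_recv.bin")` after the two commands above should return 1 when the round trip succeeded.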
7.3 user Channel Test
Write 2 values through the AXI-Lite interface to the BRAM attached to it:
xdma_rw.exe user write 0x10000 0x11 0x22
Read 2 values back through the AXI-Lite interface from the same BRAM:
xdma_rw.exe user read 0x10000 -l 2
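7.4 event Interrupt Test
1: FPGA-side code for XDMA interrupts
First we need to understand the XDMA interrupt types and their control timing:
1) Legacy Interrupts:
For legacy interrupts, usr_irq_req may be cleared when user_irq_ack goes to 1 for the first time, and may be set again to raise a new interrupt when user_irq_ack goes to 1 for the second time.
On the PCI bus, INTx interrupts are signalled on four optional interrupt lines. They are shared: all PCI devices combine their interrupt signals on one line, which is reported to the CPU, and after receiving the interrupt the CPU has to query which device raised it.
PCIe no longer has physical INTx lines. The PCIe standard implements INTx with dedicated Message transactions to stay compatible with older PCI software. INTx remains shared, so after responding to the interrupt the CPU still has to find the actual source, which is inefficient.
2) MSI Interrupts:
After an MSI usr_irq_req request, user_irq_ack going to 1 only means that the host has received the interrupt, not that it has been handled. Software or the driver can then clear usr_irq_req.
Both MSI and MSI-X raise an interrupt by performing a memory write to a configured CPU interrupt register, which is more efficient than shared INTx. MSI supports up to 32 interrupt vectors and MSI-X up to 2048.
3) MSI-X Interrupts:
After a usr_irq_req request, usr_irq_req can be cleared as soon as user_irq_ack is 1, but it is not stated when it may be set to 1 again to start the next interrupt.
After testing all of the interrupt modes above, we found that Legacy and MSI are sufficient for now and relatively stable. The host driver manages the received interrupts by accessing the user BAR address space together with the uixdmairq IP core written by MiLianKe.
uixdmairq.v source code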
/*******************************MILIANKE*******************************
*Company : MiLianKe Electronic Technology Co., Ltd.
*WebSite:https://www.milianke.com
*TechWeb:https://www.uisrc.com
*tmall-shop:https://milianke.tmall.com
*jd-shop:https://milianke.jd.com
*taobao-shop1: https://milianke.taobao.com
*Create Date : 2022/05/01
*Module Name:uixdmairq
*File Name:uixdmairq.v
*Description :
*The reference demo provided by Milianke is only used for learning.
*We cannot ensure that the demo itself is free of bugs, so users
*should be responsible for the technical problems and consequences
*caused by the use of their own products.
*Copyright: Copyright (c) MiLianKe
*All rights reserved.
*Revision : 1.0
*Signal description
*1) _i input
*2) _o output
*3) _n active low
*4) _dg debug signal
*5) _r delay or register
*6) _s state machine
*********************************************************************/
`timescale 1ns / 1ps
module uixdmairq #
(
parameter integer XMDA_REQ_NUM = 8
)
(
// Users to add ports here
input wire [XMDA_REQ_NUM-1 :0] user_irq_req_i,
//input wire [XMDA_REQ_NUM-1 :0] xdma_irq_ack_i,
output wire [XMDA_REQ_NUM-1 :0] xdma_irq_req_o,
output wire user_irq_en_o,
input wire S_AXI_ACLK,
input wire S_AXI_ARESETN,
input wire [3 : 0] S_AXI_AWADDR,
input wire [2 : 0] S_AXI_AWPROT,
input wire S_AXI_AWVALID,
output wire S_AXI_AWREADY,
input wire [31 : 0] S_AXI_WDATA,
input wire [3 : 0] S_AXI_WSTRB,
input wire S_AXI_WVALID,
output wire S_AXI_WREADY,
output wire [1 : 0] S_AXI_BRESP,
output wire S_AXI_BVALID,
input wire S_AXI_BREADY,
input wire [3 : 0] S_AXI_ARADDR,
input wire [2 : 0] S_AXI_ARPROT,
input wire S_AXI_ARVALID,
output wire S_AXI_ARREADY,
output wire [31 : 0] S_AXI_RDATA,
output wire [1 : 0] S_AXI_RRESP,
output wire S_AXI_RVALID,
input wire S_AXI_RREADY
);
reg [XMDA_REQ_NUM -1 :0] user_irq_req;
reg [XMDA_REQ_NUM -1 :0] user_irq_req_r1;
reg [XMDA_REQ_NUM -1 :0] user_irq_req_r2;
reg [XMDA_REQ_NUM -1 :0] user_irq_req_r3;
reg [XMDA_REQ_NUM -1 :0] xdma_irq_ack_r1;
reg [XMDA_REQ_NUM -1 :0] xdma_irq_ack_r2;
reg [XMDA_REQ_NUM -1 :0] xdma_irq_ack_r3;
//reg [XMDA_REQ_NUM -1 :0] xdma_irq_ack;
reg [XMDA_REQ_NUM -1 :0] xdma_irq_req;
// AXI4LITE signals
reg [3:0] axi_awaddr;
reg axi_awready;
reg axi_wready;
reg [1 : 0] axi_bresp;
reg axi_bvalid;
reg [3:0] axi_araddr;
reg axi_arready;
reg [31 : 0] axi_rdata;
reg [1 : 0] axi_rresp;
reg axi_rvalid;
// Example-specific design signals
// local parameter for addressing 32 bit / 64 bit C S AXI_DATA_WIDTH
// ADDR_LSB is used for addressing 32/64 bit registers/memories
// ADDR_LSB = 2 for 32 bits (n downto 2)
// ADDR_LSB = 3 for 64 bits (n downto 3)
localparam integer ADDR_LSB = 2;
localparam integer OPT_MEM_ADDR_BITS = 1;
//----------------------------------------------
//-- Signals for user logic register space example
//------------------------------------------------
//-- Number of Slave Registers 2
reg [31:0] slv_reg0;
reg [31:0] slv_reg1;
wire slv_reg_rden;
wire slv_reg_wren;
reg [31:0] reg_data_out;
integer byte_index;
reg aw_en;
// I/O Connections assignments
assign S_AXI_AWREADY = axi_awready;
assign S_AXI_WREADY = axi_wready;
assign S_AXI_BRESP = axi_bresp;
assign S_AXI_BVALID = axi_bvalid;
assign S_AXI_ARREADY = axi_arready;
assign S_AXI_RDATA = axi_rdata;
assign S_AXI_RRESP = axi_rresp;
assign S_AXI_RVALID = axi_rvalid;
// Implement axi_awready generation
// axi_awready is asserted for one S_AXI_ACLK clock cycle when both
// S_AXI_AWVALID and S_AXI_WVALID are asserted. axi_awready is
// de-asserted when reset is low.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_awready <= 1 'b0;
aw_en <= 1 'b1;
end
else
begin
if (~axi_awready && S_AXI_AWVALID && S_AXI_WVALID && aw_en)
begin
// slave is ready to accept write address when
// there is a valid write address and write data
// on the write address and data bus. This design
// expects no outstanding transactions.
axi_awready <= 1 'b1;
aw_en <= 1 'b0;
end
else if (S_AXI_BREADY && axi_bvalid)
begin
aw_en <= 1 'b1;
axi_awready <= 1 'b0;
end
else
begin
axi_awready <= 1 'b0;
end
end
end
// Implement axi_awaddr latching
// This process is used to latch the address when both
// S_AXI_AWVALID and S_AXI_WVALID are valid.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_awaddr <= 0;
end
else
begin
if (~axi_awready && S_AXI_AWVALID && S_AXI_WVALID && aw_en)
begin
// Write Address latching
axi_awaddr <= S_AXI_AWADDR;
end
end
end
// Implement axi_wready generation
// axi_wready is asserted for one S_AXI_ACLK clock cycle when both
// S_AXI_AWVALID and S_AXI_WVALID are asserted. axi_wready is
// de-asserted when reset is low.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_wready <= 1 'b0;
end
else
begin
if (~axi_wready && S_AXI_WVALID && S_AXI_AWVALID && aw_en )
begin
// slave is ready to accept write data when
// there is a valid write address and write data
// on the write address and data bus. This design
// expects no outstanding transactions.
axi_wready <= 1 'b1;
end
else
begin
axi_wready <= 1 'b0;
end
end
end
assign slv_reg_wren = axi_wready && S_AXI_WVALID && axi_awready && S_AXI_AWVALID;
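// Register writes. slv_reg0 (acknowledge bits) holds a written value for one
// clock and then clears itself; slv_reg1 keeps its value (bit 31 enables interrupts).
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )begin
slv_reg0 <= 0;
slv_reg1 <= 0;
end
else if(slv_reg_wren)begin
if (axi_awaddr[3:2] == 2'd0)
slv_reg0[31:0] <= S_AXI_WDATA[31:0];
else if (axi_awaddr[3:2] == 2'd1)
slv_reg1[31:0] <= S_AXI_WDATA[31:0];
end else begin
slv_reg0 <= 0;
slv_reg1 <= slv_reg1;
end
end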
// Implement write response logic generation
// The write response and response valid signals are asserted by the slave
// when axi_wready, S_AXI_WVALID, axi_wready and S_AXI_WVALID are asserted.
// This marks the acceptance of address and indicates the status of
// write transaction.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_bvalid<=0;
axi_bresp<=2'b0;
end
else
begin
if (axi_awready && S_AXI_AWVALID && ~axi_bvalid && axi_wready && S_AXI_WVALID)
begin
// indicates a valid write response is available
axi_bvalid <= 1 'b1;
axi_bresp <= 2 'b0; // 'OKAY ' response
end // work error responses in future
else
begin
if (S_AXI_BREADY && axi_bvalid)
//check if bready is asserted while bvalid is high)
//(there is a possibility that bready is always asserted high)
begin
axi_bvalid <= 1 'b0;
end
end
end
end
// Implement axi_arready generation
// axi_arready is asserted for one S_AXI_ACLK clock cycle when
// S_AXI_ARVALID is asserted. axi_awready is
// de-asserted when reset (active low) is asserted.
// The read address is also latched when S_AXI_ARVALID is
// asserted. axi_araddr is reset to zero on reset assertion.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_arready <= 1 'b0;
axi_araddr <= 4'b0;
end
else
begin
if (~axi_arready && S_AXI_ARVALID)
begin
// indicates that the slave has acceped the valid read address
axi_arready <= 1 'b1;
// Read address latching
axi_araddr <= S_AXI_ARADDR;
end
else
begin
axi_arready <= 1 'b0;
end
end
end
// Implement axi_arvalid generation
// axi_rvalid is asserted for one S_AXI_ACLK clock cycle when both
// S_AXI_ARVALID and axi_arready are asserted. The slave registers
// data are available on the axi_rdata bus at this instance. The
// assertion of axi_rvalid marks the validity of read data on the
// bus and axi_rresp indicates the status of read transaction.axi_rvalid
// is deasserted on reset (active low). axi_rresp and axi_rdata are
// cleared to zero on reset (active low).
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_rvalid <= 0;
axi_rresp <= 0;
end
else
begin
if (axi_arready && S_AXI_ARVALID && ~axi_rvalid)
begin
// Valid read data is available at the read data bus
axi_rvalid <= 1 'b1;
axi_rresp <= 2 'b0; // 'OKAY ' response
end
else if (axi_rvalid && S_AXI_RREADY)
begin
// Read data is accepted by the master
axi_rvalid <= 1 'b0;
end
end
end
// Implement memory mapped register select and read logic generation
// Slave register read enable is asserted when valid address is available
// and the slave is ready to accept the read address.
assign slv_reg_rden = axi_arready & S_AXI_ARVALID & ~axi_rvalid;
// Select the register addressed by the latched read address
always @(*)
if(axi_araddr[3:2] == 2'd0)
reg_data_out = slv_reg0;
else if(axi_araddr[3:2] == 2'd1)
reg_data_out = slv_reg1;
else
reg_data_out = 32'd0;
// Output register or memory read data
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1 'b0 )
begin
axi_rdata <= 0;
end
else
begin
// When there is a valid read address (S_AXI_ARVALID) with
// acceptance of read address by the slave (axi_arready),
// output the read dada
if (slv_reg_rden) begin
axi_rdata <= reg_data_out; // register read data
end
end
end
// Add user logic here
integer i;
assign xdma_irq_req_o = xdma_irq_req;
assign user_irq_en_o = slv_reg1[31];                               // bit 31 of register 1 enables interrupts
wire [XMDA_REQ_NUM-1:0] xdma_irq_ack = slv_reg0[XMDA_REQ_NUM-1:0]; // host writes 1 to clear a request

// Three-stage register chain on the incoming requests
always @( posedge S_AXI_ACLK ) begin
    if ( S_AXI_ARESETN == 1'b0 || user_irq_en_o == 1'b0 ) begin
        user_irq_req_r1 <= 0;
        user_irq_req_r2 <= 0;
        user_irq_req_r3 <= 0;
    end
    else begin
        user_irq_req_r1 <= user_irq_req_i;
        user_irq_req_r2 <= user_irq_req_r1;
        user_irq_req_r3 <= user_irq_req_r2;
    end
end

// One-cycle pulse on the rising edge of each request bit
always @( posedge S_AXI_ACLK ) begin
    if ( S_AXI_ARESETN == 1'b0 || user_irq_en_o == 1'b0 ) begin
        user_irq_req <= 0;
    end
    else begin
        user_irq_req <= ~user_irq_req_r3 & user_irq_req_r2;
    end
end

// Latch each request until the host acknowledges it; acknowledge has priority
always @( posedge S_AXI_ACLK ) begin
    if ( S_AXI_ARESETN == 1'b0 || user_irq_en_o == 1'b0 ) begin
        xdma_irq_req <= 0;
    end
    else begin
        for (i = 0; i < XMDA_REQ_NUM; i = i + 1) begin
            if (xdma_irq_ack[i]) begin
                xdma_irq_req[i] <= 1'b0;
            end
            else if (user_irq_req[i]) begin
                xdma_irq_req[i] <= 1'b1;
            end
        end
    end
end
// User logic ends
endmodule
// Top-level interrupt test logic: requests user interrupts 0 to 3 in turn
wire axi_aclk;
wire axi_aresetn;
wire user_irq_en_o;
reg [21:0] timer_cnt;
always @(posedge axi_aclk) begin
    if (!axi_aresetn || !user_irq_en_o) begin
        timer_cnt <= 22'd0;
    end
    else begin
        timer_cnt <= timer_cnt + 1'b1;
    end
end
reg timer_r1, timer_r2;
wire inter = !timer_r2 && timer_r1;
always @(posedge axi_aclk) begin
    if (!axi_aresetn || !user_irq_en_o) begin
        timer_r1 <= 1'd0;
        timer_r2 <= 1'd0;
    end
    else begin
        timer_r1 <= timer_cnt[20];
        timer_r2 <= timer_r1;
    end
end
reg [1:0] int_p;
reg [3:0] user_irq_req_i;
always @(posedge axi_aclk) begin
    if (!axi_aresetn || !user_irq_en_o) begin
        int_p          <= 2'd0;
        user_irq_req_i <= 4'd0;
    end
    else begin
        if (inter) int_p <= int_p + 1'b1;
        user_irq_req_i        <= 4'd0;
        user_irq_req_i[int_p] <= 1'b1;
    end
end
2: Host-side code for XDMA interrupts
Source of the interrupt program, intr_event.c:
#include <Windows.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <strsafe.h>
#include <stdint.h>
#include <SetupAPI.h>
#include <INITGUID.H>
#include <WinIoCtl.h>
//#include <AtlBase.h>
#include <io.h>
#include "xdma_public.h"
#pragma comment(lib, "setupapi.lib")
#pragma warning(disable:4996)

volatile BYTE start_en;   // set by main once interrupts are enabled; polled by the event threads
HANDLE h_user;
DWORD user_irq_ack[1];
char base_path[MAX_PATH + 1] = "";
#define MAX_BYTES_PER_TRANSFER 0x800000

static int verbose_msg(const char *const fmt, ...) {
    int ret = 0;
    va_list args;
    if (1) {
        va_start(args, fmt);
        ret = vprintf(fmt, args);
        va_end(args);
    }
    return ret;
}

static BYTE *allocate_buffer(size_t size, size_t alignment) {
    if (size == 0) {
        size = 4;
    }
    if (alignment == 0) {
        // default to the system page size
        SYSTEM_INFO sys_info;
        GetSystemInfo(&sys_info);
        alignment = sys_info.dwPageSize;
        //printf("alignment = %d\n", alignment);
    }
    verbose_msg("Allocating host-side buffer of size %zu, aligned to %zu bytes\n", size, alignment);
    return (BYTE *)_aligned_malloc(size, alignment);
}
static int get_devices(GUID guid, char *devpath, size_t len_devpath) {
    SP_DEVICE_INTERFACE_DATA device_interface;
    PSP_DEVICE_INTERFACE_DETAIL_DATA dev_detail;
    DWORD index;
    HDEVINFO device_info;
    wchar_t tmp[256];

    device_info = SetupDiGetClassDevs((LPGUID)&guid, NULL, NULL, DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
    if (device_info == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "GetDevices INVALID_HANDLE_VALUE\n");
        exit(-1);
    }
    device_interface.cbSize = sizeof(SP_DEVICE_INTERFACE_DATA);
    // enumerate through devices
    for (index = 0; SetupDiEnumDeviceInterfaces(device_info, NULL, &guid, index, &device_interface); ++index) {
        // get required buffer size
        ULONG detailLength = 0;
        if (!SetupDiGetDeviceInterfaceDetail(device_info, &device_interface, NULL, 0, &detailLength, NULL) && GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
            fprintf(stderr, "SetupDiGetDeviceInterfaceDetail - get length failed\n");
            break;
        }
        // allocate space for device interface detail
        dev_detail = (PSP_DEVICE_INTERFACE_DETAIL_DATA)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, detailLength);
        if (!dev_detail) {
            fprintf(stderr, "HeapAlloc failed\n");
            break;
        }
        dev_detail->cbSize = sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA);
        // get device interface detail
        if (!SetupDiGetDeviceInterfaceDetail(device_info, &device_interface, dev_detail, detailLength, NULL, NULL)) {
            fprintf(stderr, "SetupDiGetDeviceInterfaceDetail - get detail failed\n");
            HeapFree(GetProcessHeap(), 0, dev_detail);
            break;
        }
        // copy no more than tmp can hold, then convert the wide path to multibyte
        StringCchCopy(tmp, sizeof(tmp) / sizeof(tmp[0]), dev_detail->DevicePath);
        wcstombs(devpath, tmp, len_devpath);
        HeapFree(GetProcessHeap(), 0, dev_detail);
    }
    SetupDiDestroyDeviceInfoList(device_info);
    return index;
}
HANDLE open_devices(char *device_base_path, char *device_name, DWORD accessFlags)
{
    char device_path[MAX_PATH + 1] = "";
    wchar_t device_path_w[MAX_PATH + 1];
    HANDLE h;
    // extend device path to include target device node (xdma_control, xdma_user etc)
    verbose_msg("Device base path: %s\n", device_base_path);
    strcpy_s(device_path, sizeof device_path, device_base_path);
    strcat_s(device_path, sizeof device_path, device_name);
    verbose_msg("Device node: %s\n", device_name);
    // open device file
    mbstowcs(device_path_w, device_path, sizeof(device_path));
    h = CreateFile(device_path_w, accessFlags, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
    {
        fprintf(stderr, "Error opening device, win32 error code: %ld\n", GetLastError());
    }
    return h;
}
static int read_device(HANDLE device, long address, DWORD size, BYTE *buffer)
{
    DWORD rd_size = 0;
    unsigned int transfers;
    unsigned int i;
    if (INVALID_SET_FILE_POINTER == SetFilePointer(device, address, NULL, FILE_BEGIN)) {
        fprintf(stderr, "Error setting file pointer, win32 error code: %ld\n", GetLastError());
        return -3;
    }
    // full-size chunks first, then the remainder
    transfers = (unsigned int)(size / MAX_BYTES_PER_TRANSFER);
    for (i = 0; i < transfers; i++)
    {
        if (!ReadFile(device, (void *)(buffer + i * MAX_BYTES_PER_TRANSFER), (DWORD)MAX_BYTES_PER_TRANSFER, &rd_size, NULL))
        {
            return -1;
        }
        if (rd_size != MAX_BYTES_PER_TRANSFER)
        {
            return -2;
        }
    }
    if (!ReadFile(device, (void *)(buffer + i * MAX_BYTES_PER_TRANSFER), (DWORD)(size - i * MAX_BYTES_PER_TRANSFER), &rd_size, NULL))
    {
        return -1;
    }
    if (rd_size != (size - i * MAX_BYTES_PER_TRANSFER))
    {
        return -2;
    }
    return size;
}

static int write_device(HANDLE device, long address, DWORD size, BYTE *buffer)
{
    DWORD wr_size = 0;
    unsigned int transfers;
    unsigned int i;
    transfers = (unsigned int)(size / MAX_BYTES_PER_TRANSFER);
    //printf("transfers = %d\n", transfers);
    if (INVALID_SET_FILE_POINTER == SetFilePointer(device, address, NULL, FILE_BEGIN)) {
        fprintf(stderr, "Error setting file pointer, win32 error code: %ld\n", GetLastError());
        return -3;
    }
    // full-size chunks first, then the remainder
    for (i = 0; i < transfers; i++)
    {
        if (!WriteFile(device, (void *)(buffer + i * MAX_BYTES_PER_TRANSFER), MAX_BYTES_PER_TRANSFER, &wr_size, NULL))
        {
            return -1;
        }
        if (wr_size != MAX_BYTES_PER_TRANSFER)
        {
            return -2;
        }
    }
    if (!WriteFile(device, (void *)(buffer + i * MAX_BYTES_PER_TRANSFER), (DWORD)(size - i * MAX_BYTES_PER_TRANSFER), &wr_size, NULL))
    {
        return -1;
    }
    if (wr_size != (size - i * MAX_BYTES_PER_TRANSFER))
    {
        return -2;
    }
    return size;
}
// One worker thread per interrupt event; lpParam carries the event number (0 to 3).
DWORD WINAPI thread_event(LPVOID lpParam)
{
    int index = (int)(INT_PTR)lpParam;
    BYTE val0[1] = { 0 };
    char device_name[16];
    HANDLE h_event;

    sprintf_s(device_name, sizeof device_name, "\\event_%d", index);
    h_event = open_devices(base_path, device_name, GENERIC_READ);
    while (1)
    {
        if (start_en) {
            read_device(h_event, 0, 1, val0); // wait for the irq
            Sleep(1);
            if (val0[0] == 1)
                printf("event_%d done!\n", index);
            else
                printf("event_%d timeout!\n", index);
        }
    }
    CloseHandle(h_event);
    return 0;
}

int __cdecl main(int argc, char *argv[])
{
    HANDLE h_thread[4];
    char *device_name = "\\user";
    int n;
    DWORD num_devices = get_devices(GUID_DEVINTERFACE_XDMA, base_path, sizeof(base_path));

    verbose_msg("Devices found: %d\n", num_devices);
    if (num_devices < 1)
    {
        printf("error\n");
        return -1;
    }
    h_user = open_devices(base_path, device_name, GENERIC_READ | GENERIC_WRITE);
    if (h_user == INVALID_HANDLE_VALUE)
    {
        fprintf(stderr, "Error opening device, win32 error code: %ld\n", GetLastError());
        return -1;
    }
    // start one thread for each of the four interrupt events
    for (n = 0; n < 4; n++)
        h_thread[n] = CreateThread(NULL, 0, thread_event, (LPVOID)(INT_PTR)n, 0, NULL);
    user_irq_ack[0] = 0xffff0000;
    write_device(h_user, 0x00004, 4, (BYTE *)user_irq_ack); // start irq
    start_en = 1;
    printf("start\n");
    for (n = 0; n < 4; n++)
        WaitForSingleObject(h_thread[n], INFINITE);
    user_irq_ack[0] = 0x00000000;
    write_device(h_user, 0x00004, 4, (BYTE *)user_irq_ack); // stop irq
    CloseHandle(h_user);
    for (n = 0; n < 4; n++)
        CloseHandle(h_thread[n]);
    return 0;
}
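The following statements enable the interrupts:
user_irq_ack[0] = 0xffff0000;
write_device(h_user, 0x00004, 4, (BYTE *)user_irq_ack);
The following statements disable the interrupts:
user_irq_ack[0] = 0x00000000;
write_device(h_user, 0x00004, 4, (BYTE *)user_irq_ack);
The program above sets up 4 interrupt events and starts one thread per event. A thread is suspended while it waits for its interrupt and resumes once the interrupt occurs. XDMA supports up to 16 user interrupts.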
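Reading the uixdmairq source, the two registers touched here have a simple layout: the register at user offset 0x4 carries the enable in bit 31, and the register at offset 0x0 carries one acknowledge bit per interrupt line. The sketch below spells that out. The macro and function names are hypothetical, and the limit of 8 lines assumes the default XMDA_REQ_NUM parameter.

```c
#include <assert.h>
#include <stdint.h>

#define UIXDMAIRQ_ACK_OFFSET   0x0u          /* slv_reg0: write 1 to a bit to clear that request */
#define UIXDMAIRQ_CTRL_OFFSET  0x4u          /* slv_reg1: control register */
#define UIXDMAIRQ_ENABLE_BIT   (1u << 31)    /* slv_reg1[31] drives user_irq_en_o */
#define UIXDMAIRQ_NUM_LINES    8u            /* assumes the default XMDA_REQ_NUM = 8 */

/* Value to write at UIXDMAIRQ_CTRL_OFFSET to switch interrupts on or off. */
static uint32_t irq_ctrl_value(int enable)
{
    return enable ? UIXDMAIRQ_ENABLE_BIT : 0u;
}

/* Value to write at UIXDMAIRQ_ACK_OFFSET to acknowledge one interrupt line. */
static uint32_t irq_ack_value(unsigned line)
{
    assert(line < UIXDMAIRQ_NUM_LINES);
    return 1u << line;
}
```

The 0xffff0000 written by the program enables interrupts because bit 31 is set; the hardware does not look at the other bits of that register.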
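3: XDMA interrupt test
Run the xdma_event.exe program.
The output shows 4 interrupt events; XDMA actually supports up to 16. More interrupt events make better use of the CPU's multi-core, multi-thread performance.
Look at the waveforms captured on the FPGA:
That covers the core of the XDMA PCIe design. The official Xilinx material often lacks detail; MiLianKe has made small changes to the driver above so that it supports interrupts better.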