Users developing algorithms for FPGAs can benefit from their high-performance parallel architecture. Nevertheless, even if an FPGA offers more GFLOPS (10^9 floating-point operations per second) than a modern computer, some processing is still better suited to a CPU. Provided the computer can meet the processing requirements, using a high-level language like Python, or even a lower-level language like C, can dramatically reduce development time. Furthermore, user interaction and feedback are much easier to manage on a computer: the libraries needed to build graphical user interfaces, display signals, images, or video, use peripherals such as keyboards, mice, or sound cards, or save data to a hard drive are already available.
Because each type of programmable chip (FPGA or processor) has its own pros and cons, hybrid configurations are very common, especially in applications such as signal processing. FPGA manufacturers have responded to this need with system-on-chip (SoC) devices that integrate FPGA logic and ARM processors in the same integrated circuit.
When the FPGA and processor work together, the FPGA usually performs the first layer of signal processing, which typically consists of high-throughput, low-level operations such as filtering, downsampling, fast Fourier transforms, and modulation. The processor configures the FPGA algorithm and hardware, provides the user interface, and performs upper-layer signal processing once the FPGA has pre-processed the data.
For example, an FPGA can receive and process data from an FPGA mezzanine card (FMC). Depending on the signal processing result, the FPGA can then generate a message for the host computer. Figure 1 shows a generic system diagram for this kind of application.
Figure 1: Hybrid architecture
To use this type of architecture, a high-bandwidth, low-latency link must exist between the FPGA and the processor. As part of its Advanced Development Platform (ADP) software suite, Nutaq provides a fully integrated Real-Time Data Exchange (RTDEx) pipe between its FPGA platforms and a host computer. It includes FPGA cores as well as an application programming interface (API) for C applications, enabling fast prototyping and a shorter time-to-market.
Figure 2 shows the RTDEx FPGA core and API in the previously described system (the FMC core is also part of the ADP but is outside the scope of this blog post).
Figure 2: RTDEx FPGA core and API
The RTDEx API provides data transfer functions and an abstraction of the physical link. Currently, the RTDEx supports Gigabit Ethernet and PCIe connections between a host computer and a Perseus carrier board. For the Zedboard, the RTDEx supports AXI data transfers between the embedded ARM and the FPGA logic.
Before transferring data to or from the FPGA, the host application initializes the RTDEx pipe by opening a connection and then starting the transfer.
Table 1: RTDEx initialization functions
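The function names from Table 1 are not reproduced here, so the following is only a hypothetical C sketch of the initialization order described above: open a connection, start the transfer, and close the pipe when done. The type demo_pipe and the functions pipe_open, pipe_start, and pipe_close are invented for illustration and are not Nutaq's API; the 0 to 7 channel numbering is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the initialization sequence (NOT Nutaq's API).
 * It only illustrates the required order: open -> start -> (transfer) -> close. */
typedef enum { PIPE_CLOSED, PIPE_OPEN, PIPE_STARTED } pipe_state;

typedef struct {
    pipe_state state;
    int channel;   /* assumed numbering 0-7 (8 channels on GbE and PCIe) */
    int to_fpga;   /* 1 = host to FPGA, 0 = FPGA to host */
} demo_pipe;

/* Open: bind the pipe to a channel and a direction. Returns 0 on success. */
int pipe_open(demo_pipe *p, int channel, int to_fpga) {
    assert(p != NULL);
    if (channel < 0 || channel > 7 || p->state != PIPE_CLOSED)
        return -1;   /* invalid channel, or pipe already in use */
    p->channel = channel;
    p->to_fpga = to_fpga;
    p->state = PIPE_OPEN;
    return 0;
}

/* Start: only valid once the pipe has been opened. Returns 0 on success. */
int pipe_start(demo_pipe *p) {
    assert(p != NULL);
    if (p->state != PIPE_OPEN)
        return -1;
    p->state = PIPE_STARTED;
    return 0;
}

/* Close: release the pipe from any state. */
void pipe_close(demo_pipe *p) {
    assert(p != NULL);
    p->state = PIPE_CLOSED;
}
```

In a real application, each step's return code would be checked before moving on, and the pipe closed on exit or on error.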
Once an RTDEx pipe has been correctly initialized, it’s possible to transfer data between the host application and the FPGA.
Table 2: RTDEx data transfer functions
These functions take three arguments: the initialized RTDEx connection, a pointer to the data array, and the number of bytes to transfer. Using them is not significantly harder than writing data to, or reading data from, a file.
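To illustrate this argument pattern (connection, data pointer, byte count), here is a hypothetical loopback sketch in C. The names loop_pipe, pipe_send, and pipe_receive are invented and backed by an in-memory buffer instead of a physical link, and returning the number of bytes actually transferred is an assumption borrowed from fwrite/fread, not documented RTDEx behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOOP_CAPACITY 4096  /* hypothetical capacity of the in-memory "link" */

/* Hypothetical loopback stand-in for an initialized RTDEx connection. */
typedef struct {
    unsigned char buf[LOOP_CAPACITY];
    size_t head;   /* next byte to read */
    size_t tail;   /* next byte to write */
} loop_pipe;

/* Send nbytes from data; returns the number of bytes actually accepted. */
size_t pipe_send(loop_pipe *p, const void *data, size_t nbytes) {
    assert(p != NULL && data != NULL);
    size_t room = LOOP_CAPACITY - p->tail;
    size_t n = nbytes < room ? nbytes : room;
    memcpy(p->buf + p->tail, data, n);
    p->tail += n;
    return n;
}

/* Receive up to nbytes into data; returns the number of bytes copied. */
size_t pipe_receive(loop_pipe *p, void *data, size_t nbytes) {
    assert(p != NULL && data != NULL);
    size_t avail = p->tail - p->head;
    size_t n = nbytes < avail ? nbytes : avail;
    memcpy(data, p->buf + p->head, n);
    p->head += n;
    return n;
}
```

As with fwrite, a caller of this sketch would compare the return value of pipe_send(&p, samples, sizeof samples) with sizeof samples and handle a short transfer.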
Depending on the physical link used, the different parameters of an RTDEx connection can be modified (e.g. the channel number, the direction, the mode, the frame size, and the transfer size).
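These parameters could be grouped in a single configuration structure. The sketch below is hypothetical: the field names, the continuous/single mode values, and the rule that a frame must hold whole 32-bit words (inferred from the 32-bit data bus of the FPGA cores) are assumptions, not the documented RTDEx parameter set. It also shows the simple arithmetic linking frame size and transfer size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter set; field names and mode values are assumptions. */
typedef enum { DIR_HOST_TO_FPGA, DIR_FPGA_TO_HOST } pipe_dir;
typedef enum { MODE_CONTINUOUS, MODE_SINGLE } pipe_mode;

typedef struct {
    int channel;           /* assumed numbering 0-7 on Gigabit Ethernet / PCIe */
    pipe_dir direction;
    pipe_mode mode;
    size_t frame_size;     /* bytes per frame on the link */
    size_t transfer_size;  /* total bytes for this transfer */
} pipe_config;

/* Assumed rule: a frame carries whole 32-bit words. Returns 0 if valid. */
int pipe_config_check(const pipe_config *c) {
    assert(c != NULL);
    if (c->channel < 0 || c->channel > 7)
        return -1;
    if (c->frame_size == 0 || c->frame_size % 4 != 0)
        return -1;
    return 0;
}

/* Frames needed for the transfer: ceil(transfer_size / frame_size). */
size_t pipe_frame_count(const pipe_config *c) {
    assert(c != NULL && c->frame_size > 0);
    return (c->transfer_size + c->frame_size - 1) / c->frame_size;
}
```

For example, a 10000-byte transfer with 1024-byte frames needs 10 frames, the last one only partially filled.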
Currently, the RTDEx send and receive functions stream raw data only; the RTDEx implementation does not add header information to sent or received data. If needed, users can wrap the RTDEx API functions and the FPGA core to add their own custom headers.
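As an example of such a wrapper, the hypothetical C sketch below prepends an 8-byte header (a 32-bit sequence number and a 32-bit payload length, both little-endian) to a buffer before it is handed to the send function, and parses it after reception. The header format and the frame_wrap/frame_unwrap names are illustrative assumptions; the FPGA core would need a matching change to produce or consume the same header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: 32-bit sequence number + 32-bit payload length (LE). */
#define HDR_BYTES 8

static void put_u32_le(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v);
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

static uint32_t get_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write header + payload into out; returns total bytes, or 0 if out is too small. */
size_t frame_wrap(unsigned char *out, size_t out_size, uint32_t seq,
                  const void *payload, uint32_t len) {
    assert(out != NULL && (payload != NULL || len == 0));
    if (out_size < HDR_BYTES + (size_t)len)
        return 0;
    put_u32_le(out, seq);
    put_u32_le(out + 4, len);
    if (len > 0)
        memcpy(out + HDR_BYTES, payload, len);
    return HDR_BYTES + (size_t)len;
}

/* Parse a received buffer; returns a pointer to the payload, or NULL if malformed. */
const unsigned char *frame_unwrap(const unsigned char *in, size_t in_size,
                                  uint32_t *seq, uint32_t *len) {
    assert(in != NULL && seq != NULL && len != NULL);
    if (in_size < HDR_BYTES)
        return NULL;
    *seq = get_u32_le(in);
    *len = get_u32_le(in + 4);
    if ((size_t)*len > in_size - HDR_BYTES)
        return NULL;   /* declared length exceeds the received data */
    return in + HDR_BYTES;
}
```

Encoding the fields byte by byte keeps the format independent of the host's endianness, which matters when the other end is FPGA logic.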
RTDEx FPGA cores
Nutaq provides an RTDEx FPGA core for each supported physical link. The cores handle the protocol implementation, flow control with the host, and buffering of incoming and outgoing data in first-in, first-out (FIFO) memories. All the cores expose the same user interface, consisting of the control, status, and data signals of the receive (Rx) and transmit (Tx) FIFOs, as described below.
Table 3: RTDEx FPGA TX core user ports
Table 4: RTDEx FPGA RX core user ports
Each data transfer direction can have its own clock for reading or writing the data. The data bus is 32 bits wide. In Gigabit Ethernet and PCIe implementations, 8 channels are available and the host application can select the required one when opening the RTDEx pipe with the FPGA.
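Because the user data bus is 32 bits wide, the user-side clock bounds the throughput: assuming at most one word is moved per clock cycle, bytes per second = 4 * f_clk. The sketch below computes the minimum clock needed to keep up with a given payload rate. For the 1 Gbit/s line rate of Gigabit Ethernet this gives 1e9 / 32 = 31.25 MHz; protocol overhead makes the actual payload rate lower, so this figure is a safe upper bound on the clock actually required.

```c
#include <assert.h>

#define RTDEX_BUS_BITS 32.0   /* user data bus width of the RTDEx FPGA cores */

/* Minimum user clock (Hz) so that one 32-bit word per cycle keeps up with
 * link_bps bits per second of payload (assumes one word per cycle). */
double min_user_clock_hz(double link_bps) {
    assert(link_bps >= 0.0);
    return link_bps / RTDEX_BUS_BITS;
}

/* Peak user-side throughput (bytes/s) at clk_hz, one word per cycle. */
double bus_bytes_per_s(double clk_hz) {
    assert(clk_hz >= 0.0);
    return clk_hz * (RTDEX_BUS_BITS / 8.0);
}
```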
To transmit data, the TxReady flag must be high, indicating that the FIFO can correctly accept a write operation. In that case, TxWe can be set high and the current value on TxData is written to the transmit FIFO. When TxReady is low, the FIFO is full or almost full and write operations must be avoided. This can happen when the host application does not read the data fast enough or when the physical link cannot sustain the required throughput.
To receive data in the FPGA, the user logic checks RxReady, which indicates whether data is available in the FIFO. If it is, RxRe can be set high to perform a read operation. If the read succeeds, RxDataValid is high on the next RxUserClk cycle and the data on RxData is valid.
Real-time data transfer between an FPGA and a host computer is critical for applications that require high data rates or low latency, so the physical interface must be chosen carefully. Furthermore, reusing FPGA cores and an API that are already developed and tested saves a significant amount of development time, which is an advantage for products with critical time-to-market requirements.
Now that Nutaq’s external API, including the RTDEx, is open source, users can see its implementation and, if required, modify its code to meet their specific needs.
In the next posts of the RTDEx series, the implementation of each physical link will be explained, along with the advantages and disadvantages of using it.