FPGAs are becoming an increasingly attractive choice for computationally demanding signal processing applications because they can process large amounts of data in parallel. Modern processors with similar or even higher clock rates (in the GHz range) still have less processing power than FPGAs for such workloads. Let’s look at an example, 8-tap discrete finite impulse response (FIR) filtering, which can be expressed as:
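In the usual textbook notation (the symbol names are assumed here, not taken from the original), with input samples x[n], filter coefficients h[k], and output y[n], an 8-tap FIR filter computes:

```latex
y[n] = \sum_{k=0}^{7} h[k]\, x[n-k]
```

Each output sample therefore needs eight multiplications and seven additions.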


Figure 1: Direct implementation of an 8-tap FIR Filter


An FPGA can easily perform eight operations in parallel and deliver a result within one clock cycle using the parallel filtering architecture shown in Figure 1. In contrast, a processor requires more than one clock cycle to produce the same result, as illustrated in the following pseudo-code:
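A minimal sketch of such a sequential computation, written here in Python as executable pseudo-code. The names `fir8_sequential`, `h`, and `x` are hypothetical, and the step counter only stands in for "one tap handled at a time"; it is not a real cycle count for any particular processor.

```python
# Hypothetical 8-tap FIR computed sequentially, one multiply-accumulate per step.
TAPS = 8

def fir8_sequential(h, x, n):
    """Return (y[n], steps), where steps counts the taps visited one by one."""
    acc = 0
    steps = 0
    for k in range(TAPS):
        if n - k >= 0:                 # samples before x[0] are treated as zero
            acc += h[k] * x[n - k]     # one multiply-accumulate
        steps += 1                     # each tap is handled in its own step
    return acc, steps

h = [1, 2, 3, 4, 5, 6, 7, 8]           # example coefficients (assumed values)
x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]     # unit impulse as test input
y7, steps = fir8_sequential(h, x, 7)   # picks out h[7] after visiting all 8 taps
```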

This is because a processor needs several clock cycles to fetch, decode, and execute a DSP instruction, whereas an FPGA processes the data in parallel [1]. Therefore, repetitive, computationally demanding signal processing tasks such as filtering, Fourier transforms, correlation, and matrix processing are very well suited to FPGAs, while less demanding signal processing and sequential processing tasks are well suited to processors. Furthermore, since floating-point, unpredictable, and dynamic parallel processing tasks can’t leverage the repetitive processing advantage of FPGAs, they’re easier to perform in a processor than in a complex FPGA implementation.

The embedded wireless system designer must carefully balance processing tasks between FPGAs and embedded processors to optimize the tradeoff between the implementation cost and the performance of the system. The latest system-on-chip (SoC) Zynq® family from Xilinx® combines a dual-core ARM® Cortex™ A9 processor rated up to 1 GHz and a high-performance FPGA device rated up to 2622 GMACs of processing power [2]. The advanced microcontroller bus architecture (AMBA) [3] interconnects these components to allow high-performance data exchange between the processors and the FPGA device. The beauty of this architecture is that the processor only needs to provide the correct data and instructions to the FPGA. After that, the processor works on another task while the FPGA does its job. When the job is done, the FPGA passes the results back to the processor, which can then provide more data to the FPGA.
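The hand-off described above can be pictured with a small software analogy. This is only a sketch under stated assumptions: a one-worker thread pool stands in for the FPGA, the function names (`fpga_fir_job`, `processor_other_task`, `run_offload`) are hypothetical, and none of it reflects a Xilinx API or the actual AMBA transfer mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def fpga_fir_job(h, x):
    """Stand-in for the FPGA: filter a whole block of samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def processor_other_task():
    """Stand-in for unrelated work the processor does in the meantime."""
    return "housekeeping done"

def run_offload(h, x):
    # The single worker plays the role of the FPGA (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=1) as fpga:
        pending = fpga.submit(fpga_fir_job, h, x)   # hand data to the "FPGA"
        status = processor_other_task()             # processor keeps working
        result = pending.result()                   # collect the finished results
    return status, result

status, y = run_offload([1, 1], [1, 2, 3])
```

The point of the pattern is that the processor never waits idle: it blocks on the result only after its own task is finished.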

An example wireless system can be implemented as shown in Figure 2. The physical layer (Layer 1) is implemented in the FPGA device and performs all related signal processing in the transmission and reception chains, such as modulation and demodulation, channelization, filtering, synchronization, and channel encoding and decoding [4]. Layer 1 exchanges information via the AMBA interconnection with the data link layer (Layer 2), which is implemented in the processor. Error control in Layer 2 is hardware accelerated by error correction control logic with direct memory access (DMA) in the FPGA, freeing up the processor for other tasks. The network layer (Layer 3) is also implemented in the processor and exchanges information between the system and the outside world using various protocols such as TCP/IP, IPX, and AppleTalk [4].


Figure 2: Processing Tasks – Zynq device




By using a combined FPGA and embedded-processor architecture, the embedded system designer has greater flexibility in distributing processing tasks between the FPGA and the processor to solve a particular problem in wireless applications. Xilinx’s Zynq SoC family offers a wide range of FPGA and processor options to address problems from simple to complex.


[1] ARM. 2012. “Multiplication Instructions.” Cortex-A9 Technical Reference Manual.

[2] Xilinx. 2013. Zynq™-7000 All Programmable SoCs.

[3] ARM. 2011. “Advanced Microcontroller Bus Architecture.” CoreSight ETM R4 Technical Reference Manual.

[4] Wikipedia. Last modified on 19 May 2013. “Open Systems Interconnection Model.” http:/