In this series:

- Xilinx System Generator tips and tricks – Part 1: An introduction
- Xilinx System Generator tips and tricks – Part 2: HDL code reusability
- Xilinx System Generator tips and tricks – Part 3: Using MATLAB M-function for easy state machine coding
- Xilinx System Generator tips and tricks – Part 4: Understanding timing issues
- Xilinx System Generator tips and tricks – Part 6: Timing issues outside the box

In the previous post we introduced a concern shared by all digital circuit designers: timing issues. Roughly speaking, failing timing paths tell us that we are trying to do too much logic within a clock period. Most of those issues cannot be ignored. So let’s see how we can address and fix those issues when they happen within System Generator models.

## System Generator reference example

To demonstrate the different techniques for solving timing issues, we made a simple educational example, shown in Figure 1 (for an introduction to System Generator, see the first post of this series).

**Figure 1: System Generator reference example**

This model essentially measures the power of a complex input signal over 1000 samples. The implemented equation is provided here for reference (with N = 1000):
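Going by the processing described in the rest of this section (the squared samples are accumulated, then scaled by 1/N), the measurement presumably corresponds to the mean squared magnitude below. The symbols are assumptions made for illustration: x[n] is the complex sample at the input of the power stage and P is the reported power.

$$
P = \frac{1}{N} \sum_{n=0}^{N-1} \left| x[n] \right|^{2}, \qquad N = 1000
$$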

The first processing stage is a *down-sampling by 2* filter. The filter limits the signal to the wanted bandwidth and then down-samples the filtered signal by two.

The second processing stage is the power measurement itself. The accumulator is first reset to zero at the start of each valid signal (rising edge). Then, for a period of time determined by the valid signal, the squares of the samples are accumulated. Finally, the result is divided by 1000 (multiplied by 1/1000) to provide the power measurement. A valid output signal is generated when the power has been calculated.
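As a rough software reference for this second stage, here is a minimal floating-point sketch in Python. It is not the System Generator model: the function name `measure_power` is hypothetical, the fixed-point behaviour and the valid-signal handshake are ignored, and treating the "square" of a complex sample as its squared magnitude is an assumption.

```python
def measure_power(samples, n=1000):
    """Mean power of the first n complex samples (floating-point reference)."""
    if len(samples) < n:
        raise ValueError("need at least n samples for one measurement window")

    acc = 0.0  # accumulator, reset at the start of each measurement window
    for x in samples[:n]:
        # Squared magnitude of a complex sample: I^2 + Q^2 (assumed meaning)
        acc += x.real * x.real + x.imag * x.imag

    # Divide by n, written as a multiplication by the constant 1/n
    return acc * (1.0 / n)


if __name__ == "__main__":
    window = [complex(1, 1)] * 1000
    print(measure_power(window))
```

A model like this can be handy for checking the hardware output against known inputs, keeping in mind that the fixed-point result will differ slightly because of quantization.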

## Sample time

Our model highlights an important aspect of System Generator (more specifically, of MATLAB Simulink): the sample time. For our convenience, let’s define the sample time of a block as the period at which the block runs, that is, the inverse of its operating frequency. Please refer to [2] for more information on the sample time. In our example, the logic following the down-sampling by 2 (in green) runs at half the rate of the logic before the down-sampling (in red).

We previously stated that the timing constraints come from the fact that the propagation delay must be smaller than the clock period. This is true unless you tell the synthesis tool otherwise. When logic runs at a longer sample time, the single-cycle constraints are replaced by multi-cycle path constraints. In our example, a multi-cycle path constraint of two cycles will be generated for the green logic. This is important to note: logic with a longer sample time has more time to complete and can therefore meet the timing requirements more easily.

## Identify the failing paths

The first step to resolve timing issues is to identify the failing paths. Figure 2 presents a piece of the timing summary for the first failing path (i.e. the path with the longest delay).

**Figure 2: Timing constraint first failing path**

Some observations can be made from this report:

- The failing requirement is 10 ns (2*5 ns), so the multi-cycle path constraint of two is correctly applied.
- The propagation delay (data path delay) is 16.841 ns and is composed of 22 logic elements.
- The failing path is identified from its source to its destination. It is from the FIR output to the register labelled *Delay2*.
- The clock path skew and the clock uncertainty are negligible.
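The arithmetic behind these observations can be sketched in a few lines of Python. This is a simplification that ignores clock skew and uncertainty (reported as negligible here) and is not how the timing tool computes slack. The function names are hypothetical, and the segment estimate is only a lower bound that assumes the delay could be split perfectly evenly.

```python
import math


def timing_slack_ns(clock_period_ns, multicycle_factor, data_path_delay_ns):
    """Return (requirement, slack) in ns; a negative slack means the path fails."""
    requirement_ns = clock_period_ns * multicycle_factor
    return requirement_ns, requirement_ns - data_path_delay_ns


def min_path_segments(data_path_delay_ns, requirement_ns):
    """Lower bound on register-to-register segments, assuming an even split."""
    return math.ceil(data_path_delay_ns / requirement_ns)


if __name__ == "__main__":
    # Values taken from the report: 5 ns clock, multi-cycle factor of 2,
    # 16.841 ns data path delay.
    requirement, slack = timing_slack_ns(5.0, 2, 16.841)
    print(f"requirement = {requirement:.3f} ns, slack = {slack:.3f} ns")
    print("segments (ideal split):", min_path_segments(16.841, requirement))
```

With the numbers from the report, the path misses its 10 ns requirement by about 6.8 ns, so it has to be cut into at least two segments. In practice the delay rarely divides evenly, so more registers may be needed.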

## Add registers to cut the critical paths

Most timing issues can be resolved with a simple trick: adding registers (delay blocks) to cut the critical paths (longest paths). In the previous blog post [3], we described critical paths in terms of their basic elements. Each logic block brings its own delay and there is an additional delay to reach the next logic block. We have 22 successive logic elements in our example! By adding registers we can significantly reduce this number. It is also good design practice to register all the input and output signals of each module. This prevents potentially long processing chains from forming when different modules are connected together.

Figure 3 shows the new design with the delay insertions. Note that additional delays must also be added on the non-critical paths so that all signals stay aligned. You can observe the matching delays on the valid signal.

**Figure 3: Model with inserted delays**
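To see why the valid signal needs matching delays, here is a small cycle-by-cycle sketch in Python. It is only an illustration: the `DelayLine` class, the `run` helper, and the sample values are hypothetical and do not come from the model. Each delay line behaves like a chain of registers initialized to zero.

```python
from collections import deque


class DelayLine:
    """Chain of n registers: the output is the input from n cycles earlier."""

    def __init__(self, n, initial=0):
        self._regs = deque([initial] * n, maxlen=n)

    def step(self, value):
        if not self._regs:        # n == 0: plain wire, no delay
            return value
        out = self._regs[0]       # oldest stored value leaves the chain
        self._regs.append(value)  # maxlen drops the oldest entry automatically
        return out


def run(data, valid, data_delay, valid_delay):
    """Return the data values seen downstream while valid is asserted."""
    d_line, v_line = DelayLine(data_delay), DelayLine(valid_delay)
    captured = []
    for d, v in zip(data, valid):
        d_out, v_out = d_line.step(d), v_line.step(v)
        if v_out:
            captured.append(d_out)
    return captured


data = [10, 11, 12, 13, 0, 0]
valid = [1, 1, 1, 1, 0, 0]

matched = run(data, valid, data_delay=2, valid_delay=2)
unmatched = run(data, valid, data_delay=2, valid_delay=0)
print(matched)    # [10, 11, 12, 13]
print(unmatched)  # [0, 0, 10, 11]
```

When only the data path is registered, the valid signal arrives two cycles early and the downstream logic captures stale register contents instead of the first samples.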

After rebuilding our model with the new design, we now get the following timing errors:

**Figure 4: Timing errors**

Most of the previously failing paths have been resolved, but some remain. The longest of them goes from *Delay7* to *Delay10* through the multiplications.

## Set Xilinx block optimization options

Adding registers to cut the critical paths works the same way inside most of the Xilinx blocks. While most of them have at least a delay parameter, some have more sophisticated options for timing. For example, you can choose either *speed* or *area* for the multiplication optimization parameter. The resolution (bit width) of the operations also plays a significant role, as it directly affects the amount of logic and often the number of consecutive logic blocks required.

In our example, we added a delay of two within the multiplications and a delay of one for the constant multiplication. Now, we no longer get timing issues.

**Figure 5: Model with delays within the multiplications**

## Resolving timing issues summary

You can resolve most System Generator timing issues by following two simple rules:

- Add registers at the inputs and outputs of all modules and where the data path performs too much consecutive logic.
- Set the Xilinx block timing options.

Unfortunately, sometimes these tricks are not enough. In our next blog post, we’ll look at some more advanced tricks.