﻿ Digital Circuits and Systems - Circuits i Sistemes Digitals (CSD) - EETAC - UPC
 Bachelor's Degree in Telecommunications Systems and in Network Engineering
 Laboratory Lab 4: propagation delay and speed measurements [P4] Analysis of circuit's propagation delay using timing analyser and gate-level simulations [30/3]
 Individual post-lab assignment PLA4, to be discussed in the next session (Lab5). Study and execute this lab tutorial before attempting to solve the post-lab assignment.

1.5.5.1. Gate-level simulation: propagation delay measurement

How to measure a circuit's propagation time for a given target chip: post-synthesis model in VHDL (VHO file and its associated SDO/SDF delay file) and gate-level VHDL simulation.

Measuring the worst-case scenario: longest propagation delay (tP).

1.5.5.3. Calculating circuit's maximum speed (fMAX).

In the lab we have several commercial CPLD and FPGA target chips on which to synthesise our circuits. For instance:

 Vendor    CPLD                     FPGA
 Xilinx    XC2C256-TQ144 - 7        Spartan-3E XC3S500E-FG320
 Intel     MAX II EPM2210F324C3     Cyclone IV EP4CE115F29C7
 Lattice   ispMach4128V TQFP100     MachXO

NOTE: Quartus Prime does not generate delay files (SDO) for Intel MAX 10 devices.

1. Specifications

Let us continue with the Adder_1bit based on MoM from Lab3, measuring the propagation time of signals at a given transition and the maximum computing speed. Two new tools will be presented:

• (1) ModelSim gate-level simulation.

• (2) Quartus Prime timing analyser.

The objective is to measure which target chip is faster when implementing the same Adder_1bit project:

• (A) MAXII device EPM2210F324C3

• (B) Cyclone IV EP4CE115F29C7

Firstly, we will solve the project for chip (A) and note down the results. Secondly, keeping the same project, we will change the target chip to (B) and note down the results again. Finally, we will discuss the solutions.

 2. Planning 3. Development 4. Test: functional simulation

Start a new project copying source files from to this new location:

 Fig. 1. The plan proposed for this project is the same from .

Select the (A) MAX II target device and synthesise the circuit.

 Fig. 2. Plan C2 source files.

Check the RTL and technology views as you did in Lab3 Fig. 6 and Fig. 7.

Check that you get the same functional test results as in Lab3 and Fig. 10 when using the same VHDL testbench fixture Adder_1bit_tb.vhd.

5. Test: gate-level simulation - timing analyser

Set the following project parameters before re-synthesising your project, so that Quartus Prime can generate the necessary VHO and SDO files for the target chip:

 Fig. 4. Parameters for letting Quartus Prime generate delay files.

Now re-synthesise your project so that Quartus Prime generates the VHDL translation of the technology circuit (Adder_1bit.vho) and its delay file (Adder_1bit_vhd.sdo).

 Fig. 5. Indications for generating the VHDL technology circuit translation (vho) along with its delay file (sdo).

Now you are ready to start a ModelSim gate-level simulation of this flat circuit. This is our tutorial on simulation and the timing analyser tool, in case you need it.

 Fig. 6. Create a new ModelSim project.

Add the same testbench, which you have also copied from Lab3 to the new location:

 Fig. 7. Add the same testbench and the flat structure.

Compile all and check the project's integrity. Note that the hierarchical structure formed by multiple VHDL files is now replaced by a single flat VHO file, which is tested using the same testbench.

 Fig. 8. Check the project integrity.

Start a new simulation paying attention this time to both "Design" and "SDF" tabs:

 Fig. 9.

Attach the standard delay file to the region of interest:

 Fig. 10. The region where to apply the SDO file is the instance i1 (the unit-under-test).

Run and check that the full wave is the same as it was in functional simulations.

 Fig. 11. Full view of the simulation results.

Zoom in on a given signal transition to measure propagation delays using two cursors.

 Fig. 12. In this example transition, both outputs change from "10" to "01" after 6.17 ns.

Go back to Quartus Prime to find the largest propagation delay using the timing analyser spreadsheet tool.

 Fig. 13. Starting the timing analyser tool.

View results:

 Fig. 14. View results using the timing analyser tool.

 Fig. 15. Spreadsheet from datasheet report.

From the longest delay, calculate the maximum frequency of operation for this specific MAX II target chip:

Now you can redo the project (at the same location), changing the target chip, and compare how this Adder_1bit performs on a Cyclone IV target chip.

 Fig. 16. Select a Cyclone IV device (Field Programmable Gate Array).

Check that the files of interest are correctly generated for this new target chip. Indeed here you can find several simulation models (fast, slow, etc.):

 Fig. 17. For this chip several delay models are generated.

Recompile your ModelSim project and write down results:

 Fig. 18. Cyclone IV measurements are slightly different at the same transition.

And, using the timing analyser, we can observe a longer propagation delay than in MAX II.

 Fig. 19. Timing analyser results for the Cyclone IV.

Thus, the maximum frequency of operation for this Cyclone IV is lower than that of the MAX II.

 Laboratory Lab 4: circuit optimisation [P4] Adder_16bit RC: comparing ripple-carry and carry-lookahead architectures [30/3]

In the project above, you have seen how the same design performs on two target chips.

Now we will solve two Adder_16bit architectures for the same chip, and so learn some basic concepts of circuit optimisation. You will observe the difference between ripple-carry and carry-lookahead adders: which one is faster?

1. Specifications

Design an Adder_16bit using ripple-carry technique.

 Fig. 1. Symbol and example waveforms.

Here we rely on the work done when designing the tutorial Adder_4bit ripple-carry. The zero (Z) flag will not be implemented: it simply adds another level of gates and is not necessary for the purpose of comparing with the Adder_16bit CLA presented below.

2. Planning

Fig. 2 shows the architecture. Remember that we have already designed the Adder_8bit component in Lab3 using the same carry-chain strategy.
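The carry-chain strategy can be sketched as a small behavioural model in Python. This is only an illustration of the idea, not the VHDL used in the project: the function names `full_adder` and `ripple_carry_add` are hypothetical, and the model ignores all timing. It shows why the last carry has to wait for every previous stage.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out) for single-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, cin=0, width=16):
    """Add two unsigned integers bit by bit, passing the carry along the chain.

    Each stage needs the carry produced by the previous one, so in hardware
    the delay accumulates stage after stage.
    """
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# 65535 + 1 forces the carry to ripple through all 16 stages.
print(ripple_carry_add(65535, 1))
```

A transition in which the carry has to cross the whole chain is the kind of case that sets the worst-case delay measured later in this project.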

Project location:

3. Development

Let us choose a MAX II EPM2210F324C3 target chip.

VHDL file translation of the architecture in Fig. 2: Adder_16bit.vhd.

Component files Adder_8bit.vhd, Adder_4bit.vhd and Adder_1bit.vhd (plan A or plan B) can be found in Lab3.

 Fig. 3. RTL view and all the files required in this design.

The technology view and the project summary show that only 32 logic elements are used for synthesising this project, saving considerable hardware with respect to the CLA implementation proposed in the next project.

 Fig. 4. Technology view. Each Adder_1bit is implemented in two logic elements.

4. Test: functional simulation

This is the translation of the testbench fixture and some signal activity proposed in Fig. 1: Adder_16bit_tb.vhd. It can be used for both functional and gate-level simulations.

 Fig. 5. VHDL testbench fixture. The UUT is described as a hierarchical VHDL project when performing a functional simulation, and as a flat technology circuit when solving the gate-level simulation.

Functional results must be identical for both projects of the same entity, ripple-carry and carry-lookahead Adder_16bit.

 Fig. 6. Functional simulation results.

5. Gate-level simulation

Running a gate-level simulation of this ripple-carry adder, for instance at the transition highlighted in Fig. 1 where all bits have to change, we obtain an accumulated propagation delay of 22.6 ns, practically double the one produced by the CLA adder.

 Fig. 7. Gate-level simulation results at a particular transition.

When running the timing analyser, the maximum delay occurs between input A(0) and output Cout: tP = 22.7 ns, which allows a maximum frequency of operation of 22 Mops (millions of unsigned radix-2 16-bit operations per second).
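The arithmetic behind this figure can be checked with a short sketch. The relation fMAX = 1 / (2 · tP) used here is an assumption inferred from the numbers reported in this tutorial (22 Mops for tP = 22.7 ns here, and 35 Mops for tP = 14.3 ns in the CLA project); the helper name `fmax_mhz` is hypothetical.

```python
def fmax_mhz(tp_ns):
    """Maximum operating frequency in MHz for a propagation delay in ns.

    Assumption: fMAX = 1 / (2 * tP), inferred from the values reported in
    this tutorial rather than taken from a stated formula.
    """
    return 1e3 / (2.0 * tp_ns)


tp_ripple_ns = 22.7  # longest delay from the timing analyser, A(0) to Cout
print(f"fMAX = {fmax_mhz(tp_ripple_ns):.1f} MHz")
```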

Furthermore, at this point the answer to the question of the minimum value of Min_Pulse for experimentation in the laboratory becomes straightforward: Min_Pulse > tP. A Min_Pulse smaller than the propagation delay of the circuit implies that the circuit will be switching continuously, never able to reach stable output values. We can verify this using the set of vectors in Adder_16bit_tb.vhd, as represented in Fig. 9 (Min_Pulse = 25 ns) and Fig. 10 (Min_Pulse = 20 ns).

 Fig. 9. Min_Pulse is 25 ns, practically at the limit.

 Fig. 10. Min_Pulse is 20 ns < tP. Thus, the outputs never settle to a valid result.
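The condition Min_Pulse > tP can be checked numerically for the two cases in Fig. 9 and Fig. 10. This is a minimal sketch using only the values quoted in the text; the function name `outputs_settle` is hypothetical.

```python
TP_NS = 22.7  # worst-case propagation delay reported by the timing analyser


def outputs_settle(min_pulse_ns, tp_ns=TP_NS):
    """True when each input vector is held longer than the propagation delay."""
    return min_pulse_ns > tp_ns


for min_pulse_ns in (25.0, 20.0):  # the values used in Fig. 9 and Fig. 10
    margin_ns = min_pulse_ns - TP_NS
    verdict = "outputs settle" if outputs_settle(min_pulse_ns) else "outputs never settle"
    print(f"Min_Pulse = {min_pulse_ns:.0f} ns, margin = {margin_ns:+.1f} ns: {verdict}")
```

A small positive margin, as with 25 ns, means the outputs are valid only briefly before the next vector arrives.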

It is time to compare results with the new carry-lookahead design proposed in the next project.

6. Reporting

 Laboratory Lab 4: circuit optimisation [P4] Adder_16bit CLA: comparing ripple-carry and carry-lookahead architectures [30/3]

In the first project of this lab, you have seen how the same design performs on two target chips.

Now we will solve two Adder_16bit architectures for the same chip, and so learn some basic concepts of circuit optimisation. You will observe the difference between ripple-carry and carry-lookahead adders: which one is faster?

1. Specifications

Design an Adder_16bit using the carry-lookahead (CLA) technique. The module propagate and generate signals are outputs for chaining larger adders using the same CLA architecture. The truth table and timing diagram are the same as those stated in Adder_4bit CLA, considering a data range up to 65535 = "1111111111111111".

 Fig. 1. Symbol and waveforms.

Here we rely on the work done when designing the tutorial Adder_4bit CLA. For instance, read in Wikipedia how an Adder_16bit works and how it is possible to calculate all carries beforehand. This reference also explains how to chain carry generators: Ercegovac, M., Lang, T., Moreno, J. H., "Introduction to Digital Systems", John Wiley & Sons, 1999. It includes slides: Chapter 10 is on arithmetic circuits.
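How all carries can be calculated beforehand can be sketched with a behavioural model in Python. This is a textbook two-level carry-lookahead scheme, not necessarily identical to the VHDL of this project: the names `carry_generator` and `cla_add_16` are hypothetical, and propagate is assumed to be the XOR of the operand bits.

```python
def carry_generator(p, g, c0):
    """4-bit lookahead unit; p and g are lists of 4 bits, index 0 = LSB.

    Returns (carries, P, G): carries[i] is the carry into position i,
    P and G are the group propagate and generate signals.
    """
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return [c0, c1, c2, c3], P, G


def cla_add_16(a, b, cin=0):
    """16-bit adder built from four 4-bit blocks plus a top-level carry generator."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(16)]  # bit propagate
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(16)]  # bit generate
    # Group P and G do not depend on the carry-in, so c0 = 0 is a dummy here.
    groups = [carry_generator(p[k:k + 4], g[k:k + 4], 0) for k in range(0, 16, 4)]
    group_p = [grp[1] for grp in groups]
    group_g = [grp[2] for grp in groups]
    # The top-level unit produces the carry into every block directly.
    block_cin, p16, g16 = carry_generator(group_p, group_g, cin)
    total = 0
    for blk in range(4):
        k = 4 * blk
        # The model evaluates the block unit a second time; hardware has one unit.
        carries, _, _ = carry_generator(p[k:k + 4], g[k:k + 4], block_cin[blk])
        for i in range(4):
            total |= (p[k + i] ^ carries[i]) << (k + i)
    cout = g16 | (p16 & cin)
    return total, cout


print(cla_add_16(65535, 1))
```

No carry waits for a chain of earlier stages: every carry is a two-level AND-OR function of the propagate and generate signals, which is where the speed advantage comes from.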

2. Planning

Fig. 2 shows the picture from Wikipedia, which is adapted in CSD as Fig. 3.

 Fig. 2. Adder_16bit architecture proposed in Wikipedia, to be written in CSD as a VHDL file.

The 16-bit lookahead carry unit is the same circuit designed in Adder_4bit, as shown in Fig. 3.

 Fig. 3. Planning the Adder_16bit. How many gate-levels are expected in this circuit?

Project location:

3. Development

We will choose the same Intel MAX II EPM2210F324C3 target chip. This is the same chip implementing the same entity, but with two alternative architectures.

VHDL file translation of the architecture in Fig. 3: Adder_16bit.vhd.

Start a VHDL synthesis project and observe the schematics. This time the architecture is far more complex and takes up to 71 logic elements: more chip logic resources are spent with the aim of obtaining a faster circuit.

 Fig. 4. RTL view.

The architecture uses up to five Carry_generators organised in two levels (one Carry_generator in each Adder_4bit module, and another one in the top entity). However, the largest number of logic gate levels is only six.

 Fig. 5. Technology view.

The chip planner tool allows us to map exactly where all the resources in use are located.

 Fig. 6. MAX II chip occupation from chip planner tool.

4. Test: functional simulation

Now it is time to test the synthesised circuit above. We can use the same VHDL fixture represented in Fig. 5 of the previous project Adder_16bit RC.

Example of testbench Adder_16bit_tb.vhd from which to copy the stimulus signals. We are expecting the same ideal functional results represented in Fig. 7 when applying the same input vectors.

 Fig. 7. Functional results.

5. Gate-level simulation

Now, when running a gate-level simulation at the same transition, a shorter propagation delay is obtained: only 12.6 ns (saving 10 ns with respect to the ripple-carry design above).

 Fig. 8. Gate-level simulation.

Finally, measuring the longest propagation delay using the timing analyser tool, we can deduce the circuit's maximum speed. The longest signal path is established between input A(6) and output S(11), tP = 14.3 ns. This circuit is capable of performing 35 Mops. Thus, an improved architecture using more hardware (logic elements) generates a faster circuit. The drawback is that the dynamic power consumption will be higher.
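The trade-off between the two architectures can be put into numbers with a short sketch. It uses only values quoted in the two Adder_16bit projects (worst-case tP and logic element counts); the dictionary layout and variable names are hypothetical.

```python
# Values reported for the MAX II EPM2210F324C3 in the two Adder_16bit projects.
ripple_carry = {"tp_ns": 22.7, "logic_elements": 32}
carry_lookahead = {"tp_ns": 14.3, "logic_elements": 71}

# How much shorter the worst-case delay becomes with the CLA architecture.
speedup = ripple_carry["tp_ns"] / carry_lookahead["tp_ns"]

# How much more hardware the CLA architecture needs.
area_ratio = carry_lookahead["logic_elements"] / ripple_carry["logic_elements"]

print(f"Speed-up in worst-case delay: {speedup:.2f}x")
print(f"Logic element ratio:          {area_ratio:.2f}x")
```

With these figures, the CLA design is about 1.6 times faster in the worst case while using about 2.2 times as many logic elements.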

6. Reporting