0
1
00:00:01,210 --> 00:00:06,710
What hardware implementation corresponds to the variables in HLS code? In this lecture,
1

2
00:00:06,970 --> 00:00:11,950
I am going to explain which hardware component corresponds to a variable, considering its data type.
2

3
00:00:13,540 --> 00:00:19,570
An HLS tool reads the code, optimises it, and then binds a hardware component to each element
3

4
00:00:19,570 --> 00:00:20,830
in the optimised code.
4

5
00:00:22,380 --> 00:00:27,930
During the optimisation process, some variables in the original code may be eliminated or merged with
5

6
00:00:27,930 --> 00:00:28,830
other variables.
6

7
00:00:29,250 --> 00:00:33,870
Also, the optimisation process can create new intermediate variables.
7

8
00:00:34,650 --> 00:00:39,540
The variables that survive optimisation must be mapped to real hardware elements.
8

9
00:00:39,930 --> 00:00:44,610
A synthesis tool usually implements a variable as a set of wires or as memory.
9

10
00:00:45,000 --> 00:00:50,220
Memories can be flip-flops, BRAMs, DDR memory, or other types of memory.
10

11
00:00:51,120 --> 00:00:54,990
This implementation depends on the role of the variables in the code. 
11

12
00:00:55,470 --> 00:01:00,440
In this course, we are focusing on combinational circuits, which don’t rely on memories.
12

13
00:01:00,480 --> 00:01:07,320
Therefore, all variables in the code should be mapped to a set of wires if they survive the optimisation
13

14
00:01:07,320 --> 00:01:07,920
process.
14

15
00:01:08,770 --> 00:01:14,250
Throughout this course, I will explain how to write code that is synthesised into a combinational circuit.
15

16
00:01:16,200 --> 00:01:22,770
As an example, let’s consider this Boolean expression that combines four variables with OR operators. 
16

17
00:01:23,520 --> 00:01:27,600
A C/C++ function in HLS can implement this expression.
17

18
00:01:27,660 --> 00:01:30,710
The function has four input arguments and one output argument.
18

19
00:01:30,870 --> 00:01:34,170
Then we can describe the expression and return the result.
19

20
00:01:34,920 --> 00:01:37,740
Don’t forget to define hardware port interfaces.
20
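
A minimal sketch of such a function, assuming the expression is y = a OR b OR c OR d. The function name or4, the argument names, and the choice to deliver the output as the return value are assumptions for illustration. The pragmas are Vivado-HLS interface directives that ask for plain wire ports and no block-level handshake; an ordinary C++ compiler ignores them.

```cpp
// Hypothetical HLS top-level function: four 1-bit inputs, one 1-bit output.
bool or4(bool a, bool b, bool c, bool d) {
#pragma HLS INTERFACE ap_none port=a
#pragma HLS INTERFACE ap_none port=b
#pragma HLS INTERFACE ap_none port=c
#pragma HLS INTERFACE ap_none port=d
#pragma HLS INTERFACE ap_ctrl_none port=return
    // Written left to right, this chains three OR operations.
    return a | b | c | d;
}
```

Because the function contains no loops, arrays, or static variables, every variable here can be realised as a wire.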

21
00:01:38,490 --> 00:01:41,820
Three OR gates can implement this expression. As can be seen, 
21

22
00:01:41,970 --> 00:01:46,320
all variables in the code have been implemented by wires in the hardware. 
22

23
00:01:48,520 --> 00:01:55,450
HLS tools optimise the code before implementing it. This implementation of our Boolean expression
23

24
00:01:55,450 --> 00:01:57,070
has three levels of gates. 
24

25
00:01:57,310 --> 00:02:02,010
If we assume a delay of delta for each gate, then its propagation delay is 3 × delta.
25

26
00:02:02,860 --> 00:02:06,070
This Boolean expression can be optimised for performance,
26

27
00:02:06,220 --> 00:02:08,920
thanks to the associativity of the OR operator. 
27

28
00:02:09,700 --> 00:02:16,110
The optimised implementation requires two levels of gates, so its propagation delay is 2 × delta.
28
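
The two structures can be sketched in plain C++ as follows. The function and variable names are hypothetical; the intermediate variables t1 and t2 stand for the wires between gate levels. This only illustrates the rebalancing that associativity allows, not the tool's internal algorithm.

```cpp
// Chained form: three gate levels, ((a | b) | c) | d.
bool or4_chain(bool a, bool b, bool c, bool d) {
    bool t1 = a | b;   // level 1
    bool t2 = t1 | c;  // level 2
    return t2 | d;     // level 3
}

// Balanced form: two gate levels, (a | b) | (c | d).
bool or4_balanced(bool a, bool b, bool c, bool d) {
    bool t1 = a | b;   // level 1
    bool t2 = c | d;   // level 1, independent of t1
    return t1 | t2;    // level 2
}

// Exhaustively compare the two forms over all 16 input combinations.
bool forms_agree() {
    for (int i = 0; i < 16; ++i) {
        bool a = i & 1, b = i & 2, c = i & 4, d = i & 8;
        if (or4_chain(a, b, c, d) != or4_balanced(a, b, c, d)) return false;
    }
    return true;
}
```

Both forms use three OR gates; only the depth of the longest path from an input to the output changes.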

29
00:02:16,960 --> 00:02:21,240
Let’s have a look at the output of Vivado-HLS after synthesising our Boolean expression.
29

30
00:02:21,760 --> 00:02:24,970
This picture shows the Analysis perspective in Vivado-HLS
30

31
00:02:25,060 --> 00:02:29,920
after synthesising our Boolean expression. It consists of 8 operations.
31

32
00:02:30,100 --> 00:02:36,310
The first four operations read the inputs. Then two OR gates perform OR operations
32

33
00:02:36,310 --> 00:02:41,920
in parallel. The third OR gate combines the outputs of the first-layer gates.
33

34
00:02:42,610 --> 00:02:45,340
Finally, the last line returns the result. 
34

35
00:02:47,570 --> 00:02:54,140
Variables of floating-point data types can also be synthesised into wires or memories depending
35

36
00:02:54,140 --> 00:02:57,380
on the operators involved and the coding style of the design.
36

37
00:02:58,380 --> 00:03:05,070
Memory cells can be in the form of FFs, BRAMs, DDRs or other types of memories.
37

38
00:03:05,760 --> 00:03:12,150
However, in combinational circuits, which are the subject of this course, these variables are synthesised
38

39
00:03:12,150 --> 00:03:13,170
into wires.
39

40
00:03:14,530 --> 00:03:20,200
This arithmetic expression, in which the variables are of the float data type, can be synthesised by Vivado-HLS
40

41
00:03:20,200 --> 00:03:24,910
into a combinational circuit. Similar to the Boolean expression example, 
41

42
00:03:25,060 --> 00:03:31,410
this C function describes the mathematical expression, and the synthesised hardware uses three combinational
42

43
00:03:31,420 --> 00:03:33,100
adders to implement this code.
43
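
A minimal sketch of such a function, assuming the expression is the sum of four float inputs, which is consistent with the three adders and with the IP name addition4 used later in the lecture. The argument names and the use of the return value as the output are assumptions; the pragmas are Vivado-HLS interface directives and are ignored by an ordinary C++ compiler.

```cpp
// Hypothetical sketch of the addition4 top-level function.
float addition4(float a, float b, float c, float d) {
#pragma HLS INTERFACE ap_none port=a
#pragma HLS INTERFACE ap_none port=b
#pragma HLS INTERFACE ap_none port=c
#pragma HLS INTERFACE ap_none port=d
#pragma HLS INTERFACE ap_ctrl_none port=return
    // Three floating-point additions, one per adder in the hardware.
    return a + b + c + d;
}
```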

44
00:03:34,030 --> 00:03:37,660
Let’s have a look at the HLS output for this arithmetic expression.
44

45
00:03:38,990 --> 00:03:43,630
I have created a Vivado-HLS project and written the HLS code in the design file. 
45

46
00:03:45,090 --> 00:03:51,270
To obtain combinational logic, the clock period constraint should be greater than the circuit's propagation
46

47
00:03:51,270 --> 00:03:59,130
delay. For this purpose, go to the Solution Settings and set the clock period constraint to 200 ns.
47

48
00:04:00,160 --> 00:04:02,260
Now we are ready to synthesise the code.
48

49
00:04:03,380 --> 00:04:08,440
After successfully synthesising the design, let’s have a look at the synthesis report.
49

50
00:04:10,220 --> 00:04:15,250
The propagation delay of the circuit is about 88.970 ns.
50

51
00:04:16,700 --> 00:04:23,240
The circuit is combinational, as it uses only LUT resources and no memory cells.
51

52
00:04:23,750 --> 00:04:28,910
Also, all ports are simple wires without any specific interfaces.
52

53
00:04:29,900 --> 00:04:31,880
Now we can generate the IP package.
53

54
00:04:33,060 --> 00:04:38,760
Unfortunately, because of a Vivado-HLS GUI limitation in showing the Analysis perspective
54

55
00:04:39,120 --> 00:04:46,080
when the design clock period constraint is higher than 99 ns, we cannot see the state diagram. But
55

56
00:04:46,440 --> 00:04:50,820
we can have a look at the Vivado RTL schematic to see the design structure.
56

57
00:04:51,790 --> 00:04:58,000
To see the design RTL schematic, first, create a new Vivado project. Then create a block 
57

58
00:04:58,000 --> 00:05:04,720
design. To add our generated IP package, right-click anywhere inside the design area and select the 
58

59
00:05:04,720 --> 00:05:11,830
IP settings… option. Then go to Repository and add the Vivado-HLS project folder to the repository
59

60
00:05:11,830 --> 00:05:12,220
path.
60

61
00:05:13,450 --> 00:05:20,140
Then right-click again anywhere inside the design area and select the Add IP option (or click on the plus 
61

62
00:05:20,140 --> 00:05:27,520
button in the middle). Find the name of the generated IP, which is addition4, and add it to the
62

63
00:05:27,520 --> 00:05:27,940
design.
63

64
00:05:29,690 --> 00:05:33,800
Make the design ports external and then create the HDL wrapper.
64

65
00:05:35,040 --> 00:05:38,820
Now click on the Schematic option under RTL Analysis.
65

66
00:05:40,970 --> 00:05:44,360
The RTL schematic view will open after a few seconds.
66

67
00:05:46,630 --> 00:05:52,510
After expanding the design hierarchy, you will see three adders connected together, implementing
67

68
00:05:52,510 --> 00:05:53,710
our design.
68

69
00:06:02,340 --> 00:06:08,150
Using arbitrary bit-width signals, buses, and data is essential in a hardware design environment. 
69

70
00:06:09,160 --> 00:06:13,950
In the next lecture, I will explain how to use arbitrary precision data types in HLS.
70

71
00:06:16,430 --> 00:06:22,010
These are our takeaway messages. The HLS optimisation process may eliminate variables in the code
71

72
00:06:22,130 --> 00:06:29,330
or create new intermediate variables. The variables that remain after optimisation should be mapped to
72

73
00:06:29,330 --> 00:06:30,590
real hardware elements. 
73

74
00:06:30,830 --> 00:06:36,290
A synthesis tool usually implements a variable as a set of wires or as memory.
74

75
00:06:38,310 --> 00:06:44,250
Now the quiz question. Find the dataflow graph of the following function after synthesis. 
