1
00:00:00,210 --> 00:00:06,930
Hello, everyone, and welcome to this new and exciting session in which we are going to look at post

2
00:00:06,930 --> 00:00:10,140
training quantization with TensorFlow.

3
00:00:10,800 --> 00:00:17,790
In the previous session we looked at quantization-aware training, still with TensorFlow, and now we'll

4
00:00:17,790 --> 00:00:23,460
look at how to do quantization for a model which has already been trained.

5
00:00:23,820 --> 00:00:31,950
Now, as you can see here, we have this pre-trained model, which obtains an accuracy of 84% and a

6
00:00:31,950 --> 00:00:37,950
top-2 accuracy of 95.6%.

7
00:00:38,370 --> 00:00:41,940
And then in this section, we'll quantize this model.

8
00:00:41,950 --> 00:00:49,740
We'll check whether this quantized model occupies less space compared to the original model, and

9
00:00:49,740 --> 00:00:55,890
then also verify that not much model performance is lost.

10
00:00:56,130 --> 00:01:02,790
Before we start with the quantization process, we should note that we are going to be using this TensorFlow

11
00:01:02,790 --> 00:01:04,280
Lite library.

12
00:01:04,290 --> 00:01:13,230
Now, this TensorFlow Lite library is a mobile library for deploying models on mobile devices, microcontrollers

13
00:01:13,230 --> 00:01:15,270
and other edge devices.

14
00:01:15,270 --> 00:01:21,870
So this means that in this environment where the compute resources are limited, it's important for

15
00:01:21,870 --> 00:01:31,590
us to quantize the models, since we will then get smaller, lighter, and also faster models.

16
00:01:31,650 --> 00:01:34,350
Here we have a general overview of how this works.

17
00:01:34,350 --> 00:01:38,540
As you can see, you pick a model, for example an EfficientNet-based model.

18
00:01:38,550 --> 00:01:43,680
You convert this model to TensorFlow Lite using the TensorFlow Lite converter, which we are going to see

19
00:01:43,680 --> 00:01:44,360
shortly.

20
00:01:44,370 --> 00:01:51,180
We then deploy this by taking a compressed version, a compressed TF Lite file.

21
00:01:51,180 --> 00:01:57,680
Now, we have already been working with, for example, Keras files, which are HDF5 files.

22
00:01:57,690 --> 00:02:07,170
Now, this TF Lite file is a sort of compressed file, and this compressed file will be loaded in the environment

23
00:02:07,170 --> 00:02:08,520
in which you'll be working.

24
00:02:08,790 --> 00:02:18,750
And then from here we'll also quantize this from 32-bit floats to 8-bit integers, which can run on devices

25
00:02:18,750 --> 00:02:21,030
with low compute resources.

26
00:02:21,030 --> 00:02:27,120
So here in the documentation we have tf.lite, and you have the TFLiteConverter, basically

27
00:02:27,120 --> 00:02:28,620
the TF Lite converter.

28
00:02:28,620 --> 00:02:35,940
As the name suggests, it's going to convert your models into the TF Lite format.

29
00:02:35,940 --> 00:02:37,200
You could see these examples.

30
00:02:37,200 --> 00:02:47,090
You have this TFLiteConverter from a saved model, from a Keras model, from a function, from a model.

31
00:02:47,100 --> 00:02:51,090
So let's say, for example, we work with the Keras model.

32
00:02:51,090 --> 00:03:00,180
We just have the model passed here, and then you generate this TF Lite model from this model, making use

33
00:03:00,180 --> 00:03:01,830
of the TF Lite converter.

34
00:03:01,860 --> 00:03:06,810
Now, apart from these arguments, we also have the attributes.

35
00:03:06,810 --> 00:03:13,500
So you could specify the optimizations and the representative dataset, which is very important in the

36
00:03:13,500 --> 00:03:16,530
case where we're working with static quantization.

37
00:03:16,530 --> 00:03:18,690
Remember static quantization.

38
00:03:18,690 --> 00:03:28,220
We have to obtain the scale and zero-point values by making use of unlabeled data.

39
00:03:28,230 --> 00:03:33,390
So basically, as we had seen previously, all we need to do is to pass in the inputs, which in this

40
00:03:33,390 --> 00:03:35,490
case are images.

41
00:03:35,490 --> 00:03:42,390
And then these values will be inferred from the model's interaction with the inputs.
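
To make the idea of scale and zero point concrete, here is a minimal sketch in plain Python. It assumes a simple min/max calibration over a handful of observed float values and an unsigned 8-bit target range. The function names and the sample values are made up for illustration, and this is a simplified picture, not the exact procedure TensorFlow Lite runs internally.

```python
def affine_quant_params(values, qmin=0, qmax=255):
    """Derive (scale, zero_point) so the observed float range maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero stays exactly representable.
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    if hi == lo:
        # Degenerate case (all zeros): any positive scale works.
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to its integer code, clamped to the target range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return scale * (q - zero_point)


# Hypothetical values observed while feeding a few unlabeled inputs to the model.
observed = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = affine_quant_params(observed)
codes = [quantize(v, scale, zero_point) for v in observed]
```

The only thing calibration needs is inputs to push through the model, which is why no labels are required at this step.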

42
00:03:42,390 --> 00:03:49,140
Then we have the target specifications, inference input type, inference output type, whether

43
00:03:49,140 --> 00:03:55,470
to allow custom operations or not, and then whether to exclude the conversion metadata or not.

44
00:03:55,470 --> 00:03:56,280
So that's it.

45
00:03:56,280 --> 00:03:59,250
That's our converter right here.

46
00:03:59,280 --> 00:04:06,840
Now we paste in this code from the documentation, and then let's take this first. Here we now apply

47
00:04:06,840 --> 00:04:13,080
some of the attributes that we have: this converter.optimizations.

48
00:04:13,350 --> 00:04:21,930
Let's get back to the documentation, to the optimizations, and here we could set this optimization with

49
00:04:21,930 --> 00:04:23,790
this tf.lite.Optimize.

50
00:04:23,790 --> 00:04:25,380
So let's open this up.

51
00:04:25,950 --> 00:04:27,090
There we go.

52
00:04:27,360 --> 00:04:29,160
You see, this takes different values.

53
00:04:29,160 --> 00:04:30,480
We have default.

54
00:04:30,480 --> 00:04:36,120
Then OPTIMIZE_FOR_SIZE, which is deprecated and does the same as DEFAULT; OPTIMIZE_FOR_LATENCY

55
00:04:36,120 --> 00:04:38,220
does the same as DEFAULT. Then EXPERIMENTAL_SPARSITY.

56
00:04:38,220 --> 00:04:42,120
This one is experimental, hence subject to change.

57
00:04:42,120 --> 00:04:44,940
So what we're going to do here is simply take this default.

58
00:04:44,940 --> 00:04:59,610
So that said, we have tf.lite.Optimize.DEFAULT.

59
00:04:59,830 --> 00:05:03,490
Just as we had in the documentation right here.

60
00:05:04,690 --> 00:05:08,920
Now, we could also specify the inference input type and the inference output type.

61
00:05:08,920 --> 00:05:10,570
So let's get back here.

62
00:05:10,570 --> 00:05:11,800
We have.

63
00:05:12,910 --> 00:05:13,690
There we go.

64
00:05:13,690 --> 00:05:27,760
The converter's inference input type is going to be an unsigned 8-bit int (uint8), and then let's copy and paste this.

65
00:05:27,760 --> 00:05:30,850
We have the output type which is going to be the same.

66
00:05:30,970 --> 00:05:32,290
So that's it.

67
00:05:32,320 --> 00:05:36,460
So we've specified this, the inference input type and the inference output type.

68
00:05:36,460 --> 00:05:41,620
And now let's specify the representative dataset.

69
00:05:41,620 --> 00:05:51,610
So this representative dataset right here is in fact a generator which lets us output the

70
00:05:51,610 --> 00:05:58,090
input values because, recall, all we need in static quantization is just this input.

71
00:05:58,090 --> 00:06:08,590
So here we have the generator which yields the inputs, and then here we just say that the converter's representative

72
00:06:09,840 --> 00:06:12,790
dataset equals the representative data generator.

73
00:06:13,630 --> 00:06:20,080
Here we have our training dataset. Now, we could take all of our training dataset or just

74
00:06:20,080 --> 00:06:20,890
a few.

75
00:06:22,240 --> 00:06:28,000
We obviously don't need to take the whole dataset, so we could just take, say, 20 and use that to obtain

76
00:06:28,000 --> 00:06:31,030
the values for the scale and the zero point.

77
00:06:31,120 --> 00:06:37,240
Now, if you're new to this notion of scale and zero point, it's important to check out our previous sessions

78
00:06:37,240 --> 00:06:38,410
where we treat this.

79
00:06:38,410 --> 00:06:41,050
So let's run this here.

80
00:06:41,050 --> 00:06:48,070
We also run this, and now we're set to convert this model here, our pre-trained model.

81
00:06:48,070 --> 00:06:50,470
So let's run that and that should be fine.

82
00:06:50,470 --> 00:06:53,670
So we're now set to carry out the conversion.

83
00:06:53,680 --> 00:06:55,450
Now we're done with the conversion.
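
The conversion steps described above can be sketched roughly as follows, assuming a trained Keras model and a dataset that yields (image, label) pairs. The helper names `make_representative_gen` and `convert_to_uint8_tflite` are hypothetical, and the dataset layout used in the session may differ.

```python
def make_representative_gen(dataset, num_samples=20):
    """Build a generator function that yields only inputs, never labels."""
    def representative_data_gen():
        for i, (image, _label) in enumerate(dataset):
            if i >= num_samples:
                break
            # The converter expects a list with one entry per model input
            # (float32 data in the shape the model was trained on).
            yield [image]
    return representative_data_gen


def convert_to_uint8_tflite(keras_model, dataset, num_samples=20):
    """Return the quantized model as bytes (static post-training quantization)."""
    # Imported here so the generator helper above stays usable without TensorFlow.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_gen(dataset, num_samples)
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    # Not set in the session, but the TensorFlow docs also suggest restricting
    # ops for integer-only targets:
    # converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    return converter.convert()
```

The generator is wrapped in a function because the converter calls it itself during calibration. A call such as `convert_to_uint8_tflite(pretrained_model, train_dataset)` would then return the bytes of the TF Lite model, where both argument names are placeholders.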

84
00:06:55,450 --> 00:06:58,960
We are now going to save this in the TF Lite format.

85
00:06:58,960 --> 00:07:08,530
So we have this path, and at this path we are going to have this file, so let's run the cell, and that's

86
00:07:08,530 --> 00:07:08,860
fine.

87
00:07:08,860 --> 00:07:17,470
So we get that. We have the file size here, and when we check this, you see we have 21 megabytes.

88
00:07:17,470 --> 00:07:22,750
So we're going from this model which we could check out here.

89
00:07:22,960 --> 00:07:32,410
We're going from this model, which is 90.7 megabytes to a 21.12 megabyte model.
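
As a rough sketch of the save-and-compare step, the converted model is just a bytes object that gets written to disk, after which the file size can be read back. The helper name and the placeholder bytes below are made up for illustration. The two sizes are the ones quoted in the session.

```python
import os
import tempfile


def save_model_bytes(model_bytes, path):
    """Write the converted model to disk and return the file size in bytes."""
    with open(path, "wb") as f:
        f.write(model_bytes)
    return os.path.getsize(path)


# Placeholder bytes standing in for the output of converter.convert().
placeholder_model = b"\x00" * 2048

with tempfile.TemporaryDirectory() as tmp_dir:
    tflite_path = os.path.join(tmp_dir, "model.tflite")  # hypothetical file name
    size_bytes = save_model_bytes(placeholder_model, tflite_path)

# Sizes reported in the session, in megabytes.
original_mb = 90.7
quantized_mb = 21.12
compression_ratio = original_mb / quantized_mb   # roughly 4.3x smaller
saved_fraction = 1.0 - quantized_mb / original_mb  # roughly three quarters saved
```

Going from 32-bit floats to 8-bit integers suggests a reduction of about four times for the weights, which is in line with the sizes seen here.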

90
00:07:33,040 --> 00:07:38,350
Now, before moving on, we should note that if we want to implement dynamic quantization here, then

91
00:07:38,350 --> 00:07:42,100
we wouldn't specify this representative data generator.

92
00:07:42,550 --> 00:07:43,540
So that's it.

93
00:07:43,960 --> 00:07:45,490
Let's get back.

94
00:07:46,780 --> 00:07:53,620
We install this TensorFlow Lite runtime.

95
00:07:53,710 --> 00:08:01,990
Now, talking about installing the TensorFlow Lite runtime, once we already have this TensorFlow Lite

96
00:08:01,990 --> 00:08:08,890
file right here, if we want to run this on some other system, say, for example, we want to run this on

97
00:08:08,890 --> 00:08:14,940
our Raspberry Pi, all we'll need to do will be to install this runtime, and that'll be it.

98
00:08:14,950 --> 00:08:18,160
We wouldn't need to install TensorFlow any longer.

99
00:08:18,160 --> 00:08:19,450
So we just have this.

100
00:08:19,450 --> 00:08:29,230
We run this, that gets installed, we import the TF Lite runtime, then we prepare our test image.

101
00:08:29,230 --> 00:08:30,640
So we just run this.

102
00:08:30,640 --> 00:08:31,750
We've seen this already.

103
00:08:31,750 --> 00:08:33,490
We have our Pre-trained model.

104
00:08:33,490 --> 00:08:38,070
We're going to get the max and then the corresponding class.

105
00:08:38,080 --> 00:08:39,160
So that's it.

106
00:08:39,610 --> 00:08:40,620
We should get the class angry.

107
00:08:40,630 --> 00:08:43,300
See, it matches with what we expect.

108
00:08:44,350 --> 00:08:46,510
We could try out this other example here.

109
00:08:46,540 --> 00:08:47,710
Let's run this.

110
00:08:47,860 --> 00:08:49,260
And there we go.

111
00:08:49,270 --> 00:08:51,430
So this is our model.

112
00:08:51,430 --> 00:08:58,480
Now we're going to use this runtime to run our TensorFlow Lite model.

113
00:08:59,050 --> 00:09:06,460
Now we've restarted the session, and you'll see that we're without TensorFlow. Let's do this.

114
00:09:07,180 --> 00:09:09,910
Let's say tf.zeros.

115
00:09:09,910 --> 00:09:14,230
And one by two, for example. We run that, and you see that tf is not defined.

116
00:09:14,230 --> 00:09:16,810
So we have no imports for now.

117
00:09:16,930 --> 00:09:21,640
Now let's take this off and then we get back up here.

118
00:09:21,640 --> 00:09:26,290
We install our runtime and import the runtime.

119
00:09:26,740 --> 00:09:28,810
Oh, I think we'll be needing NumPy.

120
00:09:29,500 --> 00:09:33,370
So what we'll do is we're going to take this NumPy.

121
00:09:33,760 --> 00:09:38,260
So we're trying to work without necessarily needing TensorFlow.

122
00:09:38,260 --> 00:09:41,230
So we have NumPy imported.

123
00:09:41,680 --> 00:09:42,700
That's it.

124
00:09:43,600 --> 00:09:52,270
Here we have this test image read with OpenCV, so we will import cv2.

125
00:09:52,270 --> 00:09:53,200
That's fine.

126
00:09:54,610 --> 00:09:59,560
Well, here we make use of TensorFlow, but we will see how to get

127
00:09:59,650 --> 00:10:01,620
rid of this dependence on TensorFlow.

128
00:10:01,630 --> 00:10:07,300
So first of all, we have to note that TensorFlow was used here to convert this test image here.

129
00:10:07,300 --> 00:10:16,780
Let's print out our test image. We used it to convert this test image, which is an unsigned int with 8 bits.

130
00:10:16,960 --> 00:10:19,540
So here, let's do this.

131
00:10:19,990 --> 00:10:24,820
You see it's an unsigned int, and we wanted to convert this into a float.

132
00:10:26,140 --> 00:10:30,040
And that's why we made this change here.

133
00:10:30,040 --> 00:10:33,250
So you wouldn't need this any longer.

134
00:10:33,250 --> 00:10:36,570
And here we could use NumPy.

135
00:10:36,580 --> 00:10:39,640
So we have that and everything looks fine.

136
00:10:39,730 --> 00:10:44,620
Okay, so we have this test image and that's fine.

137
00:10:44,620 --> 00:10:47,770
So now let's run this and we have our image.

138
00:10:47,890 --> 00:10:49,810
Now, image is not defined.

139
00:10:49,810 --> 00:10:51,310
Let's get back here.

140
00:10:51,730 --> 00:10:53,740
No, this is the test image.

141
00:10:53,740 --> 00:10:55,720
Test image.

142
00:10:55,720 --> 00:10:56,710
Run that again.

143
00:10:56,710 --> 00:10:57,730
This should be fine.

144
00:10:58,870 --> 00:11:01,000
Okay, so we now have our image.

145
00:11:01,000 --> 00:11:08,470
And then here we see we have this interpreter, which loads our TensorFlow Lite file, which we saved to

146
00:11:08,560 --> 00:11:11,520
the drive, and then we allocate tensors.

147
00:11:11,530 --> 00:11:20,440
Once this is done, we move on to get the details, the input and output details which we had from the

148
00:11:20,440 --> 00:11:21,740
conversion process.

149
00:11:21,760 --> 00:11:30,310
Now here we have the unsigned int, and here we also have the unsigned int, and then you see that we have

150
00:11:30,310 --> 00:11:34,720
this test image, whose type we will change.

151
00:11:34,720 --> 00:11:40,120
So first of all, you notice that this image is turned into a NumPy array, which we don't need any longer

152
00:11:40,120 --> 00:11:42,790
because this is already NumPy, and then even this type.

153
00:11:42,790 --> 00:11:49,570
We do not really need to do this, although if you print this, let's comment out this section, and if you

154
00:11:49,570 --> 00:11:59,200
print these input details and we get the data type, you will see that we have an unsigned int. Now,

155
00:11:59,210 --> 00:12:04,810
tf is not defined. Well, here we'll use tflite.

156
00:12:04,810 --> 00:12:11,200
So let's have this and run that again. We get: tflite_runtime has no attribute Interpreter.

157
00:12:11,710 --> 00:12:16,930
Now what we'll do is add .interpreter to this.

158
00:12:16,930 --> 00:12:19,900
Okay, so we should have this and this should work now.

159
00:12:20,290 --> 00:12:21,070
Okay, that's it.

160
00:12:21,070 --> 00:12:22,840
Now we have this.

161
00:12:22,840 --> 00:12:25,240
We run this again, and that's fine.

162
00:12:25,240 --> 00:12:31,450
So you see, we have the unsigned int, which is what is expected, because when doing the conversion

163
00:12:31,450 --> 00:12:37,140
we have specified that we wanted this data type for our input and our output.

164
00:12:37,150 --> 00:12:38,140
So that's it.

165
00:12:38,140 --> 00:12:40,060
Let's take this off.

166
00:12:40,060 --> 00:12:47,620
What we're saying is we don't necessarily need this step right here. This would be useful if we had

167
00:12:47,620 --> 00:12:51,670
this as a TensorFlow tensor.

168
00:12:51,670 --> 00:12:56,440
And if we do not have this as an unsigned int already.

169
00:12:56,440 --> 00:13:00,490
So now that we have that, you see we set the tensor.

170
00:13:00,490 --> 00:13:04,780
So here we have our test image, that's it.

171
00:13:04,780 --> 00:13:12,280
And then the input details index, you could print this out so you see what's in your input details

172
00:13:12,280 --> 00:13:13,450
index.

173
00:13:13,870 --> 00:13:16,300
As you could see, it's zero though.

174
00:13:16,300 --> 00:13:18,340
Now, on this line we get an error.

175
00:13:18,850 --> 00:13:21,760
We got three dimensions but expected four for the input.

176
00:13:21,970 --> 00:13:24,130
So let's get back here.

177
00:13:25,310 --> 00:13:26,110
Oh okay.

178
00:13:26,140 --> 00:13:26,380
Yeah.

179
00:13:26,380 --> 00:13:33,040
Yeah, we had this changed to m. Actually, let's get back.

180
00:13:33,040 --> 00:13:40,480
So what we're saying is here we have this expand_dims to get from three dimensions to four dimensions,

181
00:13:40,480 --> 00:13:45,160
and the name was still m. So let's have that.

182
00:13:45,160 --> 00:13:52,570
We run this again. Here we have our test image, which now has four dimensions, and now this should

183
00:13:52,570 --> 00:13:53,080
work.

184
00:13:53,080 --> 00:13:58,030
So here we set the tensor and then we run the inference.

185
00:13:58,030 --> 00:13:58,750
So that's it.

186
00:13:58,750 --> 00:14:00,160
We run the inference here.

187
00:14:00,160 --> 00:14:08,620
And once we run the inference, we should be able to get the tensor at the level of the output.

188
00:14:08,740 --> 00:14:13,000
So let's take this off and then run this now.

189
00:14:14,920 --> 00:14:16,230
Takes a while.

190
00:14:16,240 --> 00:14:16,470
Yeah.

191
00:14:16,480 --> 00:14:16,650
This.

192
00:14:16,760 --> 00:14:19,110
This is an inference process.

193
00:14:19,120 --> 00:14:27,280
Now, it should be noted that TensorFlow Lite has been built for mobile and embedded CPUs.

194
00:14:27,280 --> 00:14:36,370
So general-purpose CPUs like these Colab CPUs aren't the best match for TensorFlow Lite models in terms

195
00:14:36,370 --> 00:14:37,210
of speed.

196
00:14:37,330 --> 00:14:44,290
Now let's just print out the output and see what's in there.

197
00:14:44,320 --> 00:14:45,610
See, there we go.

198
00:14:45,970 --> 00:14:48,370
It shows that we have this here, the highest value.

199
00:14:48,370 --> 00:14:55,900
Then we could do an argmax of this output right here.

200
00:14:56,860 --> 00:14:57,970
We run that again.

201
00:14:58,330 --> 00:14:59,140
That's it.

202
00:15:00,100 --> 00:15:05,170
Now, let's take this here so we could get the class name automatically.

203
00:15:05,440 --> 00:15:09,250
Now let's run this, and we get the class happy.
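
Putting the inference steps together, a minimal sketch could look like this. It assumes the `tflite_runtime` package is installed on the target device and that the input batch already has four dimensions and the uint8 type the model expects. The helper names `load_interpreter` and `predict_class` are hypothetical.

```python
def load_interpreter(model_path):
    """Load a .tflite file with the lightweight runtime (no full TensorFlow needed)."""
    # Imported lazily so the rest of this sketch can be read and reused
    # on machines where the runtime is not installed.
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def predict_class(interpreter, batch, class_names):
    """Run one inference and return the name of the highest-scoring class."""
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    # batch must be 4-D (batch, height, width, channels); a single 3-D image
    # would first go through something like np.expand_dims(image, axis=0).
    interpreter.set_tensor(input_details["index"], batch)
    interpreter.invoke()

    scores = interpreter.get_tensor(output_details["index"])[0]
    # Plain-Python argmax so this helper works on lists and NumPy arrays alike.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best]
```

With a real model this would be called as `predict_class(load_interpreter(path), batch, class_names)`, where `path`, `batch`, and the list of class names (containing labels such as angry and happy) come from your own setup.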

204
00:15:09,280 --> 00:15:11,440
So this is what we expected.

205
00:15:12,130 --> 00:15:16,960
And again, we've done this without having to import TensorFlow.

206
00:15:16,960 --> 00:15:20,710
So we didn't need TensorFlow.

207
00:15:21,010 --> 00:15:27,250
We already had our TF Lite model, and here tf is not even defined.

208
00:15:28,000 --> 00:15:34,140
Now our next step will be to measure the accuracy of our TensorFlow Lite quantized model.

209
00:15:34,150 --> 00:15:38,210
So here we do exactly the same process as we had before.

210
00:15:38,230 --> 00:15:45,070
We basically have the model path and then these input details, output details, and then we go to our

211
00:15:45,070 --> 00:15:48,600
validation data set, take 100 elements.

212
00:15:48,610 --> 00:15:49,660
Now validation data set.

213
00:15:49,660 --> 00:15:56,710
So this means that we have to import TensorFlow for this process since we are trying to evaluate the

214
00:15:56,710 --> 00:15:58,060
model's performance.

215
00:15:58,060 --> 00:15:59,860
So here we have this.

216
00:15:59,980 --> 00:16:07,390
We will need the validation dataset, so we'll need to get back and run these cells here.

217
00:16:07,390 --> 00:16:08,770
So let's get back here.

218
00:16:08,770 --> 00:16:10,270
We will run this.

219
00:16:11,110 --> 00:16:12,340
There we go.

220
00:16:12,340 --> 00:16:14,680
We're rerunning all the cells.

221
00:16:15,430 --> 00:16:17,650
Now we want to import this.

222
00:16:17,650 --> 00:16:20,800
When TensorFlow is already imported, we get this error.

223
00:16:20,800 --> 00:16:26,080
So what we'll do is we just have this here. We run that, and there we go.

224
00:16:26,080 --> 00:16:31,660
And then when we get to this accuracy function, you will notice that we use tf.lite.

225
00:16:31,660 --> 00:16:35,260
So it's tf.lite, not tflite.

226
00:16:35,260 --> 00:16:41,890
So we're using this module from TensorFlow and not from this package which we had installed here.

227
00:16:42,400 --> 00:16:47,740
So that said, now we have this accuracy function.

228
00:16:47,740 --> 00:16:49,150
We have input output.

229
00:16:49,150 --> 00:16:56,050
As we were saying, from our validation dataset we get the test image, which is going to be passed here.

230
00:16:57,280 --> 00:17:02,350
We run the inference, we get the output, we compare; if they're the same, we increase accuracy.

231
00:17:02,350 --> 00:17:07,000
If not, we skip and then we move on to increase the total.

232
00:17:07,000 --> 00:17:13,870
And then from here we have accuracy divided by total or, let's say, positives divided by total,

233
00:17:14,110 --> 00:17:19,540
or let's say correct, correct predictions.

234
00:17:19,540 --> 00:17:21,610
So let's change this.

235
00:17:22,390 --> 00:17:23,770
So correct.

236
00:17:23,770 --> 00:17:28,090
So here we have correct predictions and that should be it.
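
The counting logic described here can be sketched on its own, independent of any interpreter. This is a simplified version under stated assumptions: `predict` is any callable that returns a predicted label for one input, and `samples` is any iterable of (input, label) pairs. The function in the session takes a model path instead, so the names and the signature below are illustrative only.

```python
def accuracy(predict, samples):
    """Fraction of samples for which predict(input) equals the true label."""
    correct_predictions = 0
    total = 0
    for inputs, label in samples:
        if predict(inputs) == label:
            correct_predictions += 1
        total += 1
    # Guard against an empty evaluation set.
    return correct_predictions / total if total else 0.0


# Toy stand-in for a model: "predicts" the parity of an integer input.
toy_samples = [(1, 1), (2, 0), (3, 0), (4, 0)]
toy_accuracy = accuracy(lambda x: x % 2, toy_samples)
```

For the TF Lite model, `predict` would wrap the set-tensor, invoke, and argmax steps, and `samples` would be the 100 elements taken from the validation dataset.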

237
00:17:28,090 --> 00:17:32,530
So we have this accuracy here and then we specify the model path.

238
00:17:32,530 --> 00:17:39,430
Let's get back here and then take this path of our TF Lite model.

239
00:17:39,460 --> 00:17:47,740
Now we simply have accuracy, and then we specify that path.

240
00:17:49,120 --> 00:17:52,210
Okay, let's have this and we run this now.

241
00:17:52,330 --> 00:17:53,260
There we go.

242
00:17:53,260 --> 00:18:00,100
We're done with computing the accuracy for our TF Lite model and we get 0.82.

243
00:18:00,100 --> 00:18:06,370
That is 82% as compared to the 84% with the original model.

244
00:18:07,090 --> 00:18:14,380
Now, to get a more accurate value for this, it's advisable to use the whole dataset.

245
00:18:14,620 --> 00:18:16,360
So you could take this off.

246
00:18:17,080 --> 00:18:25,240
And so now you have your model, which performs at 82% accuracy and which could now be deployed on some

247
00:18:25,240 --> 00:18:26,710
mobile device.