1
00:00:00,240 --> 00:00:07,090
Hi there and welcome to this new and exciting session in which we are going to cover quantization-aware

2
00:00:07,320 --> 00:00:09,390
training with TensorFlow.

3
00:00:09,570 --> 00:00:18,600
Now, in some previous sections we started by explaining what quantization is all about and the advantages

4
00:00:18,600 --> 00:00:21,120
of quantizing models.

5
00:00:21,120 --> 00:00:29,370
We also looked at different quantization methods and we looked at the relative advantages and disadvantages

6
00:00:29,370 --> 00:00:36,330
of these different methods. That said, we are going to see in this section how to quantize a full model

7
00:00:36,330 --> 00:00:40,110
or just some layers which make up that model.

8
00:00:40,110 --> 00:00:48,330
In TensorFlow, the special module which lets us carry out quantization is tfmot, which stands

9
00:00:48,330 --> 00:00:52,980
for TensorFlow Model Optimization.

10
00:00:52,980 --> 00:01:00,630
And so here we start by installing this TensorFlow Model Optimization package, and then we'll import it

11
00:01:00,630 --> 00:01:02,310
as tfmot.

12
00:01:02,340 --> 00:01:07,900
Then, since we want to do quantization, we go into quantization.keras, and that's it.

13
00:01:07,920 --> 00:01:10,440
Here we have different methods and classes.

14
00:01:10,440 --> 00:01:13,430
Let's get into this quantize_model function right here.

15
00:01:13,440 --> 00:01:18,330
As you can see, quantize_model quantizes a Keras model with the default quantization implementation.

16
00:01:18,330 --> 00:01:24,270
And so here we simply have to pass in this to_quantize argument, which is the model to be quantized.

17
00:01:24,270 --> 00:01:29,480
And then we should get a quantization aware model.

18
00:01:29,490 --> 00:01:34,950
So here, for example, you see this model defined as a Sequential model, and then we also have this

19
00:01:34,950 --> 00:01:41,580
functional model. Then, to quantize it, we just call this quantize_model method right here.

20
00:01:41,580 --> 00:01:46,980
And then we pass in our model and we get our quantization-aware model.

21
00:01:47,070 --> 00:01:54,060
So that said, let's go ahead and implement this. Here we have our model. In this case,

22
00:01:54,060 --> 00:01:56,490
let's start with our Hugging Face model.

23
00:01:56,490 --> 00:02:00,660
So we have our Hugging Face model, which we declared already.

24
00:02:00,660 --> 00:02:05,910
And now let's say we want to have our quant-aware Hugging Face model.

25
00:02:06,270 --> 00:02:06,770
Okay?

26
00:02:06,780 --> 00:02:14,850
So we want this quantization-aware Hugging Face model, and then we want to use tfmot's quantize

27
00:02:15,660 --> 00:02:16,610
model.

28
00:02:16,620 --> 00:02:21,780
So basically, let's copy this here and then paste it into the code.

29
00:02:22,170 --> 00:02:27,780
So we have that pasted in, and then here we pass in our Hugging Face model.

30
00:02:27,780 --> 00:02:32,480
So once we run this, it should give us our quantization-aware model.

31
00:02:32,490 --> 00:02:38,760
Now we get an error: "Quantizing a tf.keras Model inside another tf.keras Model is not supported."

32
00:02:38,760 --> 00:02:40,950
So as of now, this isn't supported.

33
00:02:40,980 --> 00:02:49,860
Now let's try it out with the EfficientNet model, though this should give the same error because the

34
00:02:49,860 --> 00:02:52,800
EfficientNet model... let's go up here.

35
00:02:54,540 --> 00:02:56,940
The definition for the EfficientNet model.

36
00:02:57,150 --> 00:02:58,830
First of all, you can see the Hugging Face model.

37
00:02:58,830 --> 00:03:02,100
You have a model inside this model.

38
00:03:02,100 --> 00:03:05,310
So that's the reason why that doesn't work.

39
00:03:05,310 --> 00:03:10,580
And then we get to the EfficientNet transfer learning model.

40
00:03:10,590 --> 00:03:16,920
Okay, so we have this model right here, and we can see that we have this Keras model, which is

41
00:03:16,920 --> 00:03:19,320
this backbone here in this model.

42
00:03:19,320 --> 00:03:26,700
And so if we are to use this, we have to look for a way to break this backbone up into its different

43
00:03:26,700 --> 00:03:27,510
layers.

44
00:03:27,630 --> 00:03:34,320
But so far, what we've been doing is just making use of this backbone as is. Here we just have this

45
00:03:34,320 --> 00:03:34,890
backbone.

46
00:03:34,890 --> 00:03:35,820
And that was it.

47
00:03:35,850 --> 00:03:39,810
We didn't actually break this model up into different layers.

48
00:03:39,840 --> 00:03:45,060
Now, that said, let's copy out this EfficientNet model right here.

49
00:03:46,590 --> 00:03:48,240
We'll have to take all this off.

50
00:03:48,720 --> 00:03:53,730
And then now we are no longer making use of this input right here.

51
00:03:54,150 --> 00:03:55,350
So we won't use this.

52
00:03:55,380 --> 00:03:57,840
We use the backbone's input directly.

53
00:03:57,930 --> 00:03:59,430
So let's look at that.

54
00:03:59,430 --> 00:04:00,450
We have this x.

55
00:04:00,450 --> 00:04:07,890
So from here we have the backbone's output, which will go into this global average pooling layer.

56
00:04:07,890 --> 00:04:11,100
So here we have backbone.output.

57
00:04:11,370 --> 00:04:12,000
That's it.

58
00:04:12,000 --> 00:04:15,690
We have this output, which goes into the global average pooling.

59
00:04:15,690 --> 00:04:25,590
We have this x here, that's it, which now passes to this dense layer and then to the batch norm layer

60
00:04:26,220 --> 00:04:30,840
and then to this dense layer such that we have an output right here.

61
00:04:31,350 --> 00:04:32,970
Okay, so we have that.

62
00:04:32,970 --> 00:04:35,850
Let's pass in these x values.

63
00:04:36,780 --> 00:04:37,800
There we go.

64
00:04:37,800 --> 00:04:40,740
And finally we have this.

65
00:04:40,740 --> 00:04:42,900
Then from here, we create our model.

66
00:04:42,900 --> 00:04:53,550
So our pre-trained model is a Keras Model, which takes as inputs the backbone input.

67
00:04:53,550 --> 00:04:59,970
So now our backbone input is our input, and then our output

68
00:05:00,120 --> 00:05:05,250
It is simply this output right here.

69
00:05:05,520 --> 00:05:07,950
So we have that output.

70
00:05:08,320 --> 00:05:08,730
Okay.

71
00:05:08,730 --> 00:05:11,780
So we have this set now and everything should work fine.

72
00:05:11,790 --> 00:05:13,650
So let's run this here.

73
00:05:13,890 --> 00:05:15,730
And what do you notice?

74
00:05:15,750 --> 00:05:23,130
You notice that the Keras model, what we had previously as our Keras model, has now been broken up.

75
00:05:23,310 --> 00:05:31,800
So you can see... I think we should have just called it pre-trained functional model here.

76
00:05:31,800 --> 00:05:34,770
Let's call this functional model.

77
00:05:35,310 --> 00:05:36,150
Okay, so let's.

78
00:05:36,150 --> 00:05:38,400
Let's go back and run this other

79
00:07:38,400 --> 00:07:39,510
cell here.

80
00:05:40,630 --> 00:05:41,440
Let's get back.

81
00:05:41,480 --> 00:05:45,820
Here, our pre-trained model.

82
00:05:47,790 --> 00:05:56,220
Okay, so let's run this again, this pre-trained model, and then we can run these summaries down here

83
00:05:56,220 --> 00:05:56,910
and you can see.

84
00:05:57,570 --> 00:06:02,670
So you see here, this model is exactly the same model we're dealing with.

85
00:06:02,670 --> 00:06:09,840
So for this exact same model, what we want to do is just paste this to get the pre-trained one.

86
00:06:10,080 --> 00:06:17,910
So this is the pre-trained model, or let's get a summary: pre-trained model summary. Let's run that, let's reduce

87
00:06:17,910 --> 00:06:26,610
this so we have more space. We have this, and you see we still have exactly the same total parameters

88
00:06:26,610 --> 00:06:32,130
as here, the same number of parameters, the same number of non-trainable parameters, exactly the same.

89
00:06:32,130 --> 00:06:34,230
So it's basically the same thing.

90
00:06:34,230 --> 00:06:38,250
But the difference here is we do not have this.

91
00:06:38,910 --> 00:06:40,290
Let's open that up again.

92
00:06:40,290 --> 00:06:45,840
Here's the pre-trained summary.

93
00:06:46,260 --> 00:06:52,950
We do not have this Keras model here, so we do not have this model right here.

94
00:06:52,950 --> 00:07:01,860
And so because we don't have this now, it will be possible for us to make use of this method and quantize

95
00:07:01,860 --> 00:07:03,750
our full model.
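
The flattening step can be sketched as follows, assuming Keras 2 functional models. The small `backbone` and the `add_head` helper are hypothetical stand-ins for the pre-trained EfficientNet backbone and the classification head. Building the head on `backbone.output` and using `backbone.input` as the model input makes the backbone's layers part of the new model, so no nested Model remains, while the parameter count stays the same.

```python
import tensorflow as tf

# Hypothetical stand-in for a pre-trained backbone (e.g. EfficientNet without its top).
backbone_in = tf.keras.Input(shape=(32, 32, 3))
backbone_out = tf.keras.layers.Conv2D(8, 3, activation="relu", name="block1_conv")(backbone_in)
backbone = tf.keras.Model(backbone_in, backbone_out, name="backbone")

def add_head(x, num_classes=3):
    """Hypothetical head: pooling -> dense -> batch norm -> dense."""
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(16, activation="relu")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.Dense(num_classes, activation="softmax")(x)

# Flattened version: start from backbone.input and build on backbone.output,
# so the backbone's own layers become layers of the new model.
pretrained_functional = tf.keras.Model(
    inputs=backbone.input, outputs=add_head(backbone.output))

# Nested version for comparison: the whole backbone appears as one Model layer.
nested_in = tf.keras.Input(shape=(32, 32, 3))
nested_model = tf.keras.Model(nested_in, add_head(backbone(nested_in)))

print([layer.name for layer in pretrained_functional.layers])
print(pretrained_functional.count_params(), nested_model.count_params())
```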

96
00:07:04,290 --> 00:07:05,220
So that's it.

97
00:07:05,220 --> 00:07:07,680
We have this pre-trained model.

98
00:07:07,680 --> 00:07:09,360
Now let's run this again.

99
00:07:09,360 --> 00:07:14,940
So we have this pre-trained functional model.

100
00:07:15,840 --> 00:07:16,950
Let's run that.

101
00:07:18,210 --> 00:07:25,620
Okay, we have our model set, and now what we can do is run this.

102
00:07:25,620 --> 00:07:27,870
So let's run this again and see what we get.

103
00:07:28,470 --> 00:07:34,050
So let's increase the size, and there we go, we get another error.

104
00:07:34,050 --> 00:07:36,510
It's the same error, actually.

105
00:07:36,510 --> 00:07:37,830
Let's get back here.

106
00:07:38,070 --> 00:07:39,990
This should be pre-trained functional.

107
00:07:40,410 --> 00:07:41,940
So let's run this.

108
00:07:44,790 --> 00:07:45,090
Now.

109
00:07:45,090 --> 00:07:46,040
It's taking more time.

110
00:07:46,050 --> 00:07:47,820
Hopefully everything will work well.

111
00:07:48,960 --> 00:07:55,040
Now, instead, we get this other error, where we're told that this Rescaling

112
00:07:55,080 --> 00:07:58,260
layer here is not supported.

113
00:08:00,860 --> 00:08:01,880
And this is normal.

114
00:08:01,880 --> 00:08:07,940
Since this Rescaling layer has no weights and isn't supported by the default quantization scheme, we are not going to be carrying

115
00:08:07,940 --> 00:08:10,050
out quantization for such layers.

116
00:08:10,070 --> 00:08:18,530
So what we can do now is, instead of quantizing the whole model, select some layers we want to quantize.

117
00:08:18,770 --> 00:08:21,440
So here, there we go.

118
00:08:21,470 --> 00:08:23,740
What we'll do is instead select some layers.

119
00:08:23,750 --> 00:08:31,220
To see this, let's define a simple model, so let's get back to the top.

120
00:08:31,940 --> 00:08:36,320
And then we define, for example, this LeNet model without the resize-and-rescale layer.

121
00:08:36,320 --> 00:08:37,220
So that's it.

122
00:08:37,910 --> 00:08:39,860
Quite a simple model.

123
00:08:39,860 --> 00:08:41,120
We have that.

124
00:08:41,120 --> 00:08:47,780
Now let's run this, let's run the cell, and then, oops, we get an error.

125
00:08:48,590 --> 00:08:50,930
So that's because we took off the resize-and-rescale layer

126
00:08:50,930 --> 00:08:56,810
and did not specify an exact input size here.

127
00:08:56,810 --> 00:08:58,400
So let's run this again.

128
00:08:58,400 --> 00:08:59,430
That's fine.

129
00:08:59,450 --> 00:09:02,000
Now we have our LeNet model.

130
00:09:02,000 --> 00:09:06,470
Let's pass in this LeNet model and run that.

131
00:09:08,070 --> 00:09:09,570
We get another error.

132
00:09:09,600 --> 00:09:10,710
Let's check that out.

133
00:09:10,740 --> 00:09:14,990
This BatchNormalization layer is not supported, so we cannot quantize this batch norm layer.

134
00:09:15,000 --> 00:09:20,400
So what we'll do is basically remove the batch norm layers, but later on we'll see how to

135
00:09:20,430 --> 00:09:24,800
quantize only some layers.

136
00:09:24,810 --> 00:09:27,690
So for now, let's just remove these batch

137
00:09:27,690 --> 00:09:32,220
norm layers: batch norm off, dropout.

138
00:09:32,220 --> 00:09:36,060
Let's take the dropout off too, batch norm off, and that's it.

139
00:09:37,020 --> 00:09:38,040
So we have that.

140
00:09:38,040 --> 00:09:39,390
Let's run this again.

141
00:09:40,890 --> 00:09:41,820
Let's see what we get.

142
00:09:43,050 --> 00:09:44,190
Okay, so that's fine.

143
00:09:44,190 --> 00:09:50,730
You see, now we've been able to make this LeNet model quantization aware, and we have done this for the

144
00:09:50,730 --> 00:09:52,480
whole or the full model.

145
00:09:52,500 --> 00:10:00,870
Now, in cases like this one here, this EfficientNet model where we have this backbone,

146
00:10:00,870 --> 00:10:08,160
which is our pre-trained backbone, we cannot start taking off the normalization layer, for example,

147
00:10:08,160 --> 00:10:12,560
here and taking off this Rescaling layer, which come with the backbone, and so on and so forth.

148
00:10:12,570 --> 00:10:20,160
So what we'll do instead is move layer by layer and select the layers which we actually want to

149
00:10:20,910 --> 00:10:23,310
make quantization aware.

150
00:10:23,430 --> 00:10:25,530
So that's basically what we'll do.

151
00:10:25,650 --> 00:10:31,800
And so instead of proceeding as we did here, where we just quantized the whole, the

152
00:10:31,800 --> 00:10:35,030
full model, we're going to go layer by layer.

153
00:10:35,040 --> 00:10:46,380
So with that, we can comment out that section there and now take this model. Now, in order to quantize

154
00:10:46,380 --> 00:10:53,150
only some layers of the model, we'll make use of this quantize_annotate_layer method right here.

155
00:10:53,160 --> 00:11:00,990
So we see again we have quantization.keras.quantize_annotate_layer, and this takes in the layer to

156
00:11:00,990 --> 00:11:03,900
annotate with some quantization configurations.

157
00:11:03,900 --> 00:11:08,850
So here, what they explain is that this function does not actually quantize the layer.

158
00:11:08,850 --> 00:11:13,590
It is merely used to specify that the layer should be quantized.

159
00:11:13,590 --> 00:11:21,630
So you see it's there to specify that the layer should be quantized, and so the layer then gets quantized

160
00:11:21,630 --> 00:11:24,540
accordingly when we call quantize_apply.

161
00:11:24,540 --> 00:11:31,080
So this is the quantize_apply method here; we can click and open that, and that should be it.

162
00:11:31,080 --> 00:11:32,400
So let's get back.

163
00:11:32,670 --> 00:11:37,470
Oh, let's just look at this example here, where you see this layer.

164
00:11:38,100 --> 00:11:42,630
You see we have this model, but in this model we want to quantize only this layer.

165
00:11:42,780 --> 00:11:45,480
And so as you can see, we have quantize_annotate_layer.

166
00:11:45,480 --> 00:11:53,340
And then, once this is done, we call quantize_apply to get our quantization-aware model, which here

167
00:11:53,340 --> 00:11:55,180
is called quantized_model.

168
00:11:55,200 --> 00:12:00,640
So let's go ahead and see how to implement this with our pre-trained EfficientNet model.

169
00:12:01,620 --> 00:12:13,980
We will now define this function, apply quantization to the conv layers, which takes in a layer, and

170
00:12:13,980 --> 00:12:27,480
then if the layer name, or rather, if "conv" is in the layer name, we are going to carry out

171
00:12:27,480 --> 00:12:28,710
the quantization.

172
00:12:28,710 --> 00:12:32,430
So we're going to apply the quantization on the conv layers.

173
00:12:33,450 --> 00:12:36,000
So here we return the annotated layer, okay.

174
00:12:36,000 --> 00:12:42,120
So in the case where we don't have that, we'll just return the layer itself.

175
00:12:42,120 --> 00:12:49,770
So the layer remains unchanged, whereas conv layers will become quantization aware.

176
00:12:49,890 --> 00:12:54,660
So we have this apply function right here, which we'll run.

177
00:12:54,660 --> 00:12:55,790
There we go.

178
00:12:55,800 --> 00:13:05,040
Now, once we have this function defined, we'll make use of this clone_model method right here to create

179
00:13:05,040 --> 00:13:11,550
a new model, but one which takes into consideration a certain clone function.

180
00:13:11,550 --> 00:13:18,230
We get back here and paste this out, but we won't be making use of this input_tensors argument.

181
00:13:18,240 --> 00:13:20,100
So we just make use of the clone function.

182
00:13:20,100 --> 00:13:26,910
And our clone function here is this apply quantization to the conv layers.

183
00:13:27,480 --> 00:13:28,410
So that's it.

184
00:13:28,440 --> 00:13:34,410
Now if you take this here and check out the layers, you'll see that wherever we have the conv layers,

185
00:13:34,410 --> 00:13:41,220
like this one, we have conv here, we have dwconv for the depthwise convolutions, and

186
00:13:41,220 --> 00:13:42,300
so on and so forth.

187
00:13:42,330 --> 00:13:49,530
Now you could also include this for the expand and reduce layers, but let's just work with only this

188
00:13:49,530 --> 00:13:50,250
convs.

189
00:13:50,250 --> 00:13:51,510
So that's it.

190
00:13:51,930 --> 00:14:01,050
You now understand how to pick out certain layers, or how to leave out others, from the quantization

191
00:14:01,050 --> 00:14:02,880
awareness process.

192
00:14:02,880 --> 00:14:06,660
So from here we have this apply right here.

193
00:14:06,870 --> 00:14:15,000
And then we'll call this our quant-aware EfficientNet, so that we have this quant-aware EfficientNet,

194
00:14:15,000 --> 00:14:19,710
and then we run this, uh, not this model.

195
00:14:19,710 --> 00:14:23,520
This model here has to be our pre-trained model.

196
00:14:23,520 --> 00:14:25,730
So it's our pre-trained model.

197
00:14:25,740 --> 00:14:29,210
We run that again, and now this should be fine.

198
00:14:29,220 --> 00:14:34,230
Okay, so we have our quant-aware model, whose layers are now annotated for quantization, and you notice that

199
00:14:34,230 --> 00:14:41,850
when we call summary on our quant-aware EfficientNet, we should get something slightly different from what we

200
00:14:41,850 --> 00:14:43,050
used to get.

201
00:14:44,640 --> 00:14:47,750
We have this now.

202
00:14:47,760 --> 00:14:53,130
Oh, this should be functional, so we should have the functional model.

203
00:14:53,160 --> 00:14:57,510
Let's run that again, and run it now.

204
00:14:57,510 --> 00:15:01,410
Let's run this, let's get back here.

205
00:15:01,530 --> 00:15:07,140
And you see we have this quant-aware EfficientNet, and you will notice that.

206
00:15:07,140 --> 00:15:09,270
Let's get back to the top.

207
00:15:09,960 --> 00:15:19,620
You notice that wherever we have conv layers, see, wherever we have the conv layers, we now instead

208
00:15:19,620 --> 00:15:21,840
have this quantize_annotate.

209
00:15:21,990 --> 00:15:28,410
So as you scroll, you won't see a conv layer, but instead we have quantize_annotate.

210
00:15:28,530 --> 00:15:29,310
So that's it.

211
00:15:30,330 --> 00:15:31,080
Here, and here.

212
00:15:31,080 --> 00:15:34,470
Because there's no conv in this name.

213
00:15:34,470 --> 00:15:40,320
So, as we said before, we could also include the se_expand and se_reduce layers.

214
00:15:40,320 --> 00:15:41,040
So that's it.

215
00:15:41,110 --> 00:15:49,400
We now have these quantize_annotate layers, which we didn't have before, making this model quantization

216
00:15:49,410 --> 00:15:52,020
aware, or rather making some layers of this model

217
00:15:52,020 --> 00:15:53,460
quantization aware.

218
00:15:53,610 --> 00:16:00,060
So with that now we are done with the annotation and we're ready to make this actually quantization

219
00:16:00,060 --> 00:16:00,630
aware.

220
00:16:00,630 --> 00:16:03,930
So we call quantize_apply on this quant-aware model.

221
00:16:05,190 --> 00:16:09,930
This is quant-aware...

222
00:16:09,960 --> 00:16:12,600
Let's get this exact name right here.

223
00:16:12,600 --> 00:16:14,460
Quant aware, efficient.

224
00:16:14,850 --> 00:16:15,130
Okay.

225
00:16:15,240 --> 00:16:18,210
So we call that quant aware efficient.

226
00:16:18,480 --> 00:16:19,680
There we go.

227
00:16:19,680 --> 00:16:20,310
That's it.

228
00:16:20,310 --> 00:16:22,080
And now we have our quant-aware model.

229
00:16:22,080 --> 00:16:25,620
Let's run this again and then see what we get as summary.

230
00:16:26,430 --> 00:16:29,310
And that's it, it's now quantization aware.

231
00:16:29,310 --> 00:16:34,020
So we no longer have the annotations, but now some wrappers.

232
00:16:34,020 --> 00:16:41,610
So here you see the layer name, but now we have this quant prefix added to those layers.

233
00:16:42,570 --> 00:16:44,670
Let's scroll down and check this out.

234
00:16:44,670 --> 00:16:48,260
You see we have these other ones here, and so on and so forth.

235
00:16:48,270 --> 00:16:49,080
So that's it.

236
00:16:49,080 --> 00:16:56,640
We now have our quantization-aware model, and we're now ready to compile this model and train

237
00:16:56,640 --> 00:16:56,790
it.

238
00:16:56,790 --> 00:17:01,680
Just like any regular model. We get back to our training right here.

239
00:17:01,680 --> 00:17:10,410
And then at the level of compile, we have our quant-aware model, that's it.

240
00:17:11,070 --> 00:17:12,870
This is similar, right.

241
00:17:12,870 --> 00:17:13,860
So that's it.

242
00:17:13,860 --> 00:17:15,180
Okay, let's run this.

243
00:17:15,180 --> 00:17:21,210
And then here we also have our quant aware model.

244
00:17:21,870 --> 00:17:22,890
There we go.

245
00:17:22,890 --> 00:17:28,350
So let's run this. We get this ResourceExhaustedError.

246
00:17:28,350 --> 00:17:33,280
So I'm going to restart the session and hopefully everything should work fine.

247
00:17:33,300 --> 00:17:34,050
There we go.

248
00:17:34,050 --> 00:17:37,560
We restarted the session, and now we're able to train.

249
00:17:37,680 --> 00:17:38,430
Anyways.

250
00:17:38,430 --> 00:17:46,560
We've seen how to implement quantization-aware training with TensorFlow, and in the next section we'll dive

251
00:17:46,560 --> 00:17:49,050
into post training quantization.