1
00:00:00,120 --> 00:00:05,820
Hello, everyone, and welcome to the session in which we are going to implement the code for ResNet

2
00:00:05,820 --> 00:00:07,920
34 in TensorFlow 2.

3
00:00:08,010 --> 00:00:16,140
Now here is the ResNet-34 model. Right here we have all the variants, like the 50-, 101- and 152-layer ones.

4
00:00:16,200 --> 00:00:22,320
After going through this section, you'll be able to implement these other variants right here.

5
00:00:22,320 --> 00:00:29,850
And also you'll be able to get results like this where we can see clearly an improvement in the accuracy

6
00:00:29,850 --> 00:00:31,110
of our model.

7
00:00:31,110 --> 00:00:37,050
We are going to construct our ResNet-34 model while making use of model subclassing.

8
00:00:37,050 --> 00:00:39,180
And so you could check on the previous sections.

9
00:00:39,180 --> 00:00:45,300
So you better understand how model subclassing is implemented in TensorFlow 2.

10
00:00:45,570 --> 00:00:54,810
Now that said, here we have this ResNet-34 model right here, and then the first layer we have is

11
00:00:54,810 --> 00:01:02,610
our convolutional layer, which has seven by seven filters and they are 64 in number.

12
00:01:03,000 --> 00:01:06,770
Also, we know that the stride, or number of strides, is equal to two.

13
00:01:06,780 --> 00:01:08,850
So you could get all this from the paper.

14
00:01:08,850 --> 00:01:15,310
We have the information from the paper: seven by seven, 64, stride two, and then followed by a three by three max

15
00:01:15,330 --> 00:01:17,010
pool with a stride of two.

16
00:01:17,010 --> 00:01:18,480
So we get back here.

17
00:01:18,480 --> 00:01:21,300
We have this three by three stride of two.

18
00:01:21,630 --> 00:01:27,990
Now from this we then get into these residual blocks, so we'll get back to the paper.

19
00:01:27,990 --> 00:01:32,240
You see, we have three of these residual blocks.

20
00:01:32,250 --> 00:01:37,080
Now this is actually like this because it is the 34-layer version we're implementing right here, meaning that if

21
00:01:37,080 --> 00:01:43,620
you are implementing a 50-layer ResNet, then you would have one by one, three by three and one by

22
00:01:43,620 --> 00:01:44,070
one.

23
00:01:44,070 --> 00:01:48,010
But for the 34 version, we have three by three and three by three.

24
00:01:48,030 --> 00:01:55,290
Now this is repeated thrice, so that's why you will notice here we have this repeated three times and each

25
00:01:55,290 --> 00:01:57,560
one of them is our residual block.

26
00:01:57,570 --> 00:02:02,670
Now, our residual block here is this block right here.

27
00:02:02,670 --> 00:02:07,560
So we are going to implement this residual block later on.

28
00:02:07,770 --> 00:02:08,910
Let's get back to the code.

29
00:02:08,910 --> 00:02:11,040
We have simply our residual block.

30
00:02:11,040 --> 00:02:20,460
You see the parameters here: number of filters 64, 64 and here 64, exactly as we have it in

31
00:02:20,460 --> 00:02:21,060
the paper.

32
00:02:21,060 --> 00:02:24,810
You see, for each block we have the number of filters set to 64.

33
00:02:24,930 --> 00:02:33,810
Now, if we get into this residual block, we'll see exactly how it's implemented

34
00:02:33,810 --> 00:02:40,230
and we'll see that we're going to have these two conv layers, one three by three and another three by

35
00:02:40,230 --> 00:02:40,770
three.

36
00:02:40,770 --> 00:02:43,440
But for now, let's do it this way.

37
00:02:43,440 --> 00:02:45,810
Let's just consider that that has been implemented.

38
00:02:45,990 --> 00:02:51,810
Now, another reason why I want to implement it this way is because if we want to convert this

39
00:02:51,810 --> 00:02:58,530
to a ResNet-50, all we need to do is update the code for the residual block since, well,

40
00:02:58,560 --> 00:03:04,050
what makes this different here is just these residual blocks right here.

41
00:03:04,170 --> 00:03:07,110
So that said, let's get back again here.

42
00:03:07,110 --> 00:03:12,060
We now have these four residual blocks.

43
00:03:12,060 --> 00:03:20,130
So we have these four residual blocks. Actually, here you see this is four, and it's similar.

44
00:03:20,130 --> 00:03:22,830
Like here you have the three by three, three by three.

45
00:03:22,830 --> 00:03:26,850
And then here the number of channels equals 128.

46
00:03:26,850 --> 00:03:35,310
So you notice here that we have 128 and we have four of them. Now, because here we are leaving, or

47
00:03:35,310 --> 00:03:40,320
we are modifying, the number of channels, we need to take into consideration

48
00:03:40,320 --> 00:03:46,630
this number of strides we have right here, as this permits us to downsample our features.

49
00:03:46,650 --> 00:03:54,150
Now, getting back to the code, you'll have this here, where we have this downsampling, and then we move

50
00:03:54,150 --> 00:03:54,410
on.

51
00:03:54,420 --> 00:04:02,430
128, 128, 128. Again, here we have downsampling, and now we go to 256, exactly as it is in the paper.

52
00:04:03,210 --> 00:04:04,880
You can look at this directly from here.

53
00:04:04,890 --> 00:04:05,160
Yeah.

54
00:04:05,160 --> 00:04:10,470
We have 256 and we have a certain number of them which are aligned.

55
00:04:10,470 --> 00:04:12,180
We could see that in the summary.

56
00:04:12,180 --> 00:04:13,700
Here's six of them.

57
00:04:13,710 --> 00:04:15,330
Check this out here.

58
00:04:16,050 --> 00:04:25,110
You see we have six of these aligned, and then again we have 512, three of them, as in the paper here,

59
00:04:25,140 --> 00:04:27,800
512 and that's it.
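The stage layout just described (64, 128, 256 and 512 filters, with three, four, six and three residual blocks) can be sanity-checked with a few lines of plain Python. This is only a sketch: the names `STAGES` and `block_plan` are made up for illustration and are not taken from the notebook.

```python
# Stage layout of ResNet-34 as described above: (filters, number of residual blocks).
STAGES = [(64, 3), (128, 4), (256, 6), (512, 3)]


def block_plan(stages):
    """Return one (filters, strides) pair per residual block."""
    plan = []
    for stage_index, (filters, n_blocks) in enumerate(stages):
        for block_index in range(n_blocks):
            # The first block of every stage except the first one downsamples.
            downsample = stage_index > 0 and block_index == 0
            plan.append((filters, 2 if downsample else 1))
    return plan


plan = block_plan(STAGES)
# Two 3x3 conv layers per block, plus the initial 7x7 conv and the final dense layer.
n_weighted_layers = 2 * len(plan) + 2
print(len(plan), n_weighted_layers)
```

Counting two conv layers per block plus the first conv and the final fully connected layer is where the 34 in the name comes from.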

60
00:04:27,810 --> 00:04:33,330
Now from here we have the global average pooling, and then we have this

61
00:04:33,330 --> 00:04:38,280
final fully connected layer, with an activation which is softmax, as we've seen previously.

62
00:04:38,430 --> 00:04:42,600
Now, that said, we have this call method.

63
00:04:42,600 --> 00:04:50,070
We're simply going to call all those different layers which we just created by passing the input.

64
00:04:50,070 --> 00:04:57,510
So here we have our input X, which goes through each and every layer and we get the output right here.

65
00:04:57,960 --> 00:04:59,910
Now, that said, we're going to move on

66
00:05:00,060 --> 00:05:03,330
to looking at this residual block right here.

67
00:05:03,330 --> 00:05:05,370
So we're going to implement this residual block.

68
00:05:06,030 --> 00:05:08,990
And now this is basically our residual block.

69
00:05:09,000 --> 00:05:12,090
Let's increase this so we could see that clearly.

70
00:05:12,090 --> 00:05:15,000
This is basically our residual block right here.

71
00:05:15,000 --> 00:05:17,250
So we could take one of these.

72
00:05:17,250 --> 00:05:19,770
Let's take this one, for example.

73
00:05:19,770 --> 00:05:25,710
We have this residual block right here, and then we get back to the code, see it here.

74
00:05:25,800 --> 00:05:28,740
Now, this is our residual block layer.

75
00:05:28,740 --> 00:05:38,760
Unlike the full model here, we have this residual layer and then you'll see that we have this dotted

76
00:05:39,300 --> 00:05:44,400
Boolean right here, which is true when the number of strides is different from one.

77
00:05:44,400 --> 00:05:47,040
So let's run this here.

78
00:05:47,070 --> 00:05:48,810
Let's have this.

79
00:05:48,810 --> 00:05:53,310
We have dotted, and then let's specify a number of strides.

80
00:05:53,550 --> 00:05:59,010
Say, equal to one. Let's print out dotted after this.

81
00:05:59,010 --> 00:05:59,970
So there we go.

82
00:05:59,970 --> 00:06:05,340
We have here dotted, we run this, take this off.

83
00:06:05,340 --> 00:06:06,330
You see, it's false.

84
00:06:06,360 --> 00:06:10,580
Now, when we set this to two, it turns to true.

85
00:06:10,590 --> 00:06:13,650
So basically that's what dotted here does.

86
00:06:13,650 --> 00:06:18,660
And if you get back, you will notice that, well, let's get back to this.

87
00:06:18,660 --> 00:06:21,480
We had this; we selected this part.

88
00:06:21,600 --> 00:06:25,220
But this isn't a dotted link.

89
00:06:25,260 --> 00:06:27,750
See, this link is a full line.

90
00:06:27,750 --> 00:06:31,080
So here our dotted variable will be false.

91
00:06:31,080 --> 00:06:34,260
And when we get to this, our dotted variable will be true.

92
00:06:34,470 --> 00:06:36,390
Now, let's get back to the code.

93
00:06:36,390 --> 00:06:39,420
You see, we have this. Let's take this off.

94
00:06:39,420 --> 00:06:45,000
You see here we have this, which we understand already, and I will get back here.

95
00:06:45,000 --> 00:06:51,120
Then after, we have our two convolutional layers. Now we'll define this CustomConv2D, which again

96
00:06:51,120 --> 00:06:53,610
we are going to break down subsequently.

97
00:06:53,610 --> 00:07:00,780
So let's just understand that we have this CustomConv2D, which is represented by this here.

98
00:07:00,780 --> 00:07:06,130
So when we have this, this is it right here and this is the other one right here.

99
00:07:06,150 --> 00:07:11,820
Now you'll notice that the number of channels has been passed here and the number of strides has also

100
00:07:11,820 --> 00:07:14,640
been passed, and that's exactly what was done here.

101
00:07:14,640 --> 00:07:18,690
So you see we passed in the number of channels and number of strides.

102
00:07:18,720 --> 00:07:23,670
Now, since by default our number of strides is equal to one, it means when we don't pass it, we simply

103
00:07:23,670 --> 00:07:25,080
get a number of strides equal to one.

104
00:07:25,080 --> 00:07:30,270
But in cases where we have these transitions, we have the number of strides equal to two.

105
00:07:30,270 --> 00:07:33,330
And so our value here is going to be changed.

106
00:07:33,420 --> 00:07:41,160
Now, that said, we see we define this conv layer, which has a number of channels, kernel size three

107
00:07:41,400 --> 00:07:43,320
as in the paper, and the number of strides.

108
00:07:43,320 --> 00:07:45,480
And then we have the padding set to same.

109
00:07:45,480 --> 00:07:50,970
So we ensure that the height and width of our input features remain unchanged when the stride is one.

110
00:07:51,180 --> 00:07:58,470
Now that said, also notice that we have this number of strides here for the second conv layer, which is equal to

111
00:07:58,470 --> 00:07:59,130
one.

112
00:07:59,130 --> 00:08:06,600
And getting back to the paper, that's simply because even when we have these transitions

113
00:08:06,600 --> 00:08:13,770
here, where we're going from 64 to 128 and we're also doing max pooling,

114
00:08:13,770 --> 00:08:16,950
or rather we're doing the striding, not max pooling,

115
00:08:16,950 --> 00:08:22,400
we have this stride value changed for only one of the conv layers and not the two.

116
00:08:22,410 --> 00:08:25,500
So you see here, only one, not the two.

117
00:08:25,500 --> 00:08:30,960
And so that's why right here it is only this one which actually changes. For this other one,

118
00:08:30,960 --> 00:08:32,550
it remains fixed, always one.
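To see why only the first conv layer carries the stride, it helps to track the spatial size along both paths of the block. The sketch below is plain Python and rests on assumptions that are not stated in the video: the helper names are made up, the 56-pixel input is just an example, and the one by one shortcut conv is assumed to use valid padding.

```python
import math


def same_conv_size(size, strides):
    """Output size of a convolution with padding='same' (any kernel size)."""
    return math.ceil(size / strides)


def valid_conv_size(size, kernel_size, strides):
    """Output size of a convolution with padding='valid'."""
    return (size - kernel_size) // strides + 1


def block_sizes(size, n_strides):
    """Return (dotted, main path size, shortcut size) for one residual block."""
    dotted = n_strides != 1
    # First 3x3 conv uses n_strides, second 3x3 conv always uses a stride of one.
    main = same_conv_size(same_conv_size(size, n_strides), 1)
    # The shortcut is the raw input unless the block is a dotted one.
    shortcut = valid_conv_size(size, 1, n_strides) if dotted else size
    return dotted, main, shortcut


print(block_sizes(56, 1))
print(block_sizes(56, 2))
```

In both cases the main path and the shortcut end up with the same height and width, which is what the addition at the end of the block needs.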

119
00:08:32,730 --> 00:08:40,950
Now that said, we have the activation layer and then if it's dotted then we're going to have this link

120
00:08:40,950 --> 00:08:41,790
right here.

121
00:08:41,790 --> 00:08:45,720
So if it's dotted, well, let's draw this here.

122
00:08:45,720 --> 00:08:52,380
We're going to have a one by one conv layer. The one by one is actually here.

123
00:08:52,380 --> 00:08:54,210
So we're going to have our one.

124
00:08:54,210 --> 00:08:55,920
Let's take this one off.

125
00:08:56,520 --> 00:09:04,350
We're going to have our one by one conv layer right here to ensure that the two numbers of channels

126
00:09:04,350 --> 00:09:04,710
match.

127
00:09:04,710 --> 00:09:09,960
That is, the number of channels we get as input here and as output actually match up.

128
00:09:09,960 --> 00:09:12,320
So that's the role of this.

129
00:09:12,330 --> 00:09:13,450
We've seen this already.

130
00:09:13,470 --> 00:09:16,170
Now we get back to the code we see here.

131
00:09:16,170 --> 00:09:17,250
Let's take this off.

132
00:09:17,250 --> 00:09:22,830
We see this here, and then we see that the kernel size here is one, unlike here where we

133
00:09:22,830 --> 00:09:24,180
have a kernel size of three.

134
00:09:24,180 --> 00:09:29,570
And then we also specify the number of channels to ensure that it matches up with what we expect.

135
00:09:29,580 --> 00:09:32,940
Now, the number of strides here is taken from this.

136
00:09:32,940 --> 00:09:36,330
So if it's two, you're going to have two; if it's one, you're going to have one.

137
00:09:36,420 --> 00:09:37,080
From here.

138
00:09:37,080 --> 00:09:42,660
We have this set and then we can go ahead and do the calling.

139
00:09:42,660 --> 00:09:47,880
So again, please check out the previous sessions where we treat model subclassing so you understand

140
00:09:47,880 --> 00:09:49,230
exactly what's going on.

141
00:09:49,230 --> 00:09:52,470
So here we have the input.

142
00:09:52,530 --> 00:09:57,780
So the input gets into the first conv layer, then gets to the second conv layer; that is, the output

143
00:09:57,780 --> 00:09:59,430
from the first goes as input to the

144
00:09:59,720 --> 00:10:00,380
second.

145
00:10:00,380 --> 00:10:01,940
And then we get this output.

146
00:10:01,940 --> 00:10:06,130
And now let's suppose that we have a normal layer.

147
00:10:06,140 --> 00:10:07,880
Let's say we have this one.

148
00:10:08,360 --> 00:10:09,730
Let's get back to this.

149
00:10:09,740 --> 00:10:13,520
Let's suppose that we are working with this one right here.

150
00:10:13,520 --> 00:10:17,990
In that case, then the input will be added to the output directly.

151
00:10:17,990 --> 00:10:26,030
So you see we have this Add layer right here. This TensorFlow Add layer takes the output, which

152
00:10:26,030 --> 00:10:34,430
is this, and adds this to the input, and we now get x_add, which goes to the activation, the ReLU, and that's

153
00:10:34,430 --> 00:10:35,270
our output.

154
00:10:35,300 --> 00:10:42,230
Now, in the case where we have this one by one convolutional layer, specifically

155
00:10:42,230 --> 00:10:50,600
the case where we are at this position, you will see that we will take the input and modify

156
00:10:50,600 --> 00:10:54,890
it before passing it to the output or before adding it with the output.

157
00:10:54,890 --> 00:10:57,400
So you see here we have this output.

158
00:10:57,410 --> 00:10:58,340
There we go.

159
00:10:58,760 --> 00:11:04,580
It remains here and then we modify this or we modify this input.

160
00:11:04,580 --> 00:11:05,000
Sorry.

161
00:11:05,000 --> 00:11:06,110
So we take this input.

162
00:11:06,440 --> 00:11:07,520
There it is here.

163
00:11:08,060 --> 00:11:11,390
Let's take all this off and try to redraw it.

164
00:11:11,390 --> 00:11:15,860
So we have this, we have our res block and then we have our output.

165
00:11:16,100 --> 00:11:21,950
We have the addition, which is going to be right here, addition.

166
00:11:21,950 --> 00:11:27,800
And then we are going to have our one by one conv layer right here, which comes and adds up with this.

167
00:11:27,800 --> 00:11:34,460
Now this one by one convolution is exactly what's going on right here, and that's what we defined here

168
00:11:34,460 --> 00:11:36,200
since the kernel size is one.

169
00:11:36,530 --> 00:11:43,100
Now that said, you see, we add this up and then we get our output x_add.

170
00:11:43,100 --> 00:11:49,040
So if dotted is true, we have that; else we go through the normal path.

171
00:11:49,040 --> 00:11:55,340
And that's basically why you see here we specify this sometimes and we don't in other times.
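Putting the pieces above together, a residual block along these lines could look like the sketch below. It is not the notebook's exact code: the class and argument names (`CustomConv2D`, `ResidualBlock`, `n_channels`, `n_strides`) follow what is said in the video, but the placement of the ReLU inside the conv layer and the valid padding of the one by one conv are assumptions of this sketch.

```python
import tensorflow as tf
from tensorflow.keras.layers import (
    Activation, Add, BatchNormalization, Conv2D, Layer,
)


class CustomConv2D(Layer):
    """Conv2D followed by batch normalization."""

    def __init__(self, n_filters, kernel_size, n_strides, padding="valid"):
        super().__init__()
        # Assumption: the ReLU lives inside the conv layer.
        self.conv = Conv2D(n_filters, kernel_size, strides=n_strides,
                           padding=padding, activation="relu")
        self.batch_norm = BatchNormalization()

    def call(self, x, training=True):
        x = self.conv(x)
        return self.batch_norm(x, training=training)


class ResidualBlock(Layer):
    """Two 3x3 convs plus a shortcut; the shortcut is a 1x1 conv when dotted."""

    def __init__(self, n_channels, n_strides=1):
        super().__init__()
        self.dotted = n_strides != 1
        self.conv_1 = CustomConv2D(n_channels, 3, n_strides, padding="same")
        self.conv_2 = CustomConv2D(n_channels, 3, 1, padding="same")
        self.add = Add()
        self.activation = Activation("relu")
        if self.dotted:
            # Matches both the channel count and the spatial size of the main path.
            self.conv_3 = CustomConv2D(n_channels, 1, n_strides)

    def call(self, inputs, training=True):
        x = self.conv_1(inputs, training=training)
        x = self.conv_2(x, training=training)
        shortcut = self.conv_3(inputs, training=training) if self.dotted else inputs
        return self.activation(self.add([x, shortcut]))


features = tf.zeros([1, 56, 56, 64])
print(ResidualBlock(64)(features, training=False).shape)
print(ResidualBlock(128, n_strides=2)(features, training=False).shape)
```

The second call is the dotted case: the one by one conv changes the 64 input channels to 128 and halves the height and width so that the addition is possible.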

172
00:11:55,520 --> 00:11:57,940
Now we have understood how this works.

173
00:11:57,950 --> 00:12:01,630
Let's go ahead and look at the CustomConv2D layer.

174
00:12:01,640 --> 00:12:06,020
Now the CustomConv2D layer is basically made up of our usual Conv2D layer

175
00:12:06,020 --> 00:12:10,130
with a batch norm. Remember, the ResNet model,

176
00:12:10,760 --> 00:12:14,960
the ResNet paper, makes use of the batch normalization layer.

177
00:12:14,960 --> 00:12:20,090
So basically here, instead of writing batch norm every time in our code, what we just want to

178
00:12:20,090 --> 00:12:26,930
do is combine these two, and then we have our batch normalization and our Conv2D layer together.

179
00:12:26,930 --> 00:12:27,680
So that's it.

180
00:12:27,680 --> 00:12:29,450
We now run the cell.

181
00:12:29,900 --> 00:12:30,770
There we go.

182
00:12:30,770 --> 00:12:41,210
We run the cell, we again run the cell and then we can define our ResNet-34, which is our ResNet model,

183
00:12:41,210 --> 00:12:44,450
which we've just seen: ResNet-34.

184
00:12:44,480 --> 00:12:45,500
There we go.

185
00:12:45,500 --> 00:12:48,620
We have the ResNet

186
00:12:48,650 --> 00:12:50,750
34

187
00:12:51,170 --> 00:12:53,750
summary. Let's run this.

188
00:12:54,260 --> 00:12:55,520
We get this error.

189
00:12:55,520 --> 00:12:57,110
We need to build our model.

190
00:12:57,110 --> 00:13:00,200
So what we're going to do here is quite simple.

191
00:13:00,200 --> 00:13:06,830
We will now take this ResNet-34 and then we'll call this ResNet-34.

192
00:13:06,830 --> 00:13:10,700
So we'll pass some inputs into this ResNet-34 model.

193
00:13:10,700 --> 00:13:19,130
So here, we suppose we have zeros, and then we have one by 256 by 256 by three.

194
00:13:19,130 --> 00:13:20,540
So we have this kind of input.

195
00:13:20,540 --> 00:13:23,270
We run that and we see that.

196
00:13:23,270 --> 00:13:25,430
Now we have our model.

197
00:13:25,430 --> 00:13:27,410
So our model summary.

198
00:13:27,410 --> 00:13:32,500
So this is our model summary, 21 million parameters and that's it.

199
00:13:32,510 --> 00:13:36,860
Now we go to the training, but this time around we're going to include some checkpointing.

200
00:13:36,860 --> 00:13:42,920
And so we got this from our previous session where we treated checkpointing and the model checkpoint.

201
00:13:42,920 --> 00:13:49,760
And so here we're going to ensure that as we train, we save our best model weights.

202
00:13:50,300 --> 00:13:55,040
So that said, here we have this checkpoint callback. Again,

203
00:13:55,040 --> 00:13:59,300
you could check back on our previous sessions where we treat these callbacks.

204
00:13:59,300 --> 00:14:05,030
So here we have this callback which will permit us to save the weights for our best, or our best performing

205
00:14:05,030 --> 00:14:07,820
model or best performing weights, actually.

206
00:14:07,820 --> 00:14:10,850
So here we have a monitor.

207
00:14:10,850 --> 00:14:13,730
We're going to monitor the validation accuracy.

208
00:14:13,730 --> 00:14:19,730
So let's take this off. Monitor validation accuracy, save best only true.

209
00:14:19,820 --> 00:14:20,660
So that's it.

210
00:14:20,660 --> 00:14:23,180
Let's run this, our loss function.

211
00:14:23,180 --> 00:14:31,400
But before we move on, let's get back to the section where we had this parameter training right here.

212
00:14:31,550 --> 00:14:37,310
Now we have this CustomConv2D model.

213
00:14:37,310 --> 00:14:43,130
We have this CustomConv2D model which we've seen already, and then we have this batch norm layer.

214
00:14:43,130 --> 00:14:50,300
But it should be noted that with the batch norm layer, we have to specify whether we are in training

215
00:14:50,300 --> 00:14:53,560
mode or in inference or testing mode.

216
00:14:53,570 --> 00:14:59,090
Now, the reason why we are doing this is because the parameters of our batch norm

217
00:14:59,220 --> 00:15:06,180
layer will react differently or behave differently in these two different modes.

218
00:15:06,480 --> 00:15:14,100
This means that during training, this layer will normalize the inputs with the mean and variance of

219
00:15:14,100 --> 00:15:16,140
the current batch of inputs.

220
00:15:16,170 --> 00:15:21,770
Now, we've seen this already. And then when we're not training, that is, when training equals false,

221
00:15:21,780 --> 00:15:26,910
in inference mode, the layer will normalize its input using the mean and variance of its moving

222
00:15:26,910 --> 00:15:29,610
statistics learned during training.

223
00:15:29,790 --> 00:15:34,710
So this simply means that we have this layer right here with some parameters.

224
00:15:34,710 --> 00:15:43,080
Let's call the parameters, say P, let's call the parameter P so you have some parameters right here.

225
00:15:43,380 --> 00:15:49,350
And then during training, our layer updates these parameters.

226
00:15:49,350 --> 00:15:57,030
But then during inference, we do not want to update these parameters as they were learned during training.

227
00:15:57,030 --> 00:16:04,980
And so we have to specify, or we have to pass in, the training argument, or set training to false when we

228
00:16:04,980 --> 00:16:08,940
are not training the model or when we are evaluating or testing the model.
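The two modes can be illustrated with a toy batch norm written in plain Python. This is a simplified sketch, not the Keras layer: the class name `TinyBatchNorm` is made up, it works on a list of scalars, it leaves out the learnable scale and offset, and the momentum value is only illustrative.

```python
class TinyBatchNorm:
    """Toy batch norm over a list of scalars, without learnable scale and offset."""

    def __init__(self, momentum=0.9, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training=True):
        if training:
            # Training mode: normalize with the statistics of the current batch...
            mean = sum(batch) / len(batch)
            var = sum((v - mean) ** 2 for v in batch) / len(batch)
            # ...and update the moving statistics used later at inference time.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference mode: use the moving statistics and leave them untouched.
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / (var + self.epsilon) ** 0.5 for v in batch]


bn = TinyBatchNorm()
train_out = bn([2.0, 4.0], training=True)
stats_after_training = (bn.moving_mean, bn.moving_var)
test_out = bn([2.0, 4.0], training=False)
print(train_out, test_out, stats_after_training)
```

The same batch gives different outputs in the two modes, and only the training call changes the stored statistics, which is why the training flag has to be passed all the way down to the batch norm layer.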

229
00:16:09,360 --> 00:16:13,380
Now what this simply means is that here we'll have this training.

230
00:16:13,380 --> 00:16:16,860
So here, you see, we pass in training, and then here we have training.

231
00:16:16,860 --> 00:16:19,860
So by default we could set this training to true.

232
00:16:19,860 --> 00:16:22,530
So by default we're in training mode and that's what we have.

233
00:16:22,530 --> 00:16:22,920
Now.

234
00:16:22,920 --> 00:16:26,700
Let's run the cell and then get into our residual block.

235
00:16:26,700 --> 00:16:31,080
For the residual block, here we have training again.

236
00:16:31,080 --> 00:16:32,400
So we have training.

237
00:16:32,400 --> 00:16:36,750
And then since we call on this, we'll have training.

238
00:16:36,990 --> 00:16:37,920
There we go.

239
00:16:37,920 --> 00:16:41,370
We have training and that's it.

240
00:16:41,370 --> 00:16:42,600
We have training.

241
00:16:43,530 --> 00:16:45,060
Let's run this.

242
00:16:45,060 --> 00:16:47,940
We get back to our complete network.

243
00:16:48,990 --> 00:16:51,540
And here we're going to have this training.

244
00:16:51,540 --> 00:16:55,740
So we paste that out here and that should be fine.

245
00:16:55,740 --> 00:17:02,040
So now, when we are not in training mode, we could specify this training parameter such that the batch

246
00:17:02,040 --> 00:17:06,840
norm layer, or the batch norm layer's parameters, aren't modified.

247
00:17:07,140 --> 00:17:11,130
So we have that and that's it.

248
00:17:11,580 --> 00:17:12,660
So we have that set.

249
00:17:12,690 --> 00:17:17,970
Now let's run this and this time around we're going to set this training.

250
00:17:17,970 --> 00:17:21,180
So let's set training to be true.

251
00:17:21,180 --> 00:17:22,560
We have that true.

252
00:17:22,590 --> 00:17:25,050
Let's make sure that this was passed in here.

253
00:17:25,050 --> 00:17:28,650
So let's have this training and the default is true.

254
00:17:28,980 --> 00:17:29,490
Okay?

255
00:17:29,490 --> 00:17:34,400
We run that, we run this, and that looks fine.

256
00:17:34,410 --> 00:17:37,380
Now, we could set this to be false, or not set it at all.

257
00:17:37,380 --> 00:17:38,970
When we don't put any value, it means it's true.

258
00:17:38,970 --> 00:17:40,890
So we set this to be false.

259
00:17:41,220 --> 00:17:44,640
Set training to be false, and that's it.

260
00:17:45,090 --> 00:17:46,050
So there we go.

261
00:17:46,050 --> 00:17:46,970
We have this set.

262
00:17:46,980 --> 00:17:49,170
Now let's get back to our loss function.

263
00:17:49,170 --> 00:17:55,950
So we were at this point, we have our metrics, we run the metrics, we compile the model, but this

264
00:17:55,950 --> 00:17:58,110
time around we use a higher learning rate.

265
00:17:58,110 --> 00:18:04,830
So one thing you could also do is, as described in the paper, decrease this learning rate as soon as the

266
00:18:04,830 --> 00:18:06,120
model starts plateauing.

267
00:18:06,120 --> 00:18:08,580
So we could start with a learning rate; in the paper

268
00:18:08,580 --> 00:18:12,690
it should be 0.1, although here we're going to start with 0.01.

269
00:18:12,690 --> 00:18:21,240
So here, say you have this learning rate, and then when it starts plateauing, you drop to 0.01

270
00:18:21,240 --> 00:18:25,610
and then you go with that, and then when it starts plateauing again, you drop, and so on and so forth.

271
00:18:25,620 --> 00:18:30,840
So this is what they propose in the paper, and you could always implement this.

272
00:18:30,840 --> 00:18:36,420
And we've seen this in some previous section where we implemented this kind of callback which permitted

273
00:18:36,420 --> 00:18:38,760
us to schedule our learning rate.
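The drop-on-plateau idea can be sketched in plain Python as follows. The function name `plateau_schedule`, the `patience` of two epochs and the accuracy values are all made up for illustration; only the starting rate of 0.1 and the division by ten come from the description above.

```python
def plateau_schedule(val_accuracies, initial_lr=0.1, factor=0.1, patience=2):
    """Return the learning rate used at each epoch, dropping it on plateaus."""
    lr = initial_lr
    best = float("-inf")
    epochs_without_improvement = 0
    rates = []
    for accuracy in val_accuracies:
        rates.append(lr)
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # The metric has stopped improving: divide the rate by ten.
                lr *= factor
                epochs_without_improvement = 0
    return rates


# Hypothetical validation accuracies, one per epoch.
history = [0.60, 0.70, 0.70, 0.69, 0.74, 0.74, 0.73]
rates = plateau_schedule(history)
print(rates)
```

In Keras the same behaviour is available through the `ReduceLROnPlateau` callback, so a hand-written loop like this is only needed to understand what the callback does.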

274
00:18:38,880 --> 00:18:39,750
So that's it.

275
00:18:39,750 --> 00:18:42,540
Let's take this off, get back to the code.

276
00:18:42,540 --> 00:18:47,220
And we've run this already, so we could start with the training.

277
00:18:47,910 --> 00:18:53,490
Now here we're going to train for 60 epochs and we're going to include the callbacks.

278
00:18:53,490 --> 00:19:00,690
So let's have these callbacks and we have our model checkpoint callback, which we had defined here.

279
00:19:00,690 --> 00:19:02,550
Let's copy this.

280
00:19:02,550 --> 00:19:06,690
There we go, copy this, and we have it here.

281
00:19:07,470 --> 00:19:08,340
So that's it.

282
00:19:08,340 --> 00:19:10,470
We can now run the cell.

283
00:19:11,010 --> 00:19:12,390
Now we've run this training.

284
00:19:12,390 --> 00:19:14,760
And we've had this error here.

285
00:19:14,760 --> 00:19:21,210
Saving the model to the HDF5 format requires the model to be a functional model or a sequential model.

286
00:19:21,210 --> 00:19:25,470
It does not work for subclassed models, like in our case, because such models are defined

287
00:19:25,470 --> 00:19:32,490
via the body of a Python method, which isn't safely serializable. Consider saving to the TensorFlow

288
00:19:32,490 --> 00:19:37,590
SavedModel format by setting save_format equal to tf, or using save_weights.

289
00:19:37,590 --> 00:19:41,520
So now we're going to save those models in the TensorFlow format.

290
00:19:41,520 --> 00:19:45,570
All we need to do here is specify this folder and that should be fine.

291
00:19:45,570 --> 00:19:46,620
The mode is max,

292
00:19:46,620 --> 00:19:54,360
since we want to store the weights which have the highest validation accuracy. That's fine.

293
00:19:54,360 --> 00:19:58,890
Let's now run this again. Training is now complete and we achieve an

294
00:19:58,980 --> 00:20:06,870
accuracy, or the best accuracy, of 83.6%. We have our accuracy plot right here.

295
00:20:06,870 --> 00:20:17,800
And then we could evaluate the model. Here we get 82.3%, and 94.6% for the top-k accuracy.

296
00:20:17,820 --> 00:20:21,570
Now, what if we load our best model?

297
00:20:21,570 --> 00:20:25,680
Because the model we have in here is the latest model, the very last one.

298
00:20:25,680 --> 00:20:29,190
Now let's load our best model and re-evaluate this.

299
00:20:29,220 --> 00:20:30,000
Let's add this code.

300
00:20:30,000 --> 00:20:34,130
So then we go ahead and load our best weights.

301
00:20:34,140 --> 00:20:40,860
Here we have ResNet-34, then load weights.

302
00:20:41,100 --> 00:20:43,440
Then here we have our best weights.

303
00:20:43,440 --> 00:20:46,080
So this should be a string.

304
00:20:46,080 --> 00:20:51,380
Since it's our folder, that's the folder where we stored the ResNet weights.

305
00:20:51,390 --> 00:20:54,900
We run this, that's fine.

306
00:20:54,900 --> 00:20:57,360
And then we evaluate our model.

307
00:20:58,420 --> 00:21:07,480
So we get this accuracy of 83.6% and then a top-k accuracy of 95.6%.

308
00:21:07,810 --> 00:21:11,380
Now we go on to test this.

309
00:21:13,040 --> 00:21:14,180
That's fine.

310
00:21:14,180 --> 00:21:18,020
And here are some results we get.

311
00:21:18,620 --> 00:21:20,900
Happy, angry, sad, happy.

312
00:21:20,930 --> 00:21:21,560
Yeah, we miss.

313
00:21:21,560 --> 00:21:22,550
We miss one.

314
00:21:22,700 --> 00:21:24,560
We miss this one.

315
00:21:25,070 --> 00:21:26,600
That's two, three.

316
00:21:27,350 --> 00:21:29,750
And that's it.

317
00:21:29,750 --> 00:21:30,740
So we missed three.

318
00:21:30,800 --> 00:21:34,120
And then, let's add a cell.

319
00:21:34,130 --> 00:21:37,730
We have 13 out of 16.

320
00:21:37,730 --> 00:21:42,110
And this is correct, 81.2%.

321
00:21:42,110 --> 00:21:42,850
Correct.

322
00:21:42,860 --> 00:21:44,270
Okay, so that's it.

323
00:21:44,390 --> 00:21:51,210
We've gone from 79% to 83% by modifying or changing our model.

324
00:21:51,230 --> 00:21:55,280
Now, let's plot out this confusion matrix and see what we get.

325
00:21:56,330 --> 00:21:57,130
There we go.

326
00:21:57,140 --> 00:22:01,130
Here are the results, which are much better than what we have had so far.