1
00:00:00,240 --> 00:00:07,950
We now move on to fine-tuning. We've looked at transfer learning, and now we get into fine-tuning.

2
00:00:08,520 --> 00:00:16,410
But before getting to fine-tuning, we will first convert this code, which was built with the Sequential

3
00:00:17,010 --> 00:00:21,020
API, into the Functional API.

4
00:00:21,030 --> 00:00:24,390
So here's our converted model, which is basically the same thing.

5
00:00:24,390 --> 00:00:30,510
We have the input, we have the backbone which takes the input and gives this output, then global average pooling,

6
00:00:30,510 --> 00:00:34,580
dense layer, batch norm, dense layer, and this dense layer right here.

7
00:00:34,600 --> 00:00:37,640
Then we have our fine tuned model right here.

8
00:00:37,650 --> 00:00:45,420
Let's run this code cell and then we could view a summary which is meant to be identical to what we

9
00:00:45,420 --> 00:00:48,390
had already with the pretrained model.

10
00:00:48,390 --> 00:00:54,510
So here we have this 17.673 thousand.

11
00:00:54,540 --> 00:00:56,850
See, this is 675.

12
00:00:57,300 --> 00:00:59,220
It doesn't match with what we expect.

13
00:00:59,220 --> 00:01:03,450
So let's get back here and we notice that we did not put this here.

14
00:01:03,450 --> 00:01:08,430
So let's see, we did not include a batch norm layer.

15
00:01:08,430 --> 00:01:10,080
So let's run this again.

16
00:01:10,230 --> 00:01:18,450
And then we now have this summary right here, which is exactly the same as that of our previously built

17
00:01:18,450 --> 00:01:20,700
model with the Sequential API.

18
00:01:20,970 --> 00:01:24,450
And now we want to fine-tune our model.

19
00:01:24,450 --> 00:01:28,410
That is, all these layers which we have frozen, that is, not trained,

20
00:01:28,410 --> 00:01:31,980
we now want to make trainable.

21
00:01:32,130 --> 00:01:40,980
So right here we'll get back, and then we simply have backbone.trainable, and we set that to True.

22
00:01:41,580 --> 00:01:47,910
Then here we are going to set this trainable to false.

23
00:01:48,090 --> 00:01:56,730
Now, recall when we were building this ResNet-34 model right here, we had this trainable parameter

24
00:01:56,730 --> 00:01:58,230
which we made use of.

25
00:01:58,410 --> 00:02:02,490
You remember here we had training.

26
00:02:02,490 --> 00:02:04,050
Sorry, it's not trainable.

27
00:02:04,050 --> 00:02:05,310
Let's get back here.

28
00:02:05,310 --> 00:02:07,080
That was training.

29
00:02:07,080 --> 00:02:08,820
So we should have training.

30
00:02:08,820 --> 00:02:10,890
Here, trainable is different from this one.

31
00:02:10,890 --> 00:02:11,940
So take note of that.

32
00:02:11,940 --> 00:02:14,430
This is trainable and this is training.

33
00:02:14,430 --> 00:02:22,890
So when we set this training to False and get back here, you will find that this batch norm took

34
00:02:22,890 --> 00:02:25,200
in this training parameter.

35
00:02:25,500 --> 00:02:32,700
And the reason why we need this, especially for the batch norm, is simply because the batch norm works

36
00:02:32,700 --> 00:02:40,110
differently during training and inference. During training, the batch norm layer normalizes

37
00:02:40,110 --> 00:02:45,720
its output using the mean and the standard deviation of the current batch of inputs.

38
00:02:46,680 --> 00:02:56,130
Whereas during inference, the batch norm layer normalizes its outputs using a moving average of the

39
00:02:56,130 --> 00:03:01,020
mean and the standard deviation of the batches it has seen during the training.

40
00:03:01,020 --> 00:03:12,120
So at inference, or when we are fine-tuning, we do not want the batch norm to take the current mean,

41
00:03:12,120 --> 00:03:17,640
or rather the mean and the standard deviation, from the current batch of inputs.

42
00:03:18,360 --> 00:03:22,500
We are instead going to use the statistics from what it saw during training.
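To make the two modes concrete, here is a minimal pure-Python sketch of a batch norm layer acting on a batch of scalars. The class name ToyBatchNorm, the momentum of 0.9, and the omission of gamma and beta are assumptions made for illustration; this is not the Keras implementation.

```python
import math


class ToyBatchNorm:
    """Hypothetical batch norm over a list of scalars (gamma and beta omitted)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum  # assumed value, chosen for illustration
        self.eps = eps
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training mode: normalize with the statistics of the current batch
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ...and fold those statistics into the moving averages
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: reuse the moving averages and leave them untouched
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = ToyBatchNorm()
train_out = bn([2.0, 4.0, 6.0], training=True)
stats_after_training = (bn.moving_mean, bn.moving_var)

# A very different batch does not disturb the stored statistics at inference
infer_out = bn([100.0, 200.0], training=False)
stats_after_inference = (bn.moving_mean, bn.moving_var)
```

In this sketch, calling the layer with training set to False leaves moving_mean and moving_var exactly as they were, which is the behavior we want to keep for the batch norm layers while fine-tuning.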

43
00:03:22,920 --> 00:03:27,480
And so that's why this training parameter is very important.

44
00:03:27,480 --> 00:03:30,720
We have this training right here.

45
00:03:30,720 --> 00:03:36,510
You see, it takes the input and the training argument, where we could set training to be

46
00:03:36,510 --> 00:03:41,310
True for training mode and training to be False for inference

47
00:03:41,310 --> 00:03:44,460
or, let's say, fine-tuning.

48
00:03:44,790 --> 00:03:51,390
Now, before we move on, you should know that setting layer.trainable to False is different

49
00:03:51,390 --> 00:03:53,790
from setting training to False.

50
00:03:53,910 --> 00:04:01,410
When you set a layer's trainable attribute to False, it simply means we do not want to update the weights

51
00:04:01,410 --> 00:04:08,160
when training. But when we set training to False, it means we're working in inference mode.

52
00:04:09,180 --> 00:04:15,570
In the case of the batch norm, this gamma and beta are trainable parameters.

53
00:04:15,570 --> 00:04:21,720
And so when we say layer.trainable equals False, it means that they're not going to be updated during

54
00:04:21,720 --> 00:04:22,380
training.

55
00:04:23,610 --> 00:04:30,480
But on the other hand, this mean and variance aren't trainable parameters.

56
00:04:30,480 --> 00:04:35,250
Instead, they are parameters which adapt to the training data.

57
00:04:35,880 --> 00:04:41,460
And that is why, when we are in inference mode, that is, when we set training to False,

58
00:04:41,460 --> 00:04:43,380
we do not want to disrupt

59
00:04:44,380 --> 00:04:51,340
the mean and variance values gathered during training, which are based on the training inputs.

60
00:04:51,640 --> 00:05:01,300
And so as we saw already, this mean and variance at inference mode will be simply the moving average

61
00:05:01,300 --> 00:05:06,220
of the mean and standard deviation of the batches it has seen during training.

62
00:05:07,350 --> 00:05:15,440
And so clearly these are two different concepts.

63
00:05:15,510 --> 00:05:24,660
The concept of setting these weights not to be trainable is different from that of

64
00:05:24,870 --> 00:05:26,970
working in inference mode.
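As a rough illustration of how the two flags control different things, here is a conceptual pure-Python sketch. The class ToyNormLayer and its method names are hypothetical, and the two flags are kept fully independent here, which is a simplification and not how Keras wires them internally.

```python
class ToyNormLayer:
    """Hypothetical layer with one learned weight and one tracked statistic."""

    def __init__(self):
        self.gamma = 1.0         # learned weight, updated by gradient steps
        self.moving_mean = 0.0   # tracked statistic, never updated by gradients
        self.trainable = True    # attribute: may gradient steps change gamma?

    def forward(self, batch, training):
        # `training` is a call argument: it selects training or inference mode
        if training:
            mean = sum(batch) / len(batch)
            self.moving_mean = 0.9 * self.moving_mean + 0.1 * mean
        else:
            mean = self.moving_mean
        return [self.gamma * (x - mean) for x in batch]

    def apply_gradient(self, grad_gamma, lr):
        # Only `trainable` decides whether the learned weight moves
        if self.trainable:
            self.gamma -= lr * grad_gamma


layer = ToyNormLayer()

# Fine-tuning setup: weights may change, statistics stay as they are
layer.trainable = True
layer.forward([10.0, 20.0], training=False)
layer.apply_gradient(grad_gamma=0.5, lr=0.1)
gamma_after_finetune_step = layer.gamma
mean_after_finetune_step = layer.moving_mean

# Frozen setup: the same gradient step leaves gamma untouched
layer.trainable = False
layer.apply_gradient(grad_gamma=0.5, lr=0.1)
gamma_after_frozen_step = layer.gamma
```

With trainable set to True and training set to False, gamma moves while moving_mean stays put, which matches the fine-tuning setup described in this section.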

65
00:05:27,360 --> 00:05:33,960
Nonetheless, it should be noted that in the case of the batch norm, setting trainable to False on the

66
00:05:33,960 --> 00:05:38,250
layer means that the layer will be subsequently run in inference mode,

67
00:05:38,760 --> 00:05:44,460
even though we've seen that those two don't actually mean the same thing.

68
00:05:44,550 --> 00:05:50,070
Now, also note here that setting trainable on a model containing other layers will recursively set

69
00:05:50,070 --> 00:05:52,950
the trainable value of all inner layers.

70
00:05:52,950 --> 00:06:00,750
And if the value of the trainable attribute is changed after calling the compile method on a model, the

71
00:06:00,750 --> 00:06:02,700
new value doesn't take effect

72
00:06:02,700 --> 00:06:06,270
for this model until compile is called again.

73
00:06:07,040 --> 00:06:07,990
Okay, so that's it.

74
00:06:08,010 --> 00:06:10,740
You could also check out the dropout here.

75
00:06:10,740 --> 00:06:19,650
We have this dropout layer, which doesn't have any trainable parameters, but remember that the way

76
00:06:19,650 --> 00:06:24,000
dropout works is that you have, let's say, some inputs.

77
00:06:24,000 --> 00:06:24,840
Let's take this off.

78
00:06:24,840 --> 00:06:32,100
You have some inputs, and then if you pass them through the dropout layer, at the end you have some of these

79
00:06:32,100 --> 00:06:34,380
inputs which will not be considered.

80
00:06:34,380 --> 00:06:39,600
So maybe this one will proceed, but this one is not taken into consideration.

81
00:06:39,690 --> 00:06:43,290
Maybe this one moves forward, and this one is not taken into consideration.

82
00:06:43,290 --> 00:06:51,120
So a dropout of 0.5 simply means that, on average, half of our inputs will move forward and the other

83
00:06:51,120 --> 00:06:52,950
half will not be taken into consideration.

84
00:06:52,950 --> 00:06:56,970
And so you see that at inference,

85
00:06:56,970 --> 00:07:01,380
that is, when we are actually trying to test our model,

86
00:07:01,380 --> 00:07:08,100
we do not need to drop out some of these neurons right here.

87
00:07:08,100 --> 00:07:16,500
And so generally the dropout also takes in this training parameter here, where when we set training

88
00:07:16,500 --> 00:07:23,280
to True, it means that we are in training mode, and so we could actually drop out some of these values,

89
00:07:23,280 --> 00:07:28,950
whereas when we set training to False, then we are in inference mode, and so we do nothing.

90
00:07:28,950 --> 00:07:34,320
We just allow the inputs to pass without any modifications.
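The same idea can be sketched for dropout in a few lines of pure Python. The function toy_dropout and its rng argument are hypothetical names used for illustration. The scaling of the kept values by 1 / (1 - rate) is not mentioned above; it is included because the Keras Dropout layer documents that behavior.

```python
import random


def toy_dropout(inputs, rate, training, rng):
    """Hypothetical dropout: zero each input with probability `rate` in training mode."""
    if not training:
        # Inference mode: pass the inputs through without any modification
        return list(inputs)
    keep = 1.0 - rate
    # Training mode: drop some values and scale the survivors by 1 / keep
    return [x / keep if rng.random() >= rate else 0.0 for x in inputs]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
inputs = [1.0] * 1000

train_out = toy_dropout(inputs, rate=0.5, training=True, rng=rng)
infer_out = toy_dropout(inputs, rate=0.5, training=False, rng=rng)

# Roughly half of the values are zeroed in training mode
dropped_fraction = train_out.count(0.0) / len(train_out)
```

There is no weight anywhere in this function, so a trainable attribute would have nothing to freeze; only the training flag changes what comes out.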

91
00:07:35,040 --> 00:07:43,710
And you could also see clearly from here that layer.trainable doesn't really apply, because

92
00:07:44,250 --> 00:07:52,890
dropout doesn't have trainable parameters, whereas with this training, we could decide whether

93
00:07:52,890 --> 00:07:54,360
it's True or False,

94
00:07:54,360 --> 00:07:58,800
that is, whether we are in training mode or inference mode.

95
00:08:01,350 --> 00:08:08,430
So in fact, what we're saying is we have this model here, we have the backbone and we have the head

96
00:08:08,430 --> 00:08:10,080
for classification.

97
00:08:10,410 --> 00:08:15,330
We apply transfer learning by freezing all of this; we freeze our whole backbone.

98
00:08:15,330 --> 00:08:20,910
So no parameter here is updated during training.

99
00:08:20,910 --> 00:08:28,410
And now we move on to fine-tuning, where we want to update these parameters with

100
00:08:28,410 --> 00:08:30,090
a very small learning rate.

101
00:08:30,090 --> 00:08:39,060
And then we also want to avoid a situation where those mean and variance statistics which we have gotten

102
00:08:39,060 --> 00:08:45,900
during the training process will be upset during this fine-tuning process.

103
00:08:45,900 --> 00:08:54,330
And so the batch norm is kind of like a special layer where, even during fine-tuning,

104
00:08:54,330 --> 00:09:00,000
where we have set trainable to True, that is, we want to update these weights during

105
00:09:00,000 --> 00:09:00,750
training,

106
00:09:00,750 --> 00:09:11,400
we do not want to modify or upset the batch norm's mean and variance.

107
00:09:11,400 --> 00:09:16,500
And so we are going to set this training here to False.

108
00:09:17,370 --> 00:09:20,820
So it still behaves as if it were in inference mode.

109
00:09:22,170 --> 00:09:28,500
So getting back here, we have our training which has been set to false and then we could start training

110
00:09:28,500 --> 00:09:29,700
our model again.

111
00:09:29,730 --> 00:09:39,270
But one point to note here is that we do the fine-tuning on a model which has already been trained.

112
00:09:39,270 --> 00:09:45,990
So what we do is we are now going to train this model this way: we'll start by training the

113
00:09:45,990 --> 00:09:46,740
pretrained model.

114
00:09:46,740 --> 00:09:53,070
So we start by having this backbone's trainable set to False and this training set to False.

115
00:09:53,610 --> 00:10:00,180
So we're going to repeat the transfer learning process again before then applying fine tuning.

116
00:10:00,180 --> 00:10:05,580
So each time you want to apply fine-tuning, make sure you first have this set to False and training set to

117
00:10:05,580 --> 00:10:08,070
False, and then you go ahead.

118
00:10:08,070 --> 00:10:10,110
So yeah, let's run this again.

119
00:10:10,620 --> 00:10:18,770
Here's our fine tuned model which achieves a best validation accuracy of about 70%.

120
00:10:18,780 --> 00:10:21,150
Okay, now we're done with transfer learning.

121
00:10:21,150 --> 00:10:24,810
We're now going to apply fine-tuning. To do this,

122
00:10:24,810 --> 00:10:26,640
we're going to set this to True.

123
00:10:26,640 --> 00:10:28,860
So all we need to do here is set this to true.

124
00:10:28,890 --> 00:10:31,080
We're not going to run this again.

125
00:10:31,080 --> 00:10:33,090
We use the same model.

126
00:10:33,090 --> 00:10:34,290
And that's even the idea.

127
00:10:34,290 --> 00:10:42,240
The idea is for us to start with the backbone, which is not trainable, and then later on make it trainable,

128
00:10:42,570 --> 00:10:48,270
and we'll now just simply recompile the model.

129
00:10:48,270 --> 00:10:54,690
So don't forget to recompile this model to take into consideration the fact that some parts of the model

130
00:10:54,690 --> 00:10:55,950
are now trainable.

131
00:10:55,950 --> 00:11:01,470
So let's get back here, recompile the model and see what this gives us.

132
00:11:02,580 --> 00:11:11,490
Now, as we start training, we notice that this validation accuracy here isn't looking like what we

133
00:11:11,490 --> 00:11:18,090
expect, because before getting to the fine tuning, we already had a model with a validation accuracy

134
00:11:18,090 --> 00:11:19,950
of about 70%.

135
00:11:19,950 --> 00:11:22,140
But now we're getting this 33%.

136
00:11:22,140 --> 00:11:28,230
And the simple reason for this is our learning rate here: we still maintain the same

137
00:11:28,230 --> 00:11:31,950
learning rate instead of reducing it before the fine-tuning.

138
00:11:31,950 --> 00:11:35,460
So we'll have to stop this here.

139
00:11:35,460 --> 00:11:41,880
Then we'll get back to this here and set this to False.

140
00:11:42,480 --> 00:11:47,460
So we have to restart the whole process: set this to False.

141
00:11:47,910 --> 00:11:51,510
And then we retrain the model.

142
00:11:51,510 --> 00:11:55,500
We get this accuracy of about 70%.

143
00:11:55,530 --> 00:12:00,420
Now we get back here and we set this to true.

144
00:12:00,630 --> 00:12:02,460
So we set this to true.

145
00:12:02,460 --> 00:12:04,050
So that's fine.

146
00:12:04,380 --> 00:12:07,410
We run the cell again, but now with the backbone trainable.

147
00:12:07,410 --> 00:12:14,070
And then we are going to make sure we divide this learning rate here by 100.

148
00:12:14,070 --> 00:12:17,400
So we're going to make use of a very small learning rate.
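The two-phase procedure can be sketched on a toy problem in pure Python. Everything here is an assumption made for illustration: the model is just head * backbone * x, the data follows y = 6x, the base learning rate is 0.01, and the trainable dictionary stands in for the trainable attributes. It only shows the freeze, unfreeze, and divide-by-100 steps, not the real network or its accuracy.

```python
def mse(params, data):
    """Mean squared error of the toy model y_hat = head * backbone * x."""
    return sum((params["head"] * params["backbone"] * x - y) ** 2
               for x, y in data) / len(data)


def train(params, trainable, data, lr, epochs):
    """Plain per-sample gradient descent that skips frozen parameters."""
    for _ in range(epochs):
        for x, y in data:
            b, h = params["backbone"], params["head"]
            err = h * b * x - y
            grads = {"backbone": 2 * err * h * x, "head": 2 * err * b * x}
            for name in params:
                if trainable[name]:  # frozen parameters are left untouched
                    params[name] -= lr * grads[name]
    return params


data = [(0.5, 3.0), (1.0, 6.0), (1.5, 9.0)]  # toy data following y = 6 * x
params = {"backbone": 2.0, "head": 1.0}      # backbone stands in for pretrained weights
base_lr = 0.01                               # assumed base learning rate

# Phase 1, transfer learning: backbone frozen, only the head is trained
trainable = {"backbone": False, "head": True}
train(params, trainable, data, lr=base_lr, epochs=5)
backbone_after_phase1 = params["backbone"]
loss_after_phase1 = mse(params, data)

# Phase 2, fine-tuning: unfreeze the backbone and divide the learning rate by 100
trainable["backbone"] = True
train(params, trainable, data, lr=base_lr / 100, epochs=10)
loss_after_phase2 = mse(params, data)
```

In this sketch the backbone value is unchanged after phase 1 and only starts to move, gently, once it is unfrozen in phase 2 with the reduced learning rate.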

149
00:12:17,430 --> 00:12:21,270
Now, once we have this, we're going to run this again.

150
00:12:21,270 --> 00:12:26,790
So we're going to compile our model and restart the training process. Training is now complete.

151
00:12:26,790 --> 00:12:36,510
Here are the results we get: you could see that the validation accuracy increases up to 72.2%.

152
00:12:36,510 --> 00:12:44,820
So we make an extra gain of about 2 percentage points in validation accuracy after fine-tuning our model.

153
00:12:44,820 --> 00:12:53,640
And this makes sense since fine tuning permits us to squeeze out some extra juice from this backbone.

154
00:12:53,640 --> 00:12:57,480
Since this time around, it's actually trainable.

155
00:12:58,050 --> 00:13:00,600
And with that, we've just completed the section on transfer

156
00:13:00,950 --> 00:13:01,580
learning.

157
00:13:02,630 --> 00:13:05,780
Thanks for getting up to this point, and see you in the next section.