1
00:00:00,300 --> 00:00:06,870
Hello, everyone, and welcome to this new and exciting session in which we are going to build our own

2
00:00:06,870 --> 00:00:08,820
YOLO-like model.

3
00:00:08,850 --> 00:00:17,250
So from the paper we had already seen that we have these initial conv layers which are pre-trained on

4
00:00:17,250 --> 00:00:27,210
the ImageNet dataset such that they could be used to extract very useful features from our input images.

5
00:00:27,780 --> 00:00:38,010
And then these conv layers were followed by two fully connected layers which were designed in order to

6
00:00:38,010 --> 00:00:40,920
adapt to our problem of object detection.

7
00:00:41,340 --> 00:00:49,350
Now, given that we do not want to train this backbone here from scratch on the ImageNet dataset,

8
00:00:49,400 --> 00:00:54,210
we'll use an already pre-trained backbone, which is the ResNet50.

9
00:00:54,630 --> 00:01:03,130
Again, we have the output dimension defined as the number of classes plus five times B. From the paper,

10
00:01:03,150 --> 00:01:04,500
B is given to be two.

11
00:01:04,530 --> 00:01:11,310
We've seen this already, and then five because we have the probability that an object is present.

12
00:01:11,310 --> 00:01:16,440
And then for the remaining four we have the bounding box.

13
00:01:16,890 --> 00:01:22,370
So we have two of these bounding box predictions.

14
00:01:22,380 --> 00:01:29,100
That's why here we have two times five, which is ten, plus the number of classes, which in our case is equal to 20.
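
The arithmetic just described can be checked with a small sketch. The names `NUM_CLASSES`, `B`, and `output_dim` are chosen here for illustration and may not match the code shown on screen; the values (B = 2, 20 classes) are the ones stated in the session.

```python
# Illustrative names; the values follow the session (B = 2, 20 classes).
NUM_CLASSES = 20
B = 2  # bounding box predictions per grid cell


def output_dim(num_classes, boxes_per_cell, values_per_box=5):
    """Per-cell output size: class scores plus (objectness + 4 box values) per box."""
    return num_classes + values_per_box * boxes_per_cell


OUTPUT_DIM = output_dim(NUM_CLASSES, B)
print(OUTPUT_DIM)  # 30
```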

15
00:01:29,550 --> 00:01:32,930
Now we define this number of filters to be 512.

16
00:01:32,940 --> 00:01:34,440
So that's it.

17
00:01:34,440 --> 00:01:38,700
From here we have our full, complete model.

18
00:01:38,730 --> 00:01:40,620
You could take this off.

19
00:01:40,620 --> 00:01:47,580
You see that we have our pre-trained ResNet, which is our backbone or our base model.

20
00:01:47,580 --> 00:01:54,090
And then we followed this up with several conv layers similar to what we have here in the paper.

21
00:01:54,090 --> 00:02:01,410
And then we have the global average pooling, which is what is given here in the paper.

22
00:02:01,680 --> 00:02:09,060
Now, one thing we should note about the average pooling is the fact that when we have inputs, let's

23
00:02:09,060 --> 00:02:23,340
say we have this seven by seven, then by, let's say, five, so we have channels one, two, three, four and

24
00:02:23,340 --> 00:02:23,970
then five.

25
00:02:23,970 --> 00:02:26,940
So let's suppose we have a seven by seven by five input.

26
00:02:26,970 --> 00:02:33,810
Now after going into the global average pooling, what we will have here is the averaging of each and

27
00:02:33,810 --> 00:02:39,870
every value or let's say pixel in each and every channel.

28
00:02:39,870 --> 00:02:46,440
So for this channel, for example, we will have one representative value; for this channel we'll have

29
00:02:46,440 --> 00:02:48,440
one representative value, which is the average.

30
00:02:48,450 --> 00:02:53,700
So for this we would have the average, this would have the average and this will have the average.

31
00:02:53,700 --> 00:02:56,790
So we average all these values here.

32
00:02:57,870 --> 00:03:04,200
And the problem with this is information about object position is lost.

33
00:03:04,200 --> 00:03:10,770
And so instead of using this average pooling, it is preferable to use flattening.
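
To make the point about lost position concrete, here is a minimal pure-Python sketch (not the session's code). It builds two seven by seven by five feature maps with a single activation in different corners and compares them after global average pooling and after flattening. All function names are made up for this illustration.

```python
def make_map(height, width, channels, hot):
    """Feature map of zeros with a single 1.0 at position hot = (row, col, channel)."""
    fmap = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    row, col, ch = hot
    fmap[row][col][ch] = 1.0
    return fmap


def global_average_pool(fmap):
    """Average every channel over all spatial positions: one value per channel."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]


def flatten(fmap):
    """Keep every value, in row-major order, so position is preserved."""
    return [value for row in fmap for cell in row for value in cell]


top_left = make_map(7, 7, 5, hot=(0, 0, 2))
bottom_right = make_map(7, 7, 5, hot=(6, 6, 2))

# Pooling cannot tell the two maps apart; flattening can.
same_after_pool = global_average_pool(top_left) == global_average_pool(bottom_right)
same_after_flatten = flatten(top_left) == flatten(bottom_right)
print(same_after_pool, same_after_flatten)  # True False
```

The pooled output has 5 values while the flattened output has 245, which is why the layers after a flatten carry more parameters.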

34
00:03:11,400 --> 00:03:17,770
And so what we do is we'll take this off from here and then we'll have flatten.

35
00:03:18,090 --> 00:03:18,720
Okay.

36
00:03:18,720 --> 00:03:26,190
So once we have that, just as with the paper you see here, we have the fully connected layer, which

37
00:03:26,190 --> 00:03:30,540
is this dense layer right here and then this other fully connected layer.

38
00:03:30,540 --> 00:03:33,090
So we have that and then we reshape.

39
00:03:33,900 --> 00:03:36,080
Now, this should actually be split size.

40
00:03:36,090 --> 00:03:40,620
Let's take all this here, copy and paste.

41
00:03:40,620 --> 00:03:48,750
So we have split size by split size by output dimension.

42
00:03:48,750 --> 00:03:50,550
So it's seven by seven by 30.
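
As a rough illustration of this reshape, the sketch below turns a flat vector into a seven by seven grid of 30-value cells using plain lists. It assumes the last fully connected layer outputs 7 * 7 * 30 = 1470 values, which the target shape requires; the names `SPLIT_SIZE`, `OUTPUT_DIM`, and `reshape_to_grid` are illustrative.

```python
SPLIT_SIZE = 7    # grid cells per side
OUTPUT_DIM = 30   # values predicted per grid cell


def reshape_to_grid(flat, split_size, output_dim):
    """Row-major reshape of a flat list into split_size x split_size x output_dim."""
    expected = split_size * split_size * output_dim
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    grid, idx = [], 0
    for _ in range(split_size):
        row = []
        for _ in range(split_size):
            row.append(flat[idx:idx + output_dim])
            idx += output_dim
        grid.append(row)
    return grid


flat = list(range(SPLIT_SIZE * SPLIT_SIZE * OUTPUT_DIM))  # stand-in for the dense output
grid = reshape_to_grid(flat, SPLIT_SIZE, OUTPUT_DIM)
print(len(grid), len(grid[0]), len(grid[0][0]))  # 7 7 30
```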

43
00:03:50,550 --> 00:03:53,220
So this is now our model.

44
00:03:53,760 --> 00:04:00,240
You can see we have a total of 53 million parameters, 30 million trainable and 23 million non-trainable.

45
00:04:00,240 --> 00:04:02,950
That's from our ResNet.

46
00:04:03,090 --> 00:04:09,730
We are then going to define our model checkpoint where our file path is this here.

47
00:04:09,750 --> 00:04:13,830
Then we're going to save only the weights, we're going to monitor the validation loss.

48
00:04:14,580 --> 00:04:22,170
We're going to obviously save the model which produces the minimum or the smallest validation loss,

49
00:04:22,170 --> 00:04:23,190
and that's it.

50
00:04:23,190 --> 00:04:26,010
We save the best weights only.
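
The "save only the best weights" rule can be sketched in plain Python. This is not the checkpoint callback used in the session, only the decision logic it describes: save whenever the monitored validation loss reaches a new minimum. The function name and the loss values are made up for illustration.

```python
def epochs_that_save(val_losses):
    """Return the epoch indices at which a 'best only, minimize val loss' rule would save."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved


example_losses = [2.0, 1.5, 1.7, 1.2, 1.3]  # made-up validation losses
print(epochs_that_save(example_losses))  # [0, 1, 3]
```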

51
00:04:26,010 --> 00:04:33,750
So we run that and then now we move to the scheduling. Here, if the number of epochs is less than 40,

52
00:04:33,750 --> 00:04:40,080
So for the first 40 epochs, we use a learning rate of one times ten to the negative three. Between 40 and

53
00:04:40,080 --> 00:04:40,470
80,

54
00:04:40,470 --> 00:04:45,630
we use a learning rate of five times ten to the negative four, and then after that we use a learning rate of one times

55
00:04:45,630 --> 00:04:46,950
ten to the negative four.
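
The schedule just described can be written as a small function. This is a sketch; the function name is illustrative, and the handling of the exact boundary epochs (40 and 80) is an assumption, since the session only says "less than 40", "between 40 and 80", and "after that".

```python
def scheduler(epoch):
    """Step learning-rate schedule: 1e-3, then 5e-4, then 1e-4."""
    if epoch < 40:
        return 1e-3
    if epoch < 80:  # assumption: epoch 80 itself falls into the last phase
        return 5e-4
    return 1e-4


for epoch in (0, 39, 40, 79, 80):
    print(epoch, scheduler(epoch))
```

A function of this shape, taking the epoch index and returning a learning rate, is what epoch-based scheduler callbacks typically expect.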

56
00:04:46,950 --> 00:04:48,030
So that's it.

57
00:04:48,270 --> 00:04:53,190
We compile our model and then we start with the training.

58
00:04:53,970 --> 00:04:58,770
Now, after training for a few epochs, you will notice that the model starts to overfit.

59
00:04:59,020 --> 00:05:06,370
So in the next section, we are going to treat or use several techniques to help solve this or resolve

60
00:05:06,370 --> 00:05:08,080
the problem of overfitting.

61
00:05:08,380 --> 00:05:15,340
Now, we've been training for over 20 epochs and you could see clearly from

62
00:05:15,340 --> 00:05:21,850
the training loss and the validation loss that our model starts performing well and at some point starts

63
00:05:21,850 --> 00:05:22,750
overfitting.

64
00:05:22,780 --> 00:05:29,590
As you could see here, we have the training loss, which keeps dropping right here, and then the validation

65
00:05:29,590 --> 00:05:34,160
loss drops and then at some point starts increasing.

66
00:05:34,180 --> 00:05:37,480
So clearly our model is overfitting.
