1
00:00:00,180 --> 00:00:05,670
Welcome back. In this section, we'll take a look at how we actually train a YOLO model.

2
00:00:06,240 --> 00:00:07,350
So let's get started.

3
00:00:07,500 --> 00:00:14,130
So firstly, in all training processes, you have the training data, which includes the ground truth

4
00:00:14,130 --> 00:00:16,330
labels, as well as the test data.

5
00:00:16,350 --> 00:00:20,090
But for now, we'll be focusing on how we use the training data in YOLO.

6
00:00:20,640 --> 00:00:24,990
So initially, we have a human who annotates these images here.

7
00:00:25,410 --> 00:00:29,640
So we have a ground truth green box here, with the class being dog.

8
00:00:30,240 --> 00:00:37,440
So during training, we basically try to get the model to learn a function that

9
00:00:37,440 --> 00:00:41,940
can propose a box that is as close to this box as possible, with the correct class.

10
00:00:42,930 --> 00:00:45,180
So let's take a look at what happens during training.

11
00:00:45,720 --> 00:00:50,160
So during training, the model attempts to match the example on the right.

12
00:00:50,160 --> 00:00:50,550
So.

13
00:00:50,760 --> 00:00:57,450
So initially, we have the mapping here, and we have the class probabilities looking like this

14
00:00:57,450 --> 00:00:57,670
here.

15
00:00:57,690 --> 00:01:00,180
So a one for dog and zeros for the other classes.

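As a rough illustration of that target, here is a minimal sketch assuming the YOLOv1 setup of a 7 x 7 grid (S) and the 20 PASCAL VOC classes (C), with box centres given in normalized image coordinates. The index used for "dog" and the helper names are hypothetical choices for this example only.

```python
import numpy as np

# Assumed YOLOv1-style setup: an S x S grid and C classes (PASCAL VOC has 20).
S, C = 7, 20
DOG = 11  # hypothetical class index for "dog" in this sketch


def cell_for_center(cx, cy, S=S):
    """Return (row, col) of the grid cell containing a normalized box center."""
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


def class_target(class_idx, C=C):
    """One-hot target: 1 for the ground-truth class, 0 for every other class."""
    t = np.zeros(C, dtype=np.float32)
    t[class_idx] = 1.0
    return t


# A ground-truth dog box centred at (0.4, 0.6) in normalized coordinates.
row, col = cell_for_center(0.4, 0.6)
target = class_target(DOG)
```

Only the cell that contains the centre of the ground truth box receives this one-hot class target; the other cells get no class target at all.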
16
00:01:01,290 --> 00:01:05,070
Next, we get two bounding box predictions for that cell.

17
00:01:05,070 --> 00:01:05,430
So.

18
00:01:06,120 --> 00:01:09,480
So we get these two boxes here, as you can see in this example.

19
00:01:10,080 --> 00:01:14,130
However, we need to adjust these bounding boxes, because they can't both be right.

20
00:01:14,670 --> 00:01:19,410
So you can see that this one, the larger box, is closer to the ground truth box here.

21
00:01:19,860 --> 00:01:25,500
So what we do is increase the confidence of that box and then decrease the confidence of this box,

22
00:01:25,500 --> 00:01:26,430
the smaller one.

23
00:01:26,850 --> 00:01:30,180
So by doing that, you can see we just made this larger here.

24
00:01:30,450 --> 00:01:33,480
Sorry, we made the box larger here on this one.

25
00:01:34,110 --> 00:01:39,120
And then simultaneously, we make the middle box smaller, so we lower its confidence.

26
00:01:39,120 --> 00:01:40,980
So this is what we get here.

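"Closer to the ground truth" is usually measured with intersection over union (IoU). The sketch below assumes boxes are given as (x_center, y_center, width, height) in normalized coordinates and, following the YOLOv1 definition of confidence as Pr(Object) x IoU, sets the confidence target of the best-matching ("responsible") predictor to its IoU and the other predictor's target to 0. The function names are hypothetical.

```python
import numpy as np


def to_corners(box):
    """Convert (x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou(box_a, box_b):
    """Intersection over union of two (x_center, y_center, width, height) boxes."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def assign_confidence_targets(pred_boxes, gt_box):
    """Make the predictor with the highest IoU 'responsible' for the object.

    Its confidence target is its IoU with the ground truth (confidence goes up);
    every other predictor in the cell gets a target of 0 (confidence goes down).
    """
    ious = np.array([iou(p, gt_box) for p in pred_boxes])
    best = int(np.argmax(ious))
    targets = np.zeros(len(pred_boxes))
    targets[best] = ious[best]
    return best, ious, targets


# Ground truth, a small box in the middle, and a larger box close to the ground truth.
gt = (0.5, 0.5, 0.4, 0.4)
preds = [(0.5, 0.5, 0.1, 0.1), (0.52, 0.5, 0.38, 0.4)]
best, ious, targets = assign_confidence_targets(preds, gt)
```

Here the larger box overlaps the ground truth far more than the small one, so it becomes the responsible predictor, matching the example on the slide.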
27
00:01:41,670 --> 00:01:48,300
So for cells with no ground truth, because you will have cells that basically predict bounding boxes

28
00:01:48,300 --> 00:01:49,000
over nothing.

29
00:01:49,030 --> 00:01:54,930
The background, as we call it. For those, we basically decrease the confidence, and we don't adjust

30
00:01:54,960 --> 00:01:57,690
the class probabilities or coordinates of these boxes.

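For predictors over the background, only the confidence is pushed toward zero. A minimal sketch, assuming predicted confidences of shape (S, S, B) and a boolean mask marking the responsible predictors; the down-weighting factor of 0.5 is the lambda_noobj value reported in the YOLOv1 paper, and the function name is hypothetical.

```python
import numpy as np

LAMBDA_NOOBJ = 0.5  # no-object weight reported in the YOLOv1 paper


def noobj_confidence_loss(pred_conf, obj_mask):
    """Squared-error loss pushing non-responsible predictors' confidence to 0.

    pred_conf: (S, S, B) predicted confidences
    obj_mask : (S, S, B) bool, True where a predictor is responsible for an object
    Coordinates and class probabilities are deliberately not used here.
    """
    noobj = ~obj_mask
    return LAMBDA_NOOBJ * np.sum(pred_conf[noobj] ** 2)


# 2 x 2 grid, 2 predictors per cell; only one predictor covers an object.
pred_conf = np.full((2, 2, 2), 0.2)
obj_mask = np.zeros((2, 2, 2), dtype=bool)
obj_mask[0, 0, 1] = True
pred_conf[0, 0, 1] = 0.9
loss = noobj_confidence_loss(pred_conf, obj_mask)
```

The weight is there because most cells contain no object; without it, the many "background" terms would overwhelm the gradient from the few cells that do contain objects.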
31
00:01:58,560 --> 00:02:02,790
So that's essentially how YOLO tries to fit these boxes to the data.

32
00:02:03,750 --> 00:02:05,790
But how does it actually work?

33
00:02:06,270 --> 00:02:09,450
Well, we need a combination of three loss functions to do this.

34
00:02:09,990 --> 00:02:11,930
Firstly, we need a classification loss.

35
00:02:11,940 --> 00:02:18,150
So if an object is detected, it is a squared error loss of the class conditional probabilities for

36
00:02:18,150 --> 00:02:19,050
each class.

37
00:02:19,710 --> 00:02:25,350
Then we have the localization loss, which measures the error of the predicted bounding

38
00:02:25,350 --> 00:02:27,270
box relative to the ground truth.

39
00:02:27,900 --> 00:02:29,850
And then we have the confidence loss.

40
00:02:30,240 --> 00:02:33,150
That's the confidence that the box has an object.

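Putting the three parts together, here is a sketch of a YOLOv1-style sum-squared-error loss. It assumes predictions shaped (S, S, B, 5) as (x, y, w, h, confidence) with non-negative w and h, class probabilities shaped (S, S, C), a precomputed "responsible" mask, and confidence targets (for example the IoU values from the matching step). The weights 5 and 0.5 are the lambda_coord and lambda_noobj values from the YOLOv1 paper; the function name and argument layout are illustrative assumptions, not any library's API.

```python
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5  # weights reported in the YOLOv1 paper


def yolo_v1_style_loss(pred_boxes, pred_cls, gt_boxes, gt_cls, resp, conf_target):
    """Sum of localization, confidence and classification squared errors.

    pred_boxes : (S, S, B, 5) predicted (x, y, w, h, confidence), w and h >= 0
    pred_cls   : (S, S, C) predicted class-conditional probabilities
    gt_boxes   : (S, S, 4) ground-truth (x, y, w, h); ignored in empty cells
    gt_cls     : (S, S, C) one-hot ground-truth classes; ignored in empty cells
    resp       : (S, S, B) bool, True for the predictor responsible for an object
    conf_target: (S, S, B) confidence targets for responsible predictors
    """
    obj_cell = resp.any(axis=-1)  # (S, S): cells that contain an object

    # Localization loss: responsible predictors only. Using sqrt(w), sqrt(h)
    # makes the same absolute error cost more on a small box than a large one.
    gt = np.broadcast_to(gt_boxes[:, :, None, :], pred_boxes[..., :4].shape)
    xy_err = np.sum((pred_boxes[..., :2] - gt[..., :2]) ** 2, axis=-1)
    wh_err = np.sum((np.sqrt(pred_boxes[..., 2:4]) - np.sqrt(gt[..., 2:4])) ** 2, axis=-1)
    loc = LAMBDA_COORD * np.sum((xy_err + wh_err)[resp])

    # Confidence loss: pull responsible predictors toward their target and
    # push every other predictor toward zero (down-weighted).
    conf = pred_boxes[..., 4]
    conf_obj = np.sum((conf - conf_target)[resp] ** 2)
    conf_noobj = LAMBDA_NOOBJ * np.sum(conf[~resp] ** 2)

    # Classification loss: squared error over class probabilities, object cells only.
    cls = np.sum(np.sum((pred_cls - gt_cls) ** 2, axis=-1)[obj_cell])

    return loc + conf_obj + conf_noobj + cls
```

With a perfect prediction for the one object and zero confidence everywhere else, every term vanishes; raising any background confidence adds only a down-weighted confidence penalty.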
41
00:02:33,750 --> 00:02:40,010
So that's basically a short summary of how the training process in YOLO works, and you can see just

42
00:02:40,020 --> 00:02:42,000
a summary slide of YOLO's performance.

43
00:02:42,000 --> 00:02:44,190
This is the original YOLO, version 1, to be precise.

44
00:02:44,760 --> 00:02:51,210
You can see how much faster it was, at 45 frames per second, compared to the state of the art at the time,

45
00:02:51,210 --> 00:02:56,790
which was Faster R-CNN, getting a much better mAP score, by roughly 10 points.

46
00:02:57,180 --> 00:03:04,140
However, it was a lot slower than YOLO, and you can see that YOLO does generalize

47
00:03:04,140 --> 00:03:10,740
quite well, even on paintings like this one, where it identifies bottles, a dining table, persons and a cat.

48
00:03:11,310 --> 00:03:12,150
So that's quite cool.

49
00:03:12,150 --> 00:03:15,360
And by the way, the original Mona Lisa painting did not have a cat.

50
00:03:15,450 --> 00:03:21,690
In case you were wondering. Also, compared to Fast R-CNN, you can see that actually

51
00:03:21,690 --> 00:03:26,340
Fast R-CNN, at least compared to the original YOLO, did perform better.

52
00:03:26,490 --> 00:03:29,400
You can see 71 percent correct as opposed to 65.

53
00:03:30,000 --> 00:03:31,920
However, YOLO was much faster.

54
00:03:32,010 --> 00:03:39,150
So the key takeaways from this YOLO lesson are that YOLO is fast, and YOLO version four, version five

55
00:03:39,150 --> 00:03:45,330
and X are perhaps the best in accuracy right now, in 2021, still surpassing other models, which include

56
00:03:45,660 --> 00:03:48,960
R-CNNs and Detectron2, which we'll talk about shortly.

57
00:03:49,320 --> 00:03:54,330
And as a unified detector, YOLO provides end-to-end training, which we've seen there, and it

58
00:03:54,330 --> 00:03:59,940
gives us very little background error. And even though the original YOLO's mAP wasn't as good as

59
00:03:59,940 --> 00:04:00,540
R-CNNs',

60
00:04:00,900 --> 00:04:02,130
it was a lot faster.

61
00:04:02,550 --> 00:04:06,240
YOLO does tend to have more localization errors at times.

62
00:04:06,240 --> 00:04:11,430
However, in the later versions of YOLO, which are YOLO version four, five and X, they actually

63
00:04:11,430 --> 00:04:15,660
have minimized that quite a bit. So we'll stop there for now.

64
00:04:16,170 --> 00:04:23,490
And in the next section, we'll take a look at the architecture and its evolution from version three to version

65
00:04:23,490 --> 00:04:23,880
five.

66
00:04:24,570 --> 00:04:25,620
So stay tuned for that.

67
00:04:25,740 --> 00:04:26,520
Thank you for watching.