1
00:00:00,150 --> 00:00:03,990
Hi and welcome to our lecture on mean average precision.

2
00:00:04,440 --> 00:00:10,140
This is one of the most important metrics that we use to evaluate object detector model performance.

3
00:00:10,680 --> 00:00:11,940
So let's get started.

4
00:00:12,150 --> 00:00:18,810
So firstly, in summary, mean average precision is perhaps the best metric to look at when comparing

5
00:00:18,810 --> 00:00:20,400
models on the same dataset.

6
00:00:20,880 --> 00:00:25,000
That's why it's become the metric that we use in competitions,

7
00:00:25,020 --> 00:00:30,300
where different models are competing against each other, and it's actually the aggregation of several

8
00:00:30,300 --> 00:00:32,100
metrics which we'll talk about now.

9
00:00:32,580 --> 00:00:37,320
So firstly, just to recap, or recall, no pun intended.

10
00:00:37,980 --> 00:00:40,470
Let's take a look at what precision and recall were.

11
00:00:41,010 --> 00:00:45,240
So precision is essentially: when your model predicts positive,

12
00:00:45,630 --> 00:00:46,860
how often is it right?

13
00:00:46,890 --> 00:00:47,790
So what does that mean?

14
00:00:48,330 --> 00:00:54,630
That means, let's say we have a model that's detecting dogs, and every time it predicts a dog, how

15
00:00:54,630 --> 00:00:55,710
often is that right?

16
00:00:55,890 --> 00:01:00,900
That's basically the precision, the accuracy of that, because sometimes it can be predicting a horse

17
00:01:00,900 --> 00:01:02,310
or a cat as a dog.

18
00:01:02,730 --> 00:01:03,890
And therefore, it's wrong.

19
00:01:03,900 --> 00:01:09,030
So that's why it's the true positives over the true positives plus the false positives.

20
00:01:09,490 --> 00:01:11,700
OK, now what about recall?

21
00:01:11,940 --> 00:01:14,160
Recall is another very important metric.

22
00:01:14,610 --> 00:01:20,610
It basically tells us how well our model is performing at finding all the objects it's supposed

23
00:01:20,610 --> 00:01:21,030
to find.

24
00:01:21,300 --> 00:01:25,500
So finding all the positives, essentially, is a more simplified way of thinking about it.

25
00:01:26,040 --> 00:01:27,060
So what does that mean?

26
00:01:27,990 --> 00:01:33,300
That essentially means if we have an object detector that is looking for dogs, we want to know how

27
00:01:33,300 --> 00:01:40,470
many times it missed dogs overall, so that we can develop a score to know how well it's recalling dogs

28
00:01:40,470 --> 00:01:41,400
in that dataset.

29
00:01:41,820 --> 00:01:44,730
So how good a model is at finding dogs, essentially.

30
00:01:45,510 --> 00:01:51,240
So now let's take a look at how we use precision and recall in the object detector world.

31
00:01:51,510 --> 00:01:57,410
So as I said with this example, let's say we have an object detector that is detecting dogs here,

32
00:01:57,960 --> 00:02:01,800
and our test dataset is comprised of just these three images.

33
00:02:02,310 --> 00:02:04,830
You can see here that this is clearly not a dog.

34
00:02:04,920 --> 00:02:05,550
This is a cat.

35
00:02:06,090 --> 00:02:07,140
So what does that mean?

36
00:02:07,680 --> 00:02:13,170
It means our model predicts a dog three times, but the model is correct only two times.

37
00:02:13,620 --> 00:02:19,020
So that means our precision is two out of three, or two thirds, or 66.6 percent.

38
00:02:19,560 --> 00:02:23,430
Not that great, but it depends on, you know, how hard your task is.

39
00:02:24,140 --> 00:02:30,000
Now let's take a look at recall. In the object detector world, recall is how good a model

40
00:02:30,000 --> 00:02:32,220
is at finding or detecting an object.

41
00:02:32,430 --> 00:02:39,150
So namely, we have this new test dataset with these three images here, and we can see that it missed

42
00:02:39,300 --> 00:02:40,650
this dog, unfortunately.

43
00:02:41,130 --> 00:02:45,810
So that means that we predicted two dogs, but missed one out of the three.

44
00:02:46,470 --> 00:02:50,640
So our recall, again, is two out of three, or 66 percent.
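As a minimal sketch of the two worked examples above, the counts can be plugged into the usual definitions. The function names here are illustrative and not from the lecture; the counts (two correct dog detections, one wrong detection, one missed dog) are the ones described for the two test sets.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything predicted positive, the fraction that was correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Of everything that should have been found, the fraction found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# First test set: three "dog" predictions, one of them is actually a cat.
p = precision(tp=2, fp=1)

# Second test set: three dogs present, the model finds two and misses one.
r = recall(tp=2, fn=1)

print(f"precision = {p:.3f}, recall = {r:.3f}")
```

Both values come out to two thirds, matching the figures quoted in the lecture.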

45
00:02:51,570 --> 00:02:56,460
So let's think about something, though: what if we changed our IoU threshold?

46
00:02:56,970 --> 00:02:58,650
Remember, the IoU

47
00:02:58,660 --> 00:03:03,930
threshold tells us basically how much overlap between the boxes we require.

48
00:03:04,380 --> 00:03:08,520
So in this case, this is like the minimum overlap that there will be.

49
00:03:08,520 --> 00:03:09,710
This is 50 percent,

50
00:03:09,720 --> 00:03:10,170
let's say.

51
00:03:10,750 --> 00:03:13,610
So it's overlapping with 50 percent here.

52
00:03:13,620 --> 00:03:14,370
That's the union.
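A minimal sketch of how intersection over union might be computed for two axis-aligned boxes, assuming each box is given as (x1, y1, x2, y2) corner coordinates. The box format, the function name, and the example coordinates are assumptions made for illustration, not values from the lecture.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth box and a prediction covering half of it.
ground_truth = (0, 0, 10, 10)
prediction = (0, 0, 10, 5)

score = iou(ground_truth, prediction)
keep = score >= 0.5  # the prediction counts as a match at a 50 percent threshold
print(score, keep)
```

With a stricter threshold such as 0.9, the same prediction would no longer count as a match, which is how the threshold choice changes the true positive and false positive counts.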

53
00:03:15,000 --> 00:03:20,700
And what's happening now is that if this box were a little bit bigger, it would drop out entirely.

54
00:03:21,600 --> 00:03:24,660
There would be no bounding box at that point because it's below the threshold.

55
00:03:25,350 --> 00:03:29,140
So as you can see, if we changed our IoU threshold, let's say we widen it.

56
00:03:29,490 --> 00:03:32,580
So let's say this box was a bit wider than it actually should have been.

57
00:03:33,240 --> 00:03:38,100
So we allow more flexibility and more leeway with our model by reducing the IoU threshold.

58
00:03:38,640 --> 00:03:46,470
We can now see that making it less strict means that we can get different precision and recall

59
00:03:46,470 --> 00:03:46,980
values.

60
00:03:47,520 --> 00:03:50,280
So you can see, the stricter we make the IoU threshold.

61
00:03:50,280 --> 00:03:51,690
Like, let's say it was 90 percent.

62
00:03:52,230 --> 00:03:56,460
It wouldn't show this monkey box anymore, but it still would show these two.

63
00:03:57,090 --> 00:04:01,920
So that's basically how the IoU thresholds affect our precision and recall.

64
00:04:02,880 --> 00:04:05,640
Now, let's think or talk about mAP.

65
00:04:06,270 --> 00:04:08,850
So mAP stands for mean average precision.

66
00:04:09,270 --> 00:04:16,920
So firstly, average precision is basically finding the area under a precision-recall curve with

67
00:04:16,920 --> 00:04:18,990
varying IoU or confidence thresholds.

68
00:04:19,110 --> 00:04:20,490
So you can see an example here.

69
00:04:20,490 --> 00:04:24,420
Let's think about one class; let's take the coffee mug class as the pink line here.

70
00:04:25,320 --> 00:04:29,280
So you can see this is the precision here, how much precision there is.

71
00:04:29,790 --> 00:04:36,780
And this is the recall here, and you can see if we set an IoU threshold of, let's say, zero at that

72
00:04:36,780 --> 00:04:37,090
point.

73
00:04:37,210 --> 00:04:39,660
Now, this axis is not IoU,

74
00:04:39,660 --> 00:04:41,220
by the way. These axes are just precision

75
00:04:41,220 --> 00:04:47,850
and recall. For instance, I'm just saying if we set an IoU threshold of zero, it's probably not a

76
00:04:47,850 --> 00:04:51,330
good one because you're going to get all boxes at that point.

77
00:04:51,690 --> 00:04:59,130
But that means that you'll get a high precision score and a very low recall score.

78
00:04:59,190 --> 00:04:59,640
That's why it's.

79
00:05:00,170 --> 00:05:00,620
Zero.

80
00:05:01,160 --> 00:05:04,580
So that's what's happening at that point, or it could be the reverse.

81
00:05:04,770 --> 00:05:05,810
You have to think about it a bit.

82
00:05:06,230 --> 00:05:12,440
But either way, what's happening in this graph is that we're actually plotting the recall and

83
00:05:12,440 --> 00:05:16,310
precision for all the different IoU thresholds from zero to one.

84
00:05:16,760 --> 00:05:18,170
And we get this curve here.

85
00:05:18,200 --> 00:05:21,830
So this curve is called an average precision curve.

86
00:05:22,280 --> 00:05:24,140
So what is the mean average precision?

87
00:05:24,170 --> 00:05:26,700
Well, the mean average precision is basically this blue line.

88
00:05:26,700 --> 00:05:30,890
Now it's all of them averaged out as well here, so you can see the formula.

89
00:05:31,220 --> 00:05:37,760
So we just get the average precision for each class and we just basically find the mean average precision

90
00:05:37,910 --> 00:05:38,480
out of it.
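A rough sketch of that averaging step, under stated assumptions: each detection has already been marked as a true or false positive at one fixed IoU threshold, the precision-recall curve is traced by ranking detections by confidence (a common convention), and the area is summed without interpolation. The function names and the toy detections below are hypothetical, not taken from the lecture or from any benchmark.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real objects of this class in the test set.
    """
    if num_ground_truth == 0:
        return 0.0

    # Walk down the detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        # Add the rectangle under the curve for this step in recall.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """Mean of the per-class average precision values."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Toy data: class name -> (detections, number of ground-truth objects).
per_class = {
    "dog": ([(0.9, True), (0.8, False), (0.7, True)], 3),
    "cat": ([(0.95, True), (0.6, True)], 2),
}

dog_ap = average_precision(*per_class["dog"])
m_ap = mean_average_precision(per_class)
print(f"dog AP = {dog_ap:.3f}, mAP = {m_ap:.3f}")
```

Benchmarks differ in the details, for example in how the curve is interpolated and in whether the result is further averaged over several IoU thresholds, so this is only meant to show the shape of the calculation.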

91
00:05:39,140 --> 00:05:44,720
It's essentially that simple, although this is a very confusing metric that a lot of people don't fully

92
00:05:44,720 --> 00:05:45,260
understand.

93
00:05:45,270 --> 00:05:51,710
So it's important that you do understand what's happening here. And why is this such a good metric

94
00:05:51,710 --> 00:05:53,240
or measure to evaluate this?

95
00:05:53,870 --> 00:05:57,320
Well, you can see it considers a number of things. It considers,

96
00:05:57,320 --> 00:06:03,920
firstly, the classifier accuracy: how good we are at detecting objects, the correct objects being

97
00:06:04,100 --> 00:06:10,130
the correct class in our object detection model. And it also considers the bounding box locations.

98
00:06:10,520 --> 00:06:16,940
So it's an aggregation of different metrics that evaluate classification and the bounding box accuracy.

99
00:06:17,330 --> 00:06:22,730
So that's why it works so well as a benchmark metric for object detectors.

100
00:06:23,150 --> 00:06:28,990
And it's what's commonly used in the COCO dataset, which stands for the Common Objects in Context dataset.

101
00:06:29,000 --> 00:06:32,690
This is basically the ImageNet of the object detection world.

102
00:06:33,020 --> 00:06:38,330
It's a very vast dataset that's actually constantly improving, but we have different versions that

103
00:06:38,330 --> 00:06:39,860
we test on for the competitions.

104
00:06:40,400 --> 00:06:47,300
And you can see right now, this is a screenshot captured from Papers with Code,

105
00:06:47,310 --> 00:06:52,430
and you can see these are the best performers right now using different APs. AP is

106
00:06:52,430 --> 00:06:53,660
basically the same as mAP.

107
00:06:53,840 --> 00:06:58,700
Essentially, it's just the average precision for all classes, and you can

108
00:06:58,700 --> 00:07:04,570
see they have different metrics, such as AP50, AP75, and AP small, medium, and large.

109
00:07:05,480 --> 00:07:06,110
Whether they used

110
00:07:06,110 --> 00:07:11,030
extra training data. So you can see these are essentially the best-performing object detection

111
00:07:11,150 --> 00:07:12,890
networks right now in research.

112
00:07:13,490 --> 00:07:18,740
However, I would say a lot of these networks aren't too practical, and that's why YOLO, which is a very

113
00:07:18,740 --> 00:07:26,510
easy to train, practical, easy to deploy model, has taken over the industry by storm for the last

114
00:07:26,510 --> 00:07:31,730
three or four years. Pretty much all object detection models used in industry are YOLO, although

115
00:07:31,730 --> 00:07:36,830
EfficientDet and Detectron2 are gaining in popularity as well.

116
00:07:37,430 --> 00:07:40,350
So that's it for this lecture on mAP.

117
00:07:40,790 --> 00:07:46,790
What we'll do next, we'll take a look at non-maximum suppression, which is a technique we use to clean

118
00:07:46,790 --> 00:07:51,110
up our bounding box proposals in object detectors.

119
00:07:51,590 --> 00:07:53,320
So I'll see you there shortly.

120
00:07:53,330 --> 00:07:54,140
Thank you for watching.