1
00:00:00,270 --> 00:00:07,770
Hi there and welcome to this new session in which we are going to dive into testing out our YOLO model,

2
00:00:07,770 --> 00:00:09,930
which we trained in the previous section.

3
00:00:10,170 --> 00:00:15,180
And to carry out this testing, we are going to make use of the COCO dataset.

4
00:00:15,480 --> 00:00:22,620
And so now we'll pick some random images from this dataset and test them out with our model.

5
00:00:22,950 --> 00:00:25,770
Now the first thing we'll do is load the model.

6
00:00:25,770 --> 00:00:26,910
So there we go.

7
00:00:26,910 --> 00:00:33,390
We load the model, we're going to create this outputs directory and then we'll specify the path to

8
00:00:33,390 --> 00:00:34,930
our test images.

9
00:00:34,950 --> 00:00:38,400
Now let's dive into this test method.

10
00:00:39,660 --> 00:00:46,980
Here we're going to take the file path as a test path, and then we have this image on which we are

11
00:00:46,980 --> 00:00:50,490
going to put the bounding boxes and the classes.

12
00:00:50,490 --> 00:00:57,660
Then, given that we did not use OpenCV to load the image previously, we are going to

13
00:00:57,660 --> 00:01:02,010
go with the exact same process we had already.

14
00:01:02,010 --> 00:01:05,490
That is, we read the file, we decode and then we resize.

15
00:01:05,490 --> 00:01:11,050
So this is what we had and once we have this image, we pass this into our model.

16
00:01:11,070 --> 00:01:14,370
Now the output of our model will be something like this.

17
00:01:14,370 --> 00:01:22,680
So we'll have this seven by seven by 30 tensor.

18
00:01:25,290 --> 00:01:30,240
Now, remember that for a cell like this one, where there's no object, we suppose that this person

19
00:01:30,240 --> 00:01:31,350
here is an object.

20
00:01:31,590 --> 00:01:36,870
So like this one where there's no object, we have a zero for the first position and then for the next

21
00:01:36,870 --> 00:01:37,380
position.

22
00:01:37,380 --> 00:01:46,950
we'll have these four bounding box positions, or bounding box values, and then we'll have another zero,

23
00:01:46,950 --> 00:01:53,050
and then we'll have four more values, and then the 20 values for the class.

24
00:01:53,070 --> 00:01:57,060
Now, given that there is no object here, all this wouldn't really matter.

25
00:01:57,060 --> 00:02:10,590
And so that's why we are only going to take the boxes where one of these two values, this value or this value,

26
00:02:10,620 --> 00:02:17,040
is greater than or equal to a certain threshold, which we'll define to be 0.25.

27
00:02:17,730 --> 00:02:21,920
So that said, take a cell like this one, which holds the center of our object.

28
00:02:21,930 --> 00:02:25,980
Remember, we have this object here and we have this center.

29
00:02:25,980 --> 00:02:34,280
So you would have this cell here with the center, meaning that if we take this, let's take its values.

30
00:02:34,290 --> 00:02:43,980
If you take this, you would have, let's say, 0.75, and then you have four values representing this

31
00:02:43,980 --> 00:02:46,470
position or its bounding box.

32
00:02:46,470 --> 00:02:53,940
And then we'll have maybe another say, 0.9.

33
00:02:53,940 --> 00:03:00,870
Then we have four values and then we have different values here for the class.

34
00:03:01,500 --> 00:03:09,690
And so the idea here is to get all these different positions where we have values greater than 0.25.

35
00:03:10,410 --> 00:03:17,490
Now, you should know that we could pick a threshold of, say, 0.5 or 0.7, or 0.25 as we've done.

36
00:03:18,270 --> 00:03:24,420
And it really depends on how this threshold affects the model performance.

37
00:03:24,420 --> 00:03:31,740
So we picked 0.25 because it performs better than picking 0.5, as with 0.5

38
00:03:32,100 --> 00:03:34,460
Many objects were missed out.

39
00:03:34,470 --> 00:03:35,810
So that is it.

40
00:03:35,820 --> 00:03:37,080
We move to the next.

41
00:03:37,080 --> 00:03:40,920
We simply gather all these different outputs.

42
00:03:40,920 --> 00:03:44,430
That is, we have the object positions from here.

43
00:03:44,430 --> 00:03:52,680
And then to obtain these different outputs here, we'll take the output itself, which is all this,

44
00:03:52,680 --> 00:03:57,780
and then based off the positions, we'll get this output.

45
00:03:59,070 --> 00:04:09,270
So if, from here, we print out the object positions, and

46
00:04:09,270 --> 00:04:15,540
then below we print out the selected output.

47
00:04:15,870 --> 00:04:19,230
That is, here we have this selection.

48
00:04:19,230 --> 00:04:21,510
Okay, so let's run this.

49
00:04:22,530 --> 00:04:28,230
You find that for this image, for example, we have these positions.

50
00:04:28,230 --> 00:04:31,350
That is, let's get back to this.

51
00:04:31,350 --> 00:04:36,120
So it's telling us that at this position, 4, 3,

52
00:04:36,120 --> 00:04:37,320
We have an object.

53
00:04:37,320 --> 00:04:42,930
So we go zero, one, two, three, four, and then zero, one, two, three.

54
00:04:42,930 --> 00:04:47,610
So an object is found here for this image we have.

55
00:04:48,090 --> 00:04:54,960
Now, the reason why we're having this duplicate is simply because it happens that for this first position,

56
00:04:54,960 --> 00:04:57,930
that's for this first score, there is an object.

57
00:04:57,930 --> 00:05:00,420
And for this other score there's also an object.

58
00:05:00,420 --> 00:05:08,520
You can see from here that we actually compared the zeroth position and the fifth position, which

59
00:05:08,520 --> 00:05:12,060
is this 0.75 and this 0.9 respectively.
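The selection step just described can be sketched in plain Python. This is only an illustration under stated assumptions, not the TensorFlow code shown on screen: the names `find_object_positions` and `grid` are hypothetical, the batch dimension is left out, and each cell is assumed to hold 30 values with the two confidence scores at indices 0 and 5.

```python
THRESHOLD = 0.25  # confidence threshold used in this session


def find_object_positions(grid, threshold=THRESHOLD):
    """Return (row, col) once per box slot whose confidence passes the threshold."""
    positions = []
    for conf_index in (0, 5):  # confidence of box slot 0, then of box slot 1
        for row, cells in enumerate(grid):
            for col, cell in enumerate(cells):
                if cell[conf_index] >= threshold:
                    positions.append((row, col))
    return positions


# Build an empty 7 x 7 x 30 grid, then mark one cell as containing an object.
grid = [[[0.0] * 30 for _ in range(7)] for _ in range(7)]
grid[4][3][0] = 0.96  # confidence of the first box (illustrative value)
grid[4][3][5] = 0.98  # confidence of the second box (illustrative value)

print(find_object_positions(grid))  # [(4, 3), (4, 3)]
```

The cell appears twice because both of its confidence scores pass the threshold, which is the duplicate seen in the printed positions.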

60
00:05:12,450 --> 00:05:16,410
And now we could take a closer look at this selected output.

61
00:05:16,440 --> 00:05:20,010
You see that this is 0.96.

62
00:05:20,010 --> 00:05:23,070
So here is 0.96.

63
00:05:23,520 --> 00:05:24,750
Well, it's for this position.

64
00:05:24,750 --> 00:05:27,080
So we could just take all this off.

65
00:05:27,090 --> 00:05:33,240
So if we take all this off, all of this here is thirty values.

66
00:05:34,740 --> 00:05:42,030
You find that this, which is 4, 3, the object for this specific image, is found at this cell,

67
00:05:42,030 --> 00:05:42,900
4, 3.

68
00:05:42,900 --> 00:05:52,170
And the output is this here, and you see the first position, 0.96, then we have the four for the bounding

69
00:05:52,170 --> 00:05:55,800
box and then the next 0.98.

70
00:05:55,800 --> 00:05:59,580
So it shows clearly that the model is sure there's an object there.

71
00:05:59,580 --> 00:06:06,780
And then from there we have these four again, and then we follow with these 20 different classes.

72
00:06:06,780 --> 00:06:15,210
Now, just by looking at this here, 0.3, 0.3, we can look for the one with

73
00:06:15,210 --> 00:06:15,810
the highest value.

74
00:06:15,810 --> 00:06:16,260
Okay.

75
00:06:16,350 --> 00:06:19,920
It shows clearly that this is the class with the highest value.

76
00:06:21,060 --> 00:06:24,000
And so from this we know that there is an object at

77
00:06:24,170 --> 00:06:30,170
this position, and that object belongs to this class.

78
00:06:30,170 --> 00:06:33,380
And obviously we have the bounding box surrounding the object.

79
00:06:34,790 --> 00:06:42,560
So now that we have these different values right here, the next thing to do will be to convert these

80
00:06:42,560 --> 00:06:44,210
bounding boxes.

81
00:06:44,210 --> 00:06:57,860
That is, into the xmin, ymin, xmax, ymax format, so that we can then use OpenCV to

82
00:06:59,090 --> 00:07:02,930
draw these bounding boxes on the image.

83
00:07:03,380 --> 00:07:08,030
Now we're going to go through each and every object position which we've had already.

84
00:07:08,030 --> 00:07:08,650
From here.

85
00:07:08,660 --> 00:07:13,700
You can see here we have these object positions: 0, 4, 3 and 0, 4, 3.

86
00:07:13,700 --> 00:07:18,770
Well, it's a duplicate, so let's focus on just this single one.

87
00:07:18,770 --> 00:07:22,070
So we have 0, 4, 3.

88
00:07:22,070 --> 00:07:25,400
That's essentially the position 4, 3, as we've seen already.

89
00:07:25,400 --> 00:07:36,950
And to obtain the output box, which is this value, this value and this value, what we'll do is we'll

90
00:07:36,950 --> 00:07:44,030
take the output and then say output at pos zero. Pos is from the object positions.

91
00:07:44,030 --> 00:07:48,650
Remember, the object position in this case is 0, 4, 3.

92
00:07:49,520 --> 00:07:52,370
So when you say pos zero, you're taking zero.

93
00:07:53,150 --> 00:07:55,040
So here you have zero.

94
00:07:55,070 --> 00:08:03,480
Pos one is four, pos two is three, and that's how you select this specific output here.

95
00:08:03,500 --> 00:08:10,640
Now, once you select the specific output, the next selection you want to make is that of the bounding

96
00:08:10,640 --> 00:08:11,520
boxes.

97
00:08:11,540 --> 00:08:16,100
Now when j equals zero, here you have zero times five, which is zero.

98
00:08:16,100 --> 00:08:20,300
So you go from one right up to zero plus five, that's five.

99
00:08:20,300 --> 00:08:23,870
So we go from one up to five, obviously one up to five minus one.

100
00:08:23,870 --> 00:08:25,370
So we have one.

101
00:08:25,580 --> 00:08:31,280
So we see this position one here, position two, position three and then position four.

102
00:08:31,280 --> 00:08:35,650
So that's how we select this here from our output.

103
00:08:35,660 --> 00:08:43,730
And then notice that, given that we have two different bounding box predictions, you have this

104
00:08:43,730 --> 00:08:44,270
here.

105
00:08:44,660 --> 00:08:47,540
So for the first one you have 1 to 5.

106
00:08:47,540 --> 00:08:55,490
And the next time we get into this loop, since j is one, we will have one times five, which

107
00:08:55,490 --> 00:08:56,900
is five, five plus one is six.

108
00:08:56,900 --> 00:09:02,360
So we go from 6 to 10, which now is this here.

109
00:09:02,360 --> 00:09:07,070
This is six, seven, eight, nine.

110
00:09:07,070 --> 00:09:10,850
Well, six, seven, eight, nine, six, seven, eight, nine.

111
00:09:11,450 --> 00:09:13,070
We're going from 6 to 10 minus one.

112
00:09:13,070 --> 00:09:13,760
So that's it.

113
00:09:13,910 --> 00:09:17,900
So this is how we obtain the output boxes.
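The index arithmetic above can be checked with a short plain-Python sketch. The helper name `box_slice` is hypothetical, and apart from 0.53 and 0.17 the cell values are made up for illustration; the assumed layout is a confidence score followed by four box values, twice, followed by 20 class scores.

```python
def box_slice(cell, j):
    """Return the four box values of box slot j (0 or 1) from a 30-value cell."""
    start = j * 5 + 1          # index 1 for j = 0, index 6 for j = 1
    return cell[start:start + 4]  # the end index is exclusive, so 1..4 or 6..9


# Confidence, four box values, confidence, four box values, then 20 class scores.
cell = [0.96, 0.53, 0.17, 0.4, 0.8,
        0.98, 0.5, 0.2, 0.45, 0.75] + [0.0] * 20

print(box_slice(cell, 0))  # [0.53, 0.17, 0.4, 0.8]
print(box_slice(cell, 1))  # [0.5, 0.2, 0.45, 0.75]
```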

114
00:09:18,470 --> 00:09:21,770
That's how we obtain these bounding boxes.

115
00:09:21,770 --> 00:09:26,810
And then given that, as we said already, we need to convert this into the xmin, ymin, xmax,

116
00:09:26,840 --> 00:09:27,800
ymax format,

117
00:09:28,220 --> 00:09:36,050
the first thing we'll do is convert it into the x center, y center format, or rather the x center, y center, width,

118
00:09:36,050 --> 00:09:44,810
height format, which is what we do here. Now, to obtain the x center from this here, from this 0.53,

119
00:09:44,810 --> 00:09:45,830
for example.

120
00:09:45,920 --> 00:09:52,760
And so let's suppose, for example, that this 0.53 is at the position 4,

121
00:09:52,760 --> 00:09:55,220
3. That is, well, it's 4, 3.

122
00:09:55,220 --> 00:09:59,990
So we go 0, 1, 2, 3, 4, and 0, 1, 2, 3.

123
00:09:59,990 --> 00:10:06,080
So we have this here around the center, 0.53, 0.17.

124
00:10:06,080 --> 00:10:08,570
So it's around here that we have this.

125
00:10:09,440 --> 00:10:17,140
And the idea is to obtain this value with respect to this full image height and image width.

126
00:10:17,150 --> 00:10:27,440
So first, we know that the distance from here to this position here is simply four divided

127
00:10:27,440 --> 00:10:34,430
by seven, times 224.

128
00:10:34,430 --> 00:10:39,560
And that's simply because all this is one, two, three, four, five, six, seven.

129
00:10:39,560 --> 00:10:48,080
So because the full image width is 224, it means getting right up to this position here is four over

130
00:10:48,080 --> 00:10:57,020
seven times 224, which is in fact four times 32, because 224 divided by seven is 32.

131
00:10:57,470 --> 00:11:00,050
So you take this here

132
00:11:01,330 --> 00:11:05,800
and multiply by 32, and you get this distance from here right up to here.

133
00:11:05,950 --> 00:11:16,360
Now, to account for the fact that we have this 0.53 here, we'll take 0.53 times 32, because

134
00:11:16,360 --> 00:11:18,600
this full cell is 32.

135
00:11:18,610 --> 00:11:22,780
So 0.53 times 32 plus four times 32.

136
00:11:22,810 --> 00:11:35,320
So to obtain this distance, we have four times 32 plus 0.53 times 32.

137
00:11:36,160 --> 00:11:43,330
Now, for the y center, because this was the x center, we'll still go

138
00:11:43,330 --> 00:11:44,170
zero, one.

139
00:11:44,530 --> 00:11:45,750
This is one, two, three.

140
00:11:45,760 --> 00:11:47,740
So we still have this here.

141
00:11:49,630 --> 00:11:52,200
Again, divided by seven, times 224.

142
00:11:52,210 --> 00:11:55,270
So it's going to be three times 32

143
00:11:56,350 --> 00:12:01,330
plus 0.17 times 32.

144
00:12:03,950 --> 00:12:12,680
So that's it. To obtain the y center, we have this position, that's three times 32, plus 0.17

145
00:12:12,680 --> 00:12:15,020
times 32 to find this distance here.

146
00:12:15,020 --> 00:12:17,090
So that's exactly what we do right here.

147
00:12:17,090 --> 00:12:20,300
You will notice we have this pos one.

148
00:12:20,300 --> 00:12:22,730
This pos one is actually from here; this is pos.

149
00:12:22,730 --> 00:12:33,620
So pos one is four, which is multiplied by 32, because this is pos one plus this output box zero,

150
00:12:33,620 --> 00:12:36,770
and all of this times 32.

151
00:12:36,770 --> 00:12:46,000
So you have this times 32, plus this output box zero. Output box zero is 0.53, so 0.53 times 32.

152
00:12:46,010 --> 00:12:55,420
So all this is just like saying we want to have four plus 0.53 and then all of this times 32.

153
00:12:55,430 --> 00:12:58,040
So that's what we do here. For the y center, it's the same.

154
00:12:58,040 --> 00:12:59,360
We have pos two.

155
00:12:59,390 --> 00:13:12,860
Now this here is three times 32, plus this output box one. Output box one is this 0.17, so 0.17 times

156
00:13:12,860 --> 00:13:13,610
32.

157
00:13:13,850 --> 00:13:19,880
Remember, we got output box from here, and it coincides with 0.17.

158
00:13:21,740 --> 00:13:22,520
So that's it.
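The arithmetic above amounts to (cell index + offset) times the cell size. A small plain-Python sketch, assuming the 224 by 224 input and 7 by 7 grid from this session; the function name `decode_center` is hypothetical.

```python
IMAGE_SIZE = 224
GRID_SIZE = 7
CELL_SIZE = IMAGE_SIZE / GRID_SIZE  # 32.0 pixels per cell


def decode_center(cell_x, cell_y, offset_x, offset_y):
    """Convert a cell index plus an in-cell offset (0..1) into pixel coordinates."""
    x_center = (cell_x + offset_x) * CELL_SIZE
    y_center = (cell_y + offset_y) * CELL_SIZE
    return x_center, y_center


# Cell (4, 3) with offsets 0.53 and 0.17, as in the worked example.
x_center, y_center = decode_center(4, 3, 0.53, 0.17)
print(round(x_center, 2), round(y_center, 2))  # 144.96 101.44
```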

159
00:13:22,520 --> 00:13:29,990
We obtain x center and we obtain y center, and the next thing we want to do is obtain the width and the

160
00:13:29,990 --> 00:13:30,920
height. For the width

161
00:13:30,920 --> 00:13:37,040
and the height, it's going to be easier, because when encoding these we simply divided by the complete

162
00:13:37,040 --> 00:13:38,270
width and the complete height.

163
00:13:38,270 --> 00:13:44,690
So now we'll simply multiply by the height and then multiply by the width to obtain the width and the

164
00:13:44,690 --> 00:13:45,260
height.

165
00:13:45,260 --> 00:13:47,510
So that's how we obtain this from here.

166
00:13:47,510 --> 00:13:54,110
We can now move from x center, y center, width, height to xmin, ymin, xmax, ymax.

167
00:13:54,290 --> 00:14:00,710
Now, if we have a bounding box like this, let's say we have the center and we know the width and the

168
00:14:00,710 --> 00:14:09,590
height. To obtain the xmin, we could simply take this center minus half of the width, because this distance,

169
00:14:09,590 --> 00:14:15,070
let's say this is the origin, this distance here, here is x center.

170
00:14:15,080 --> 00:14:21,200
If we subtract half of this width, then we'll have this distance, which will take us to the xmin,

171
00:14:21,620 --> 00:14:23,600
and then we do the same for the Y.

172
00:14:23,600 --> 00:14:31,970
That is, we take this distance right up to the center and then we subtract the y height,

173
00:14:31,970 --> 00:14:33,140
that is, the height.

174
00:14:33,260 --> 00:14:37,130
We subtract half of the height; not the height, but half of the height.

175
00:14:37,130 --> 00:14:38,420
Then we'll get to this position.

176
00:14:38,420 --> 00:14:40,190
That's ymin. That's what we do here.

177
00:14:40,190 --> 00:14:45,920
We have x center minus half of the width, and then we have y center minus half of the height.

178
00:14:45,920 --> 00:14:49,850
And then for the max we have x center plus half of the width.

179
00:14:49,850 --> 00:14:55,490
So if we want to get this position here, we'll take this plus half of this width, which would take

180
00:14:55,490 --> 00:14:56,900
us to this point here.

181
00:14:56,900 --> 00:15:05,660
And then if we do this for the y, we'll take this distance, this distance plus

182
00:15:05,660 --> 00:15:10,730
half of the height, which will add up to just this point right here.

183
00:15:10,730 --> 00:15:13,660
So we have xmin, ymin, xmax, ymax.

184
00:15:13,670 --> 00:15:19,250
Now be careful: in the case where the xmin happens to be less than zero, we want to fix this to zero.

185
00:15:19,250 --> 00:15:24,140
If the ymin is less than zero, we fix that to zero so we don't have negative values.

186
00:15:24,140 --> 00:15:29,540
If the xmax is greater than the width, we fix that to the width.

187
00:15:29,590 --> 00:15:32,660
If the Y max is greater than the height, we fix that to the height.

188
00:15:32,660 --> 00:15:35,630
And so once we have this, we obtain our final boxes.

189
00:15:35,630 --> 00:15:38,300
That's xmin, ymin, xmax, ymax.
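The conversion and clipping just described can be sketched as follows. This is a plain-Python illustration rather than the code on screen; the name `to_corners` and the example numbers are assumptions, and the image is taken to be 224 by 224.

```python
def to_corners(x_center, y_center, width, height, image_w=224.0, image_h=224.0):
    """Convert a center/size box (in pixels) to corners, clipped to the image."""
    xmin = x_center - width / 2
    ymin = y_center - height / 2
    xmax = x_center + width / 2
    ymax = y_center + height / 2
    # Clip so that no corner falls outside the image.
    xmin = max(xmin, 0.0)
    ymin = max(ymin, 0.0)
    xmax = min(xmax, image_w)
    ymax = min(ymax, image_h)
    return xmin, ymin, xmax, ymax


# Width and height were encoded as fractions of the full image,
# so they are decoded by multiplying back (0.25 and 0.5 are made-up values).
width = 0.25 * 224.0   # 56.0
height = 0.5 * 224.0   # 112.0

print(to_corners(100.0, 100.0, width, height))  # (72.0, 44.0, 128.0, 156.0)
print(to_corners(20.0, 200.0, 60.0, 80.0))      # (0.0, 160.0, 50.0, 224.0)
```

The second call shows both clipping cases: an xmin of -10 is fixed to 0, and a ymax of 240 is fixed to the image height.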

190
00:15:38,300 --> 00:15:45,590
And then not to forget the fact that we have some classes, so we are going to simply get the class

191
00:15:45,590 --> 00:15:48,020
with the highest probability score.

192
00:15:48,110 --> 00:15:53,990
So we just do this argmax, and we make use of the selected output.

193
00:15:53,990 --> 00:15:59,180
Remember, the selected output is what we had already seen here.

194
00:15:59,900 --> 00:16:06,230
So based on this, we are going to take the last 20 values, this here.

195
00:16:06,230 --> 00:16:12,020
We get the max, which happens to be this here, and then we get the class which corresponds to this position.

196
00:16:12,020 --> 00:16:14,570
So that is essentially what we do right here.

197
00:16:14,570 --> 00:16:19,820
We have that position, and then we have its corresponding class.

198
00:16:19,820 --> 00:16:25,490
We make sure that this is a string, and then we add that to our final box.

199
00:16:25,490 --> 00:16:27,260
So that's it for our final box.
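Picking the class can be sketched in plain Python as below. The helper `best_class` and the `class_names` list are hypothetical placeholders (the real list is whatever 20 classes the model was trained on), and the class scores are made up for illustration.

```python
def best_class(cell, class_names):
    """Return the name of the class with the highest score in a 30-value cell."""
    class_scores = cell[-20:]  # the last 20 values are the class scores
    best_index = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return str(class_names[best_index])


# Hypothetical class list: placeholder names, with "person" at an arbitrary index.
class_names = [f"class_{i}" for i in range(20)]
class_names[14] = "person"

# Ten box-related values followed by 20 class scores.
class_scores = [0.01] * 20
class_scores[14] = 0.3
cell = [0.96, 0.53, 0.17, 0.4, 0.8, 0.98, 0.5, 0.2, 0.45, 0.75] + class_scores

xmin, ymin, xmax, ymax = 72.0, 44.0, 128.0, 156.0  # illustrative corners
final_box = [xmin, ymin, xmax, ymax, best_class(cell, class_names)]
print(final_box)  # [72.0, 44.0, 128.0, 156.0, 'person']
```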

200
00:16:27,260 --> 00:16:29,210
We also need our final scores.

201
00:16:29,210 --> 00:16:31,400
We'll understand why we need these final scores later.

202
00:16:31,400 --> 00:16:34,520
For now, just take the final scores.

203
00:16:34,520 --> 00:16:40,340
We make sure that we have, again, you see, if j equals zero, then here we have zero.

204
00:16:40,340 --> 00:16:47,210
So we have the selected output i and we pick zero, meaning that we're picking this.

205
00:16:47,750 --> 00:16:55,340
And then if j equals one, then here we'd have one times five, that's five.

206
00:16:55,340 --> 00:16:59,630
So that's the fifth position, which is going to be this probability score.

207
00:16:59,630 --> 00:17:02,660
So essentially we're getting the probability scores for the two

208
00:17:03,170 --> 00:17:04,160
predictions.

209
00:17:04,280 --> 00:17:06,800
Remember, we actually have two predictions.

210
00:17:06,800 --> 00:17:13,970
So we get the probability scores and then we print them out.

211
00:17:13,970 --> 00:17:15,770
So we see what this looks like.
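The score lookup uses the same index pattern as the boxes: the confidence of box slot j sits at index j times five. A minimal plain-Python sketch, with a hypothetical helper name `box_scores` and illustrative values:

```python
def box_scores(cell, num_boxes=2):
    """Return the confidence score of each box slot in a cell."""
    return [cell[j * 5] for j in range(num_boxes)]  # indices 0 and 5


cell = [0.965, 0.53, 0.17, 0.4, 0.8,
        0.985, 0.5, 0.2, 0.45, 0.75] + [0.0] * 20

print(box_scores(cell))  # [0.965, 0.985]
```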

212
00:17:17,480 --> 00:17:18,450
There we go.

213
00:17:18,470 --> 00:17:22,000
As you could see, we have 0.965, which makes sense.

214
00:17:22,010 --> 00:17:25,730
This is 0.98.

215
00:17:26,570 --> 00:17:30,210
Here we have 0.965 and we have 0.985.

216
00:17:30,230 --> 00:17:32,930
Well, this is because there are some duplicates here.

217
00:17:32,930 --> 00:17:33,910
So that's it.

218
00:17:33,920 --> 00:17:35,650
Then we also see the final boxes.

219
00:17:35,660 --> 00:17:40,000
You see the class person, person, person, person.

220
00:17:40,040 --> 00:17:43,940
Now we're going to see how to eliminate these duplicates shortly.

221
00:17:44,180 --> 00:17:45,740
And that's it.

222
00:17:45,740 --> 00:17:56,930
So for now, we have understood how we could get from this model's outputs to then be able to obtain

223
00:17:56,930 --> 00:18:00,440
these final boxes and the final scores.

224
00:18:01,370 --> 00:18:05,870
And now the next step will be to get into this non-max suppression operation.

225
00:18:05,870 --> 00:18:12,060
We looked at non-max suppression already in theory.

226
00:18:12,080 --> 00:18:15,620
Now we'll see that with TensorFlow, it's actually very easy to implement this.

227
00:18:15,620 --> 00:18:20,270
But before implementing it, let's take a look at what it is all about.

228
00:18:20,270 --> 00:18:25,940
Let's suppose we have an image like this and then we have this object.

229
00:18:26,270 --> 00:18:30,500
Let's say we have this object here and then we have some bounding box.

230
00:18:31,010 --> 00:18:32,690
We have this bounding box.

231
00:18:32,930 --> 00:18:35,870
Remember, for each cell we have two predictions.

232
00:18:35,870 --> 00:18:40,580
So let's suppose that our cell predicts this and that same cell predicts again another bounding box

233
00:18:40,580 --> 00:18:41,260
like this.

234
00:18:41,270 --> 00:18:43,610
All this for this same object.

235
00:18:43,640 --> 00:18:50,030
Now what we'll do is, with the non-max suppression algorithm, we are going to compare these two probabilities

236
00:18:50,030 --> 00:18:53,300
and say, okay, which one has the highest probability?

237
00:18:54,320 --> 00:18:57,100
If it turns out that it's this one with the highest probability.

238
00:18:57,110 --> 00:19:09,710
So let's say this is 0.98, and then this one here is 0.96, and

239
00:19:09,710 --> 00:19:15,320
these two are predicting this same object, then we are going to discard this box.

240
00:19:15,320 --> 00:19:19,790
So hence the term non max suppression.

241
00:19:19,790 --> 00:19:23,180
So we'll suppress this box and we'll be left only with this.

242
00:19:23,180 --> 00:19:27,840
So that's how we are going to also discard those duplications.

243
00:19:27,860 --> 00:19:35,180
Now going back to the implementation, all we need here is just this non-max suppression method we have

244
00:19:35,180 --> 00:19:35,690
here.

245
00:19:35,810 --> 00:19:38,570
So we have this non-max suppression from tf.image.

246
00:19:38,570 --> 00:19:41,060
We specify the boxes.

247
00:19:41,060 --> 00:19:44,210
So we have these boxes here.

248
00:19:44,240 --> 00:19:48,560
Now note that our boxes from here included the classes, but we do not need that here.

249
00:19:48,560 --> 00:19:53,480
So, as you see, we pick the first four elements, essentially xmin, ymin, xmax,

250
00:19:53,510 --> 00:19:53,780
ymax.

251
00:19:53,780 --> 00:19:55,760
We pick these first four values.

252
00:19:55,760 --> 00:19:59,120
Then we also make sure we pass in the scores.

253
00:19:59,120 --> 00:20:05,000
Remember, in the non-max suppression algorithm, we need these scores to be able to discard certain boxes

254
00:20:05,000 --> 00:20:12,440
which do not have the max scores and which are predicting

255
00:20:13,280 --> 00:20:17,410
an object which has already been predicted by another box of higher score.

256
00:20:17,420 --> 00:20:19,610
So that's why we need to pass in the score here.

257
00:20:19,610 --> 00:20:23,840
So essentially we pass in the boxes, pass in the scores.

258
00:20:23,840 --> 00:20:29,330
We want to specify the total, the maximum output size.

259
00:20:29,330 --> 00:20:30,800
Here we just pick 100.

260
00:20:30,800 --> 00:20:36,770
We do not expect to have more than 100, but depending on your task, you could have a task

261
00:20:36,770 --> 00:20:42,530
where you generally have maybe say, 150 objects to be detected at once.

262
00:20:42,530 --> 00:20:46,550
In that case, you need to increase this max output size to maybe say 1000.

263
00:20:46,580 --> 00:20:52,190
Now we have this IOU threshold right here. To understand this concept of the IOU threshold,

264
00:20:52,190 --> 00:20:54,620
let's go back to the example we had here.

265
00:20:54,830 --> 00:21:03,560
If we have this here, if you have this example, in order for this algorithm to know that these two

266
00:21:03,560 --> 00:21:10,730
boxes are trying to predict the same object and we want to actually discard this one, what we'll make

267
00:21:10,730 --> 00:21:14,390
use of is this IOU threshold.

268
00:21:14,390 --> 00:21:17,270
So remember, we have seen the IOU already.

269
00:21:17,270 --> 00:21:22,430
So if you have two boxes like this, for these two boxes we'll compute the IOU score.

270
00:21:22,460 --> 00:21:27,890
That is, essentially we'll look for the intersection between these two boxes, which is this area, and

271
00:21:27,890 --> 00:21:32,840
then divide it by the total area occupied by these two boxes.

272
00:21:32,840 --> 00:21:35,960
So in this case, it's all this area right here.

273
00:21:35,960 --> 00:21:38,330
So let's have it back.

274
00:21:38,330 --> 00:21:40,970
We have this here is the intersection.

275
00:21:40,970 --> 00:21:46,190
And then all of this here is the union.

276
00:21:46,490 --> 00:21:52,280
So we take that intersection divided by the union to obtain the IOU score.
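The IOU computation can be written out in a few lines of plain Python. This is a minimal sketch assuming boxes in (xmin, ymin, xmax, ymax) form; the function name `iou` and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    # Union is the sum of both areas minus the part counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Two 10 x 10 boxes overlapping on a 5 x 10 strip: 50 / 150, one third.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(round(overlap, 3))  # 0.333
```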

277
00:21:52,310 --> 00:22:01,100
Now, if that IOU score is greater than the IOU threshold, like in this case, let's specify

278
00:22:01,100 --> 00:22:01,510
an IOU

279
00:22:01,520 --> 00:22:02,750
threshold of 0.5,

280
00:22:03,030 --> 00:22:07,560
if it is greater than 0.5, then we are going to discard this box.

281
00:22:07,560 --> 00:22:09,080
So we're going to discard this.

282
00:22:09,090 --> 00:22:13,330
if that IOU score happens to be greater than 0.5.

283
00:22:13,350 --> 00:22:19,140
Now, if it is less than 0.5, meaning, if we have a box like this, let's say we have a box like this

284
00:22:19,140 --> 00:22:22,020
where this area, this area here.

285
00:22:24,090 --> 00:22:26,400
divided by all

286
00:22:26,400 --> 00:22:32,880
this area is less than 0.5, then we are not going to discard this box.

287
00:22:32,880 --> 00:22:39,500
So we consider that this box is trying to predict a different object and not this other object.

288
00:22:39,510 --> 00:22:45,810
So this IOU threshold here permits us to determine whether two boxes are trying to predict

289
00:22:46,320 --> 00:22:48,350
the same object or not, essentially.

290
00:22:48,360 --> 00:22:49,320
So that's it.

291
00:22:50,190 --> 00:22:54,030
And then here we have this score threshold which is set to negative infinity.

292
00:22:54,570 --> 00:23:00,900
Now, the documentation says that the score threshold is a 0-D float tensor representing

293
00:23:00,900 --> 00:23:07,190
the threshold for deciding when to remove boxes based on score.

294
00:23:07,200 --> 00:23:14,580
So if you have a score threshold of, for example, 0.1, then what you are saying is all

295
00:23:14,580 --> 00:23:21,270
boxes which have a score of less than 0.1 are going to be discarded straight away,

296
00:23:21,270 --> 00:23:29,100
regardless of whether they overlap with a box of higher score or not.
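To make the roles of the parameters concrete, here is a greedy non-max suppression sketch in plain Python. It is an illustration under stated assumptions and not TensorFlow's implementation: the parameter names mirror the ones discussed in this session, boxes are taken as (xmin, ymin, xmax, ymax), and the four example boxes and scores are made up to resemble the duplicated detections seen earlier.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, max_output_size=100,
                        iou_threshold=0.5, score_threshold=float("-inf")):
    """Return the indices of the boxes kept, highest score first."""
    # Drop low-scoring boxes outright, whether or not they overlap anything.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if len(kept) >= max_output_size:
            break
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(50, 40, 150, 200), (52, 42, 148, 198),
         (50, 40, 150, 200), (52, 42, 148, 198)]
scores = [0.965, 0.985, 0.965, 0.985]

print(non_max_suppression(boxes, scores))  # [1]
```

All four boxes cover the same object, so only the highest-scoring one (index 1) survives; its exact duplicate at index 3 is suppressed because the sort keeps tied scores in their original order.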

297
00:23:29,250 --> 00:23:30,240
So that's it.

298
00:23:30,240 --> 00:23:41,670
We get back here, and now let's print out our non-max suppression output.

299
00:23:41,670 --> 00:23:42,630
Let's run that.

300
00:23:43,440 --> 00:23:48,100
Now, one thing you notice here in this output is the fact that we have a single element.

301
00:23:48,120 --> 00:23:54,540
Now, what the single element actually means is between all these four options we had, that is here

302
00:23:54,540 --> 00:23:59,520
we have this person, we have this person, we have this person and this person.

303
00:23:59,520 --> 00:24:03,870
Only this one here, at position one, is going to be left.

304
00:24:03,870 --> 00:24:05,820
All the rest will be discarded.

305
00:24:05,820 --> 00:24:09,060
And to understand why they're discarded, you could look at the scores.

306
00:24:09,060 --> 00:24:10,610
This is 0.96.

307
00:24:10,620 --> 00:24:12,420
This is 0.85.

308
00:24:12,450 --> 00:24:14,670
This is 0.96.

309
00:24:14,670 --> 00:24:18,050
This is 0.980.

310
00:24:18,090 --> 00:24:20,550
This is 0.985, not 0.85.

311
00:24:20,550 --> 00:24:27,630
So what we're saying here is because this one has the highest probability and because it overlaps with

312
00:24:27,630 --> 00:24:33,180
the others, like you see here, this one and this one will overlap because they actually represent the same

313
00:24:33,180 --> 00:24:33,900
person.

314
00:24:33,900 --> 00:24:38,130
then these others will be discarded.

315
00:24:38,550 --> 00:24:43,470
Now, in a case like this, where we have this exact same box with the exact same probability, one is

316
00:24:43,470 --> 00:24:45,570
going to be discarded and the other one kept.

317
00:24:45,570 --> 00:24:47,970
So that's it.

318
00:24:47,970 --> 00:24:48,990
We have our output.

319
00:24:48,990 --> 00:24:53,100
Now we know that we only have a single box instead of all these four boxes.

320
00:24:53,100 --> 00:24:56,040
So we have our non-max suppression output.

321
00:24:56,040 --> 00:25:03,330
The next step will be to show visually what our predictions look like.

322
00:25:04,590 --> 00:25:10,320
Now here you'll note the fact that we are going to write this onto this image.

323
00:25:10,320 --> 00:25:20,700
So we will draw the bounding boxes and put the text on our image only for each i in our non-max suppression

324
00:25:20,700 --> 00:25:21,150
output.

325
00:25:21,150 --> 00:25:24,150
So in this example, we're going to do that only once.

326
00:25:24,180 --> 00:25:29,430
Unlike the case without non-max suppression, where we would have had to do that four times.

327
00:25:29,430 --> 00:25:34,450
So now we're doing this only once because after non-max suppression, we're left with only a single box.

328
00:25:34,470 --> 00:25:37,020
Now take a look at what we have here.

329
00:25:37,020 --> 00:25:39,900
We have the x min, we have the x max.

330
00:25:39,900 --> 00:25:49,710
Remember from the final boxes here, we had x min, y min; like you see, we have the x min, we have

331
00:25:50,250 --> 00:25:56,730
the y min (sorry, not the x max), we have the x max, we have the y max, and then we have the color

332
00:25:56,730 --> 00:25:59,070
for the box and that's it.

333
00:25:59,070 --> 00:26:00,270
So that's it.

334
00:26:00,270 --> 00:26:05,490
We now put the text, and we place this text at a certain position.

335
00:26:05,490 --> 00:26:12,110
So the text itself is going to contain the class from the final box.

336
00:26:12,120 --> 00:26:13,950
You see, we take the last element.

337
00:26:14,160 --> 00:26:16,170
Remember, from here we had the class.

338
00:26:17,150 --> 00:26:23,600
And then once we obtain that class, we write it; that's the text we're going to put, and

339
00:26:23,600 --> 00:26:28,310
then the position of that text will be based on the x min, y min values.

340
00:26:28,310 --> 00:26:35,510
So you see, we go to x min, but for y min we step 15 pixels downwards.

341
00:26:35,510 --> 00:26:36,770
So that's it.

342
00:26:36,770 --> 00:26:40,580
We define the font and the color and that's it.
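
The drawing step just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the session's code: it assumes each final box is laid out as `[xmin, ymin, xmax, ymax, class_name]`, and the names `annotation_for`, `draw_predictions`, `nms_indices`, and the default color, font scale, and thickness are hypothetical. Only `cv2.rectangle`, `cv2.putText`, and `cv2.FONT_HERSHEY_SIMPLEX` are real OpenCV names.

```python
def annotation_for(box, text_offset=15):
    """Compute rectangle corners and label position for one final box.

    Assumes box = [xmin, ymin, xmax, ymax, class_name] (hypothetical layout).
    The label is placed at x min, stepped text_offset pixels down from y min
    so that it stays visible inside the box.
    """
    xmin, ymin, xmax, ymax, label = box
    top_left = (int(xmin), int(ymin))
    bottom_right = (int(xmax), int(ymax))
    text_origin = (int(xmin), int(ymin) + text_offset)
    return top_left, bottom_right, str(label), text_origin


def draw_predictions(image, final_boxes, nms_indices, color=(0, 255, 0)):
    """Draw only the boxes whose indices survived non-max suppression."""
    import cv2  # imported here so the helper above works without OpenCV

    for i in nms_indices:
        top_left, bottom_right, label, text_origin = annotation_for(final_boxes[i])
        cv2.rectangle(image, top_left, bottom_right, color, 2)
        cv2.putText(image, label, text_origin,
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return image


# Position check for one hypothetical box labeled "person".
example = annotation_for([102, 52, 202, 302, "person"])
print(example)
```

Looping over the suppression indices rather than over every final box is what keeps the image down to one rectangle and one label per object.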

343
00:26:40,580 --> 00:26:48,920
So we've put out this text and then now we're ready to write this out in our new image.

344
00:26:48,920 --> 00:26:53,000
So we create this new image and then we resize it.

345
00:26:53,630 --> 00:26:59,010
Obviously the content of this image is this one here, this image on which we have

346
00:26:59,100 --> 00:27:04,400
drawn the rectangles, that is, the bounding boxes, and the text.

347
00:27:04,400 --> 00:27:06,950
So let's run this now completely.

348
00:27:07,070 --> 00:27:10,760
And then you see again we have one and we open this up.

349
00:27:11,550 --> 00:27:15,330
Okay, So we should be able to have our output.

350
00:27:15,870 --> 00:27:17,110
And there we go.

351
00:27:17,130 --> 00:27:20,370
You see, we have a person. Notice, from here.

352
00:27:22,050 --> 00:27:26,680
From here we had decided to go 15 steps.

353
00:27:26,700 --> 00:27:29,460
Oops, Let's take this off.

354
00:27:29,760 --> 00:27:32,160
We had decided to go 15 steps.

355
00:27:32,400 --> 00:27:33,390
Let's go downward.

356
00:27:34,170 --> 00:27:36,000
Here we just had to go 15 steps.

357
00:27:36,170 --> 00:27:36,330
I see.

358
00:27:36,330 --> 00:27:38,220
The text comes slightly down.

359
00:27:38,220 --> 00:27:47,490
So if you don't have this and then you do this, see, it goes up and it's not very visible.

360
00:27:47,490 --> 00:27:49,320
So let's have that back.

361
00:27:49,620 --> 00:27:51,230
You could obviously change the color.

362
00:27:51,240 --> 00:27:58,440
So let's say 225, and you could play around with all these different parameters.

363
00:27:58,440 --> 00:27:59,260
So that's it.

364
00:27:59,280 --> 00:28:02,710
Well, let's get back to the other color, because it's actually better.

365
00:28:02,730 --> 00:28:04,440
Let's say we want to have one.

366
00:28:05,340 --> 00:28:10,440
So now we could run this for all the different files and see what we get.

367
00:28:10,980 --> 00:28:15,900
Well, before even checking on that, let's suppose that we do not have this non-max suppression output.

368
00:28:15,900 --> 00:28:21,040
So let's leave out the non-max suppression algorithm and see what our outputs will look like.

369
00:28:21,060 --> 00:28:24,930
Let's have here: for i in range

370
00:28:24,930 --> 00:28:27,240
the length of the final boxes.

371
00:28:27,240 --> 00:28:29,370
The final boxes here had a length of four.

372
00:28:29,370 --> 00:28:31,020
We had four outputs.

373
00:28:32,430 --> 00:28:34,180
Let's take this off.

374
00:28:34,200 --> 00:28:35,370
There we go.

375
00:28:35,820 --> 00:28:38,250
And then run this again and see what we have.

376
00:28:38,610 --> 00:28:40,490
So we have outputs.

377
00:28:40,500 --> 00:28:41,120
Oops.

378
00:28:41,130 --> 00:28:44,490
Well, we already had several different predictions.

379
00:28:45,300 --> 00:28:47,400
Like, okay, let's take this one for example.

380
00:28:47,400 --> 00:28:49,560
You see here we have this one.

381
00:28:50,820 --> 00:28:52,760
No, let's not have that.

382
00:28:52,770 --> 00:28:54,670
Let's say we want to have 40 here.

383
00:28:54,690 --> 00:28:58,410
Let's get back to 40 and 40.

384
00:28:59,130 --> 00:28:59,670
Okay.

385
00:28:59,670 --> 00:29:04,050
So one thing you can notice here is the fact that, you see, we have these two predictions for this

386
00:29:04,050 --> 00:29:07,560
person and that's not what we want.

387
00:29:07,560 --> 00:29:14,940
So you see that the fact that we add this non-max suppression here permits us to remove some of those

388
00:29:14,940 --> 00:29:15,690
boxes.

389
00:29:15,690 --> 00:29:24,060
So this threshold is what you play around with to ensure that you have a single box for a single object.

390
00:29:24,900 --> 00:29:30,660
So here you can see that this model does well in predicting the location of this train and knowing that

391
00:29:30,660 --> 00:29:32,880
it is actually a train.

392
00:29:32,880 --> 00:29:39,420
Here it predicts that this is a person; here it predicts this person, but unfortunately doesn't

393
00:29:39,420 --> 00:29:39,840
get these

394
00:29:39,840 --> 00:29:43,530
other people. Here it gets the airplane.

395
00:29:43,920 --> 00:29:45,090
This person.

396
00:29:45,090 --> 00:29:46,350
This person.

397
00:29:46,350 --> 00:29:50,400
Well, it doesn't get this person. It does quite well here.

398
00:29:50,430 --> 00:29:52,770
See, this is the dining table.

399
00:29:53,700 --> 00:29:56,400
This person, this person and this person.

400
00:29:56,400 --> 00:29:57,270
So that's great.

401
00:29:58,050 --> 00:29:58,290
Yeah.

402
00:29:58,290 --> 00:30:01,120
We have the TV monitor, though

403
00:30:01,120 --> 00:30:03,330
it doesn't get the other monitors.

404
00:30:03,540 --> 00:30:04,020
Okay.

405
00:30:04,020 --> 00:30:06,150
So from here, we also have this bus.

406
00:30:06,150 --> 00:30:15,510
Unfortunately, it predicts two buses; it also predicts two cars, maybe due to this other car

407
00:30:15,540 --> 00:30:16,350
being here.

408
00:30:16,650 --> 00:30:22,560
Then we have this dining table and this person, we have this person and the dog.

409
00:30:22,560 --> 00:30:24,240
We have this person.

410
00:30:24,390 --> 00:30:25,320
This person.

411
00:30:25,320 --> 00:30:28,170
Well, it sees a dog here, but that's not right.

412
00:30:29,490 --> 00:30:30,180
Person.

413
00:30:30,180 --> 00:30:35,190
Person. It doesn't see this other person. It sees these two people here.

414
00:30:36,840 --> 00:30:44,190
Here it sees these persons, though the bounding box isn't quite well placed.

415
00:30:44,400 --> 00:30:55,080
Then we have the cat, we have this person, we have this car, and then we have this person; it doesn't see

416
00:30:55,080 --> 00:30:57,420
the dog we have here.

417
00:30:57,420 --> 00:30:58,440
It sees these cows.

418
00:30:58,440 --> 00:31:01,200
Well, this particular image was taken from the paper.

419
00:31:01,200 --> 00:31:04,860
So I basically just cropped this from the paper to test it out.

420
00:31:04,860 --> 00:31:10,740
So it sees a person but doesn't see that this is a dog, though it actually locates them quite well.

421
00:31:10,740 --> 00:31:12,360
So that's it.

422
00:31:12,360 --> 00:31:12,550
Yeah.

423
00:31:12,570 --> 00:31:21,300
It sees a cat but doesn't see this dog, sees a person, sees a cow, sees the person, person, sees two

424
00:31:21,300 --> 00:31:26,910
people, sees this person and this person, though the bounding boxes aren't quite well placed.

425
00:31:26,910 --> 00:31:27,570
Unfortunately.

426
00:31:27,570 --> 00:31:27,880
Yes.

427
00:31:27,880 --> 00:31:32,940
This car's not quite correct then.

428
00:31:32,940 --> 00:31:37,350
Yeah, we have a motorbike and a person, but this should be two motorbikes actually.

429
00:31:37,500 --> 00:31:38,760
Okay, so that's it.

430
00:31:38,760 --> 00:31:41,040
We've just tested out our model.

431
00:31:41,040 --> 00:31:49,350
We've seen how it does, or how it works, with images which it has never actually seen.
