1
00:00:03,010 --> 00:00:08,170
In this video tutorial we will look at Non-maximum suppression.

2
00:00:08,840 --> 00:00:16,850
Non-maximum suppression is a technique used to remove redundant or overlapping bounding boxes that may

3
00:00:16,850 --> 00:00:19,730
arise from an object detection algorithm.

4
00:00:20,390 --> 00:00:23,960
For example, let's take this input image of a car.

5
00:00:24,530 --> 00:00:31,670
So after doing object detection using the YOLOv9 algorithm, we get this output image.

6
00:00:31,670 --> 00:00:31,970
Like.

7
00:00:31,970 --> 00:00:34,580
You can see that we have a car in this image.

8
00:00:34,580 --> 00:00:42,170
And after doing object detection using the YOLOv9 algorithm, we see multiple bounding boxes are being

9
00:00:42,170 --> 00:00:44,660
drawn around this car.

10
00:00:44,720 --> 00:00:50,600
So you can see that we have four different overlapping bounding boxes around the car.

11
00:00:50,630 --> 00:00:54,140
But there should be only one bounding box around the car.

12
00:00:54,140 --> 00:00:58,610
So there should be a bounding box around the car which has the highest confidence score.

13
00:00:58,610 --> 00:01:02,660
There should not be overlapping bounding boxes around the car.

14
00:01:02,990 --> 00:01:09,740
So to remove these redundant or overlapping bounding boxes, we use non-maximum suppression.

15
00:01:09,740 --> 00:01:16,700
So after applying non-maximum suppression, you can see that we have only one

16
00:01:16,700 --> 00:01:23,690
bounding box around the car, and all the overlapping or redundant bounding boxes have been

17
00:01:23,690 --> 00:01:24,110
removed.

18
00:01:24,110 --> 00:01:29,180
And we have the bounding box around the car which has the highest confidence score.

19
00:01:29,450 --> 00:01:37,730
So after applying NMS, the redundant or overlapping bounding

20
00:01:37,730 --> 00:01:43,460
boxes that we may get from the object detection algorithm are removed.

21
00:01:45,240 --> 00:01:50,940
So I have divided this complete tutorial into five different steps.

22
00:01:50,940 --> 00:01:52,170
I will also be showing you

23
00:01:52,170 --> 00:01:58,110
the Google Colab notebook, where I will show you how you can implement

24
00:01:58,110 --> 00:01:59,400
non-maximum suppression.

25
00:01:59,400 --> 00:02:05,580
And before we go to the Google Colab notebook part, let's discuss the steps which I will be

26
00:02:05,580 --> 00:02:10,410
following in this tutorial to implement non-maximum suppression.

27
00:02:12,000 --> 00:02:15,060
In step number one, we will do object detection on an image.

28
00:02:15,090 --> 00:02:19,410
So I have an input image of a person and a dog.

29
00:02:19,410 --> 00:02:22,800
So we will first do object detection on that image.

30
00:02:23,190 --> 00:02:29,550
So after doing object detection on the image we will get the bounding box coordinates for each object

31
00:02:29,550 --> 00:02:30,750
in the image.

32
00:02:31,910 --> 00:02:37,910
So we will have multiple bounding box coordinates for each object in the image.

33
00:02:37,910 --> 00:02:40,910
For example, if we have a person and a dog in the image,

34
00:02:40,910 --> 00:02:44,840
I will have multiple bounding box coordinates for the person,

35
00:02:44,840 --> 00:02:50,120
and I will also have multiple bounding box coordinates for the dog as well.

36
00:02:50,630 --> 00:02:57,110
So after we get the bounding box coordinates in step number one,

37
00:02:57,790 --> 00:03:03,280
in step number two, I will sort the bounding box coordinates based on the confidence score.

38
00:03:03,640 --> 00:03:04,330
So.

39
00:03:05,370 --> 00:03:10,500
After getting the bounding box coordinates, I will sort the bounding box coordinates based on the confidence

40
00:03:10,500 --> 00:03:10,860
score.

41
00:03:10,860 --> 00:03:17,610
So I will just create a list, and the bounding box coordinates which have the highest confidence score

42
00:03:17,610 --> 00:03:23,610
will be first in the list, and the bounding box coordinates which have the lowest confidence score

43
00:03:23,610 --> 00:03:25,620
will be last in the list.

44
00:03:25,620 --> 00:03:26,400
So.

45
00:03:26,910 --> 00:03:31,320
So the bounding box coordinates which have the highest confidence score will be at the top of the

46
00:03:31,320 --> 00:03:35,580
list, and the bounding box coordinates which have the lowest confidence score

47
00:03:35,790 --> 00:03:39,780
will be at the end of the list. So I will just create a list.

48
00:03:39,900 --> 00:03:45,450
I will create a list of the bounding box coordinates ordered by the

49
00:03:45,450 --> 00:03:46,500
confidence score,

50
00:03:46,500 --> 00:03:51,300
and the confidence scores will be in descending order.

51
00:03:51,300 --> 00:03:57,150
So the bounding box coordinates with the higher confidence score will

52
00:03:57,150 --> 00:03:58,380
be at the start of the list,

53
00:03:58,380 --> 00:04:04,260
and the bounding box coordinates with the lowest confidence score will be at the end of the list.
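
A minimal sketch of this sorting step, assuming each prediction is a list of the form [x1, y1, x2, y2, class_name, confidence]. The sample values and the variable names are made up for illustration and are not taken from the notebook.

```python
# Hypothetical predictions: [x1, y1, x2, y2, class_name, confidence]
predictions = [
    [50, 60, 200, 300, "person", 0.62],
    [48, 58, 205, 310, "person", 0.91],
    [220, 180, 330, 300, "dog", 0.35],
    [215, 175, 335, 305, "dog", 0.78],
]

# Sort in descending order of confidence (index 5),
# so the most confident box comes first in the list.
sorted_predictions = sorted(predictions, key=lambda p: p[5], reverse=True)

for p in sorted_predictions:
    print(p)
```

The sort key is the sixth item of each sub-list, which matches the layout described later in the tutorial (four coordinates, class name, confidence score).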

54
00:04:05,990 --> 00:04:10,880
So after we have sorted the bounding box coordinates based on the confidence

55
00:04:10,880 --> 00:04:16,310
score, in step number three I will start from the bounding box coordinates with the highest

56
00:04:16,310 --> 00:04:17,180
confidence score.

57
00:04:17,180 --> 00:04:22,070
So I will just pick the bounding box coordinates which have the highest confidence score.

58
00:04:22,370 --> 00:04:22,910
Okay.

59
00:04:22,910 --> 00:04:29,780
And I will compare those bounding box coordinates with all the other bounding boxes of the same class.

60
00:04:29,780 --> 00:04:36,800
So for example, if I have a person in the image and I get multiple bounding box coordinates or

61
00:04:36,800 --> 00:04:38,600
overlapping bounding box coordinates,

62
00:04:38,990 --> 00:04:42,950
I will just pick the bounding box coordinates which have the highest confidence score.

63
00:04:42,950 --> 00:04:48,890
And I will compare those bounding box coordinates with the overlapping bounding box coordinates for

64
00:04:48,890 --> 00:04:50,060
the person class.

65
00:04:50,480 --> 00:04:56,660
Okay, so if we compare the bounding box coordinates for the person class which have the highest confidence

66
00:04:56,660 --> 00:05:00,860
score with the overlapping bounding box coordinates of the same person class,

67
00:05:02,040 --> 00:05:07,020
we will do the comparison based on the intersection over union.

68
00:05:07,320 --> 00:05:15,330
So to calculate a metric for the comparison, we will be using the intersection over union approach.

69
00:05:15,330 --> 00:05:21,360
So consider the intersection over union of the two bounding boxes which we are comparing over here.

70
00:05:21,360 --> 00:05:27,480
So if the intersection over union of the two bounding boxes, or the overlap ratio, is above a certain

71
00:05:27,480 --> 00:05:28,020
threshold.

72
00:05:28,080 --> 00:05:33,540
So for example, say I have defined an IOU threshold of 0.5, okay.

73
00:05:33,540 --> 00:05:36,810
So I have defined the IOU threshold as 0.5.

74
00:05:37,200 --> 00:05:42,840
And if I am doing the comparison of the bounding box coordinates with the highest confidence score,

75
00:05:43,350 --> 00:05:48,150
with the other bounding box coordinates which have a lower confidence score.

76
00:05:48,240 --> 00:05:56,670
So while doing the comparison, if I get an IOU value of 0.6, okay, so for example, while doing the comparison,

77
00:05:56,670 --> 00:06:03,720
I get an intersection over union value of 0.6 and I have defined the IOU threshold as 0.5.

78
00:06:03,900 --> 00:06:11,220
So I am getting an IOU score of 0.6 and I have defined the IOU threshold as 0.5.

79
00:06:11,220 --> 00:06:15,450
So I am getting an IOU score above the IOU threshold.

80
00:06:15,450 --> 00:06:19,680
So the bounding box with the lower confidence score will be removed.

81
00:06:19,680 --> 00:06:26,550
So if the intersection over union value of the two bounding boxes is above the threshold,

82
00:06:26,550 --> 00:06:30,390
then the bounding box with the lower confidence score will be removed.

83
00:06:30,930 --> 00:06:36,300
And we will repeat this process until all the bounding boxes have been examined.
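
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the notebook's code: each box is taken to be [x1, y1, x2, y2, class_name, confidence], and the function names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2, ...]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.5):
    """Greedy NMS sketch: keep the most confident box, drop same-class overlaps."""
    # Step 2: sort by confidence (index 5), highest first.
    remaining = sorted(boxes, key=lambda b: b[5], reverse=True)
    kept = []
    while remaining:
        # Step 3: take the most confident box that is still left.
        best = remaining.pop(0)
        kept.append(best)
        # Remove same-class boxes whose IOU with it is above the threshold.
        remaining = [
            b for b in remaining
            if b[4] != best[4] or iou(best, b) <= iou_threshold
        ]
    return kept


boxes = [
    [10, 10, 110, 110, "person", 0.8],
    [12, 12, 112, 112, "person", 0.6],   # heavy overlap with the box above
    [200, 200, 300, 300, "dog", 0.7],
]
print(nms(boxes))
```

Boxes of a different class are never compared against each other here, which follows the "same class" rule described in step three.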

84
00:06:37,230 --> 00:06:40,680
So let me show you how intersection over union works.

85
00:06:41,070 --> 00:06:46,860
So intersection over union tells us the percentage of overlap between the ground truth bounding box

86
00:06:46,860 --> 00:06:49,440
and the predicted bounding box.

87
00:06:49,440 --> 00:06:55,500
But in the case of non-max suppression, the definition of intersection over union changes a bit.

88
00:06:55,500 --> 00:07:02,040
So in the case of non-maximum suppression, we find intersection over union between two predicted bounding

89
00:07:02,040 --> 00:07:02,820
boxes.

90
00:07:02,820 --> 00:07:05,550
For example, let's take this image of a bird.

91
00:07:05,550 --> 00:07:07,350
You can see the bird over here.

92
00:07:07,350 --> 00:07:13,890
So you can see we have two bounding boxes: this bounding box in the red color has a

93
00:07:13,890 --> 00:07:21,960
confidence score of 0.8, and the bounding box in the blue color has a confidence score of 0.6.

94
00:07:21,990 --> 00:07:22,410
Okay.

95
00:07:22,410 --> 00:07:23,430
So the

96
00:07:25,290 --> 00:07:30,780
bounding box in the red color has a confidence score of 0.8, and the bounding box in the blue

97
00:07:30,780 --> 00:07:33,480
color has a confidence score of 0.6.

98
00:07:33,480 --> 00:07:38,880
So we calculate the IOU, the intersection over union, for these

99
00:07:40,060 --> 00:07:41,110
two bounding boxes.

100
00:07:41,110 --> 00:07:45,820
Okay, so I'm just calculating the intersection over union for these two bounding boxes.

101
00:07:45,820 --> 00:07:50,260
And I get the intersection over union value as 0.96.

102
00:07:50,260 --> 00:07:54,190
And here I have defined the IOU threshold as 0.5.

103
00:07:54,190 --> 00:07:59,680
So this IOU value, the intersection over union value which I am getting, is 0.96.

104
00:07:59,680 --> 00:08:04,780
So this value is greater than the IOU threshold which I have defined over here.

105
00:08:04,780 --> 00:08:05,950
So what will happen?

106
00:08:05,950 --> 00:08:08,950
Take the bounding box with the lower confidence score.

107
00:08:08,950 --> 00:08:13,720
You can see that the blue bounding box has a confidence score of 0.6.

108
00:08:13,720 --> 00:08:17,950
So the blue bounding box, which has the lower confidence score, will be removed.

109
00:08:17,950 --> 00:08:21,730
And this red bounding box will be retained.

110
00:08:21,730 --> 00:08:22,330
Okay.

111
00:08:22,570 --> 00:08:24,190
So let's take another example.

112
00:08:24,190 --> 00:08:27,250
So you can see we have the bird over here as well.

113
00:08:27,250 --> 00:08:32,800
And we have the bounding box in red color, which has a confidence score of 0.8.

114
00:08:32,800 --> 00:08:38,830
And over here we have the bounding box in blue color, which has a confidence score of 0.2.

115
00:08:39,220 --> 00:08:39,520
Okay.

116
00:08:39,520 --> 00:08:46,480
And I am getting an IOU value of 0.22, or I can say I'm getting an intersection over union

117
00:08:46,480 --> 00:08:48,580
value of 0.22.

118
00:08:48,760 --> 00:08:49,480
So.

119
00:08:50,800 --> 00:08:54,700
Like you can see over here, we are getting an intersection over union value of 0.22.

120
00:08:54,730 --> 00:09:00,280
So we do a comparison of the red bounding box with the blue bounding box using intersection

121
00:09:00,280 --> 00:09:01,090
over union.

122
00:09:01,090 --> 00:09:05,500
And I get the intersection over union value as 0.22.

123
00:09:05,500 --> 00:09:09,760
And my IOU threshold is defined as 0.5.

124
00:09:10,060 --> 00:09:18,430
So in this case, you can see that my intersection over union value of 0.22 is less than the threshold.

125
00:09:18,430 --> 00:09:21,790
So this bounding box will not be removed.

126
00:09:21,820 --> 00:09:25,810
So over here, in the first example, you can see that we have an IOU threshold of 0.5.

127
00:09:25,810 --> 00:09:29,740
And my intersection over union value is higher than the threshold.

128
00:09:29,740 --> 00:09:33,310
So the bounding box with the lower confidence score will be removed.

129
00:09:33,310 --> 00:09:37,780
So the blue bounding box has a lower confidence score than the red bounding box.

130
00:09:37,780 --> 00:09:40,150
So the blue color bounding box will be removed.

131
00:09:40,510 --> 00:09:47,500
In this case, while doing the comparison of these two bounding boxes, I am getting an IOU of 0.22 while

132
00:09:47,500 --> 00:09:50,770
my threshold is defined as 0.5.

133
00:09:50,770 --> 00:09:58,750
So I am getting a lower IOU than the IOU threshold, so both bounding boxes will be retained and no bounding

134
00:09:58,750 --> 00:10:00,790
box will be removed over here.

135
00:10:02,410 --> 00:10:05,680
So this is how we can calculate the intersection over union.

136
00:10:05,680 --> 00:10:07,870
Or you can say that intersection over union

137
00:10:07,870 --> 00:10:12,820
can be represented in mathematical form as the intersection size over the union size.

138
00:10:12,820 --> 00:10:14,590
Like you can see we have two bounding boxes.

139
00:10:14,590 --> 00:10:15,550
This is the box one.

140
00:10:15,550 --> 00:10:16,720
This is the box two.

141
00:10:16,720 --> 00:10:26,290
So it is the intersection size of box one and box two divided by the union size of box one and box two.

142
00:10:26,560 --> 00:10:33,010
So this is how we get intersection over union in mathematical form, or intersection over union in

143
00:10:33,010 --> 00:10:40,900
mathematical form can be represented as the intersection size of bounding

144
00:10:40,900 --> 00:10:48,160
box one and bounding box two, divided by the union size of bounding box one and bounding box two.

145
00:10:48,580 --> 00:10:49,180
Okay.

146
00:10:49,180 --> 00:10:55,120
Or you can say that we can calculate intersection over union by dividing the intersection area

147
00:10:55,880 --> 00:10:57,170
by the union area.

148
00:10:57,170 --> 00:11:00,890
So we divide the intersection area by the union area.

149
00:11:00,890 --> 00:11:09,200
So to calculate IOU, we divide the intersection area by the union area, okay.

150
00:11:09,200 --> 00:11:13,610
So here we have the intersection size of bounding box one and the bounding box two.

151
00:11:13,610 --> 00:11:18,740
And here we have the union size of the bounding box one and the bounding box two.

152
00:11:18,740 --> 00:11:25,850
So to calculate intersection over union, you divide the intersection area by the union area.
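
Written out as a formula, with B_1 and B_2 as assumed symbols for bounding box one and bounding box two, and |.| denoting area, the definition above reads as follows. The second form uses the fact that the union area is the sum of the two box areas minus the intersection area.

```latex
\mathrm{IoU}(B_1, B_2)
  = \frac{|B_1 \cap B_2|}{|B_1 \cup B_2|}
  = \frac{|B_1 \cap B_2|}{|B_1| + |B_2| - |B_1 \cap B_2|}
```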

153
00:11:28,040 --> 00:11:30,830
So let's see; here we have two bounding boxes.

154
00:11:30,950 --> 00:11:32,840
You can see this is the bounding box one.

155
00:11:32,840 --> 00:11:34,400
And this is the bounding box two.

156
00:11:34,430 --> 00:11:36,110
Like you can see over here.

157
00:11:36,470 --> 00:11:41,060
So you can see the IOU is equal to 0.0, okay.

158
00:11:41,060 --> 00:11:47,600
So you can see that if the two bounding boxes are very far away from each other,

159
00:11:47,600 --> 00:11:54,740
the IOU is 0.0, but you can see over here, if the two bounding boxes intersect with

160
00:11:54,740 --> 00:11:58,160
each other, we have an IOU of 0.08.

161
00:11:58,160 --> 00:12:04,310
And if the two bounding boxes intersect a bit more with each other, then the IOU value increases.

162
00:12:04,310 --> 00:12:10,790
If the two bounding boxes intersect more closely with each other, then the IOU value

163
00:12:10,820 --> 00:12:12,350
rises to 0.43.

164
00:12:12,350 --> 00:12:19,820
And if the two bounding boxes completely overlap with each other, then we have a perfect score of 1.0.

165
00:12:21,570 --> 00:12:27,330
So these are some conclusions which we get from what we have understood till now.

166
00:12:27,360 --> 00:12:33,690
So suppose we select a higher IOU threshold value. I have set the IOU threshold

167
00:12:33,690 --> 00:12:35,010
value to 0.5.

168
00:12:35,010 --> 00:12:40,560
And if you set the threshold value as 0.6 or 0.7, so what will happen?

169
00:12:41,280 --> 00:12:47,760
This will result in fewer bounding boxes being suppressed or removed, which may lead to more overlapping

170
00:12:47,760 --> 00:12:49,500
bounding boxes being retained.

171
00:12:49,650 --> 00:12:56,820
So now if you just see over here, I have set the threshold of 0.5 and I am getting the IOU score of

172
00:12:56,820 --> 00:12:58,200
0.22.

173
00:12:58,230 --> 00:13:02,730
So my IOU value is less than the threshold.

174
00:13:02,730 --> 00:13:03,660
So.

175
00:13:04,840 --> 00:13:07,810
This bounding box with the low confidence score is not removed.

176
00:13:07,810 --> 00:13:11,770
So suppose I have set the threshold equal to 0.1.

177
00:13:11,770 --> 00:13:19,810
So if I have set this value of the IOU threshold as 0.1, my IOU will be greater than this

178
00:13:19,810 --> 00:13:25,540
threshold, and then I will be removing this bounding box with the low confidence score, okay.

179
00:13:26,440 --> 00:13:34,060
So if we set a higher IOU threshold value, this will result in fewer bounding boxes being removed, and

180
00:13:34,060 --> 00:13:38,290
this will result in more overlapping bounding boxes being retained, okay.

181
00:13:38,620 --> 00:13:46,570
Similarly, if I set a lower IOU threshold, this will be more aggressive in removing overlapping

182
00:13:46,570 --> 00:13:47,230
bounding boxes.

183
00:13:47,230 --> 00:13:51,940
So suppose I set a low value for the IOU threshold, like 0.1.

184
00:13:51,940 --> 00:13:57,130
So this will result in the removal of all the overlapping bounding boxes.

185
00:13:57,130 --> 00:14:04,360
Or you can say that our model will be more aggressive in removing the overlapping bounding boxes, which

186
00:14:04,360 --> 00:14:09,370
may lead to fewer bounding boxes but more accurate object detections.
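
To make the effect of the threshold concrete, here is a small sketch using the two IOU values from the bird examples (0.22 and 0.96). The helper name `is_suppressed` is hypothetical, and the rule shown is simply "remove the lower-confidence box when the IOU is above the threshold".

```python
def is_suppressed(iou_value, iou_threshold):
    """True if the lower-confidence box would be removed at this threshold."""
    return iou_value > iou_threshold


# Compare a low and a high IOU value against three thresholds.
for iou_threshold in (0.1, 0.5, 0.7):
    for iou_value in (0.22, 0.96):
        action = "removed" if is_suppressed(iou_value, iou_threshold) else "retained"
        print(f"threshold={iou_threshold}, IOU={iou_value}: lower-confidence box {action}")
```

With a threshold of 0.1 both overlapping boxes are removed, while with 0.5 or 0.7 only the box with an IOU of 0.96 is removed, which is the behaviour described above.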

187
00:14:09,370 --> 00:14:13,930
So this is the conclusion which we draw from what we have understood till now.

188
00:14:13,930 --> 00:14:20,020
So let's move towards the Colab notebook where I will be showing how we can implement non-maximum suppression.

189
00:14:23,450 --> 00:14:26,570
So here is the Google Colab notebook that I have prepared.

190
00:14:26,570 --> 00:14:32,750
So you can see that here we have an image, and in the image we have a person and a dog.

191
00:14:32,750 --> 00:14:39,350
So after doing object detection, for example using the YOLOv9 algorithm, we get multiple bounding boxes

192
00:14:39,350 --> 00:14:44,270
around the person, and we get multiple bounding boxes around the dog as well.

193
00:14:44,420 --> 00:14:47,300
So then we apply non-maximum suppression.

194
00:14:47,300 --> 00:14:54,170
And after applying non-maximum suppression, you can see over here we get only a single bounding box

195
00:14:54,170 --> 00:14:59,150
around the dog and only one bounding box around the person as well.

196
00:14:59,180 --> 00:15:03,170
Okay, so using non-maximum suppression,

197
00:15:03,170 --> 00:15:09,770
we have removed all the redundant or overlapping bounding boxes from the dog as well as from the

198
00:15:09,770 --> 00:15:10,430
person.

199
00:15:11,000 --> 00:15:11,450
Both.

200
00:15:11,840 --> 00:15:13,790
So now let's get started with it.

201
00:15:13,790 --> 00:15:20,780
So in step number one I will be importing all the required libraries: cv2, as I'm

202
00:15:20,780 --> 00:15:25,070
using the OpenCV Python library so that I can do pre-processing on the image.

203
00:15:25,070 --> 00:15:31,760
Then we have the Matplotlib library so that I can show you the input or output image in the Colab notebook.

204
00:15:31,760 --> 00:15:37,580
And similarly we also require the Image library to show you the input or output image in the Google Colab

205
00:15:38,090 --> 00:15:38,870
notebook.

206
00:15:40,190 --> 00:15:40,760
Okay.

207
00:15:43,450 --> 00:15:47,590
So I am just running this cell so that I can import all these required libraries.

208
00:15:47,590 --> 00:15:53,920
So I have just placed this input image, which you can see, in my Google Drive.

209
00:15:53,920 --> 00:15:59,830
So I'm downloading this input image from my Google Drive directly into this Google Colab notebook.

210
00:16:00,070 --> 00:16:00,910
So.

211
00:16:03,120 --> 00:16:09,810
If I just run this cell, where I pass the Google Drive link of this input image, you can see over

212
00:16:09,810 --> 00:16:11,970
here that now we have the input image.

213
00:16:11,970 --> 00:16:18,120
I have downloaded this from my Google Drive, and using the Image library which we have imported,

214
00:16:18,120 --> 00:16:20,700
I will show you the input image over here.

215
00:16:20,700 --> 00:16:24,750
You just need to copy the path from here and add this path over here.

216
00:16:24,750 --> 00:16:31,110
And if you just run this cell, you will be able to see the input image in the Google Colab

217
00:16:31,110 --> 00:16:31,530
notebook.

218
00:16:31,530 --> 00:16:34,650
So you can see now this is our input image okay.

219
00:16:36,020 --> 00:16:37,370
So now

220
00:16:38,090 --> 00:16:44,270
I will be downloading the bounding box coordinates, confidence score and class name for the person

221
00:16:44,510 --> 00:16:47,000
as well as for the dog in this image.

222
00:16:47,000 --> 00:16:47,420
Okay.

223
00:16:47,660 --> 00:16:55,070
So I have placed a .txt file on my Google Drive which contains the bounding box coordinates, confidence

224
00:16:55,070 --> 00:16:58,820
score, and class name for each of these objects.

225
00:16:58,820 --> 00:17:01,250
We have two objects, a person and a dog.

226
00:17:01,250 --> 00:17:08,030
So I will be downloading a .txt file which will contain the bounding box coordinates, confidence

227
00:17:08,030 --> 00:17:10,490
score and class name for each object in the image.

228
00:17:10,910 --> 00:17:12,440
Okay, so.

229
00:17:14,010 --> 00:17:19,440
So as I showed you in the image, we will have multiple bounding boxes for each object

230
00:17:19,440 --> 00:17:20,280
in this image.

231
00:17:20,280 --> 00:17:20,580
Okay.

232
00:17:20,580 --> 00:17:24,870
So I have now downloaded the predictions.txt file, which you can see over here.

233
00:17:25,200 --> 00:17:30,270
Okay, so now I will be reading the predictions.txt file line by line.

234
00:17:30,270 --> 00:17:33,780
So for this I have created this function create dash predictions.

235
00:17:35,730 --> 00:17:39,870
So we will read the data from the .txt file line by line.

236
00:17:39,870 --> 00:17:42,000
And here we have the results.

237
00:17:42,000 --> 00:17:47,940
So now you can see that the first four items represent the bounding box coordinates.

238
00:17:47,940 --> 00:17:49,290
This is the class name.

239
00:17:49,290 --> 00:17:53,970
The fifth item represents the class name.

240
00:17:53,970 --> 00:17:56,310
And the last one represents the confidence score.

241
00:17:56,310 --> 00:18:01,800
So the first four are the bounding box coordinates, the fifth represents the class name.

242
00:18:01,800 --> 00:18:05,070
And the last one which is the sixth is the confidence score.

243
00:18:05,100 --> 00:18:09,360
You can see that we have two objects in the image, a person and a dog.

244
00:18:09,360 --> 00:18:14,850
But you can see that we are getting multiple bounding box coordinates for each object in this image.

245
00:18:14,850 --> 00:18:15,390
Okay.

246
00:18:15,390 --> 00:18:19,500
So now you can see that this requires some pre-processing as well.

247
00:18:20,840 --> 00:18:26,120
So now here I am just doing pre-processing and converting these bounding box coordinates, class

248
00:18:26,120 --> 00:18:32,300
name and confidence score into the format that we require, so that we can

249
00:18:32,300 --> 00:18:39,890
draw the bounding boxes around these objects using OpenCV Python, and we can assign

250
00:18:39,890 --> 00:18:45,320
colors to each of the bounding boxes. For that, we just need to convert these bounding box coordinates

251
00:18:45,320 --> 00:18:47,210
into a proper format.

252
00:18:47,390 --> 00:18:47,960
Okay.

253
00:18:50,160 --> 00:18:55,620
So I've just created a function by the name process prediction, so that I can just do the complete

254
00:18:55,620 --> 00:19:00,090
pre-processing and convert these bounding box coordinates into the required format.

255
00:19:00,090 --> 00:19:05,670
So now you can see over here, I've just converted these bounding box coordinates, confidence score

256
00:19:05,670 --> 00:19:08,640
and class name into the proper format.

257
00:19:08,640 --> 00:19:12,420
You can see that this is the first sub-list which I have.

258
00:19:12,630 --> 00:19:15,570
The first four items represent the bounding box coordinates.

259
00:19:15,570 --> 00:19:18,090
This is the class name and this is the confidence score.

260
00:19:18,090 --> 00:19:20,070
And we have two objects in the image.

261
00:19:20,070 --> 00:19:23,640
And we are getting multiple bounding box coordinates for each object in the image.

262
00:19:25,470 --> 00:19:27,540
So using OpenCV Python,

263
00:19:27,540 --> 00:19:33,600
I'm just reading this input image now, and converting this image from BGR to RGB so that I can

264
00:19:33,600 --> 00:19:35,730
show you this image using Matplotlib.

265
00:19:35,730 --> 00:19:40,560
So to display an image using Matplotlib, we need to convert the image from BGR to RGB.

266
00:19:40,560 --> 00:19:46,050
So when OpenCV Python reads an image, it reads it in BGR order: blue, green, red.

267
00:19:46,050 --> 00:19:51,300
And to display any image using Matplotlib, we need to convert the image into RGB.

268
00:19:51,330 --> 00:19:59,220
Okay, so OpenCV reads in BGR order, and to display using Matplotlib, we need to convert it

269
00:19:59,220 --> 00:20:00,150
into RGB.

270
00:20:00,150 --> 00:20:03,990
So now you can see this is the input image we have.

271
00:20:03,990 --> 00:20:06,150
So now here is our color map.

272
00:20:06,150 --> 00:20:07,320
So now.

273
00:20:08,670 --> 00:20:09,240
Over here.

274
00:20:09,240 --> 00:20:10,320
Like you can see.

275
00:20:10,920 --> 00:20:15,570
I'm just drawing the bounding boxes around each of the objects in the image.

276
00:20:15,570 --> 00:20:20,820
So zero represents the person class and one represents the dog class.

277
00:20:20,820 --> 00:20:23,640
So for the dog class, you can have an orange color.

278
00:20:23,640 --> 00:20:27,390
And for the person class you can have a yellow color.

279
00:20:27,390 --> 00:20:27,900
Okay.

280
00:20:27,900 --> 00:20:30,810
So I'm just drawing with these colors over here.

281
00:20:37,410 --> 00:20:41,070
Okay, so now you can see that I have

282
00:20:41,670 --> 00:20:41,970
Uh.

283
00:20:42,800 --> 00:20:45,440
drawn these bounding boxes over here.

284
00:20:45,440 --> 00:20:52,520
You can see that we have around two bounding boxes around the dog, and we have around

285
00:20:52,520 --> 00:20:55,040
three different bounding boxes around the person.

286
00:20:55,040 --> 00:20:58,640
So this is the output which you get from the object detection algorithm.

287
00:20:58,640 --> 00:21:03,410
You can see that we have multiple bounding boxes around the dog, and we have multiple

288
00:21:03,410 --> 00:21:05,630
bounding boxes around the person as well.

289
00:21:05,630 --> 00:21:06,110
Okay.

290
00:21:06,110 --> 00:21:09,440
So now we need to apply non-maximum suppression.

291
00:21:09,470 --> 00:21:14,750
To apply non-maximum suppression, we need to have IOU as well.

292
00:21:14,750 --> 00:21:19,100
That is, we need to have a function for the intersection over union as well.

293
00:21:19,100 --> 00:21:21,830
So why are we implementing non-maximum suppression?

294
00:21:21,830 --> 00:21:27,590
The reason for implementing non-maximum suppression is that we need to remove all the redundant or

295
00:21:27,590 --> 00:21:28,820
overlapping bounding boxes.

296
00:21:28,820 --> 00:21:32,210
And we need to have only one bounding box for a single object.

297
00:21:32,210 --> 00:21:37,970
And that bounding box will be the bounding box, which has the highest confidence score among the overlapping

298
00:21:37,970 --> 00:21:38,960
bounding boxes.

299
00:21:40,310 --> 00:21:45,260
So now I will just first create a function to calculate intersection over union.

300
00:21:45,260 --> 00:21:49,520
So like you can see over here if we have two bounding boxes this is the box one.

301
00:21:49,520 --> 00:21:51,740
And this is the box two okay.

302
00:21:51,740 --> 00:21:58,310
So we will be calculating intersection over union for two bounding boxes, like this is box one

303
00:21:58,310 --> 00:21:59,270
and this is box two.

304
00:21:59,270 --> 00:22:00,920
And these are the bounding box coordinates.

305
00:22:00,920 --> 00:22:07,040
So in the top left corner we have the x1, y1 coordinates.

306
00:22:07,040 --> 00:22:10,430
And in the bottom right corner we have the x2, y2 coordinates.

307
00:22:10,430 --> 00:22:14,210
So here you can see we are just calculating intersection over union.

308
00:22:14,210 --> 00:22:16,280
And here you can see that,

309
00:22:17,710 --> 00:22:20,230
for each of the bounding boxes,

310
00:22:20,230 --> 00:22:24,760
we are separating the x1, y1, x2, y2.

311
00:22:24,760 --> 00:22:29,650
You can see that for box one, this is the x1, y1 and this is the x2, y2

312
00:22:29,680 --> 00:22:31,150
value. For box two,

313
00:22:31,150 --> 00:22:34,390
this is the x1, y1 and this is the x2, y2 value.

314
00:22:34,390 --> 00:22:37,330
So you can see we have all these values over here.

315
00:22:37,330 --> 00:22:43,510
Then, as I told you, the intersection over union formula is the intersection area divided

316
00:22:43,510 --> 00:22:44,680
by the union area.

317
00:22:44,680 --> 00:22:49,390
So the IOU formula is the intersection area divided by the union area.

318
00:22:49,390 --> 00:22:56,380
So here we are just calculating the area of intersection by multiplying the width by the height.

319
00:22:56,380 --> 00:23:03,370
So to calculate the area of intersection, we multiply the width by the height of the overlapping region.

320
00:23:03,370 --> 00:23:04,930
Then here we have the

321
00:23:06,120 --> 00:23:07,170
area of union.

322
00:23:07,170 --> 00:23:15,090
So to calculate the area of union, we add up the box

323
00:23:15,090 --> 00:23:16,680
one area and the box two area.

324
00:23:16,680 --> 00:23:20,070
And we subtract the area of intersection, which is this, okay.

325
00:23:20,340 --> 00:23:27,360
So in the end, to calculate IOU, we divide the area of intersection by the area of union, okay.
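
The IOU calculation described above can be sketched as follows. This is a minimal illustration, not the code shown in the video: the function name iou and the (x1, y1, x2, y2) box layout are assumptions based on the narration.

```python
def iou(box1, box2):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.
    The name and signature are assumptions made for this sketch.
    """
    x1_a, y1_a, x2_a, y2_a = box1[:4]
    x1_b, y1_b, x2_b, y2_b = box2[:4]

    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(x2_a, x2_b) - max(x1_a, x1_b))
    inter_h = max(0.0, min(y2_a, y2_b) - max(y1_a, y1_b))
    inter_area = inter_w * inter_h

    # Union = area of box one + area of box two - area of intersection.
    area_a = (x2_a - x1_a) * (y2_a - y1_a)
    area_b = (x2_b - x1_b) * (y2_b - y1_b)
    union_area = area_a + area_b - inter_area

    # Guard against degenerate boxes with zero area.
    return inter_area / union_area if union_area > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Clamping the overlap width and height at zero means boxes that do not touch get an IOU of 0 instead of a negative area.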

326
00:23:29,480 --> 00:23:33,200
So now I've just created a function for intersection over union.

327
00:23:33,200 --> 00:23:37,940
Then we will start implementing non-maximum suppression.

328
00:23:37,970 --> 00:23:40,970
Okay, so let me just run this cell as well.

329
00:23:41,570 --> 00:23:42,020
Okay.

330
00:23:45,350 --> 00:23:45,620
Yep.

331
00:23:45,890 --> 00:23:47,870
So I'll just run this cell as well.

332
00:23:47,870 --> 00:23:51,020
So now this is our main function which is non-maximum suppression.

333
00:23:51,020 --> 00:23:54,350
So here I have passed the confidence threshold.

334
00:23:54,350 --> 00:24:01,160
So we will accept all the bounding boxes which have a confidence score above 0.10.

335
00:24:01,160 --> 00:24:06,800
So here I have just set the confidence threshold limit as 0.10, which means I will accept all the bounding

336
00:24:06,800 --> 00:24:10,850
boxes which have a confidence score above 0.10.

337
00:24:11,120 --> 00:24:15,830
Similarly, I have also defined the IOU threshold of 0.10.

338
00:24:15,830 --> 00:24:16,760
So.

339
00:24:18,190 --> 00:24:21,310
So we calculate intersection over union.

340
00:24:21,310 --> 00:24:28,210
If our IOU is above the IOU threshold, then we will remove the bounding box,

341
00:24:28,210 --> 00:24:31,210
which has the lower confidence score, as I told you in the slide.

342
00:24:31,210 --> 00:24:35,110
So I have set the threshold value as 0.10.

343
00:24:35,110 --> 00:24:38,290
So here you can see that I am just getting IOU.

344
00:24:38,290 --> 00:24:44,560
So if my IOU score is above the IOU threshold, like you can see if my IOU score is greater than the

345
00:24:44,560 --> 00:24:49,690
IOU threshold, which I have defined as 0.10, then the bounding box will be removed.

346
00:24:50,140 --> 00:24:51,160
Um, okay.

347
00:24:52,310 --> 00:24:56,180
So the bounding box which has the lower confidence score will be removed.

348
00:24:56,330 --> 00:25:01,190
So in the first step, uh, here you can see I've just created a function non-maximum suppression.

349
00:25:01,190 --> 00:25:03,320
In the first step we are doing sorting.

350
00:25:03,320 --> 00:25:08,690
So we are just sorting the bounding box coordinates according to their

351
00:25:08,690 --> 00:25:09,620
confidence score.

352
00:25:09,620 --> 00:25:11,270
So I am just creating a list.

353
00:25:11,270 --> 00:25:16,640
And the bounding box which has the highest confidence score will be at the start of the list.

354
00:25:16,640 --> 00:25:22,070
And the bounding boxes which have the lower confidence scores will be at the end of the list.

355
00:25:22,070 --> 00:25:28,910
So I am just creating a list and I am sorting, uh, the list based on the confidence score.

356
00:25:28,910 --> 00:25:34,370
So I am just creating a list which contains the

357
00:25:34,370 --> 00:25:36,830
bounding box coordinates for each detection.

358
00:25:38,260 --> 00:25:40,000
Like you can see this input image.

359
00:25:40,000 --> 00:25:46,000
So you can see we have two bounding boxes for the dog and we have three bounding boxes for the person.

360
00:25:46,000 --> 00:25:50,110
So I will be just creating a list which contains the bounding box coordinates.

361
00:25:50,110 --> 00:25:55,330
And the bounding box coordinates will be sorted based on their confidence score.

362
00:25:55,330 --> 00:25:58,720
So the bounding box coordinates which have the

363
00:26:00,330 --> 00:26:04,560
highest confidence score will be at the start of the list, and the bounding box coordinates which have

364
00:26:04,560 --> 00:26:09,870
the lowest confidence score will be at the end of the list.

365
00:26:09,870 --> 00:26:16,140
So I will be just creating a list where I will be placing the bounding box coordinates.

366
00:26:16,140 --> 00:26:20,220
So you can see that here I have defined an empty list by the name filter_boxes.

367
00:26:20,220 --> 00:26:21,090
So okay.

368
00:26:21,090 --> 00:26:24,000
So here you can see that I am performing the sorting.

369
00:26:24,000 --> 00:26:27,420
So the sorting is on the fifth index. Like I told you,

370
00:26:29,130 --> 00:26:35,460
the first four indexes in the list represent the bounding box coordinates, the fifth represents

371
00:26:35,460 --> 00:26:40,320
the class name and the last represents the confidence score.

372
00:26:40,320 --> 00:26:44,280
Okay, so if I just count, considering Python's zero-based indexing,

373
00:26:44,280 --> 00:26:46,830
this is 0, 1, 2, 3, 4, 5.

374
00:26:46,830 --> 00:26:48,300
So this is the fifth index okay.

375
00:26:50,560 --> 00:26:56,950
So like you can see, if our confidence score is greater than the confidence

376
00:26:56,950 --> 00:27:02,410
threshold, which we have defined as 0.10, like you can see over here, then we will

377
00:27:02,410 --> 00:27:08,050
be appending the bounding box coordinates into this filter_boxes list.

378
00:27:08,140 --> 00:27:10,480
This is the empty list which I have created over here.

379
00:27:10,480 --> 00:27:16,330
And I will be adding the bounding box coordinates for each of the bounding box into this list, and

380
00:27:16,330 --> 00:27:20,020
the bounding box coordinates will be added based on their confidence score.

381
00:27:20,020 --> 00:27:24,700
So the bounding box coordinates which have the highest confidence score will be at the start

382
00:27:24,700 --> 00:27:25,180
of the list.

383
00:27:25,180 --> 00:27:30,760
And the bounding box coordinates which have the lowest confidence score will be at the end of the

384
00:27:30,760 --> 00:27:31,090
list.
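
The filtering and sorting step described above could look like the sketch below. It assumes each detection is a list in the order [x1, y1, x2, y2, class_name, confidence], as described in the narration; the function name filter_and_sort and the sample values are made up for illustration and are not taken from the video.

```python
def filter_and_sort(detections, conf_threshold=0.10):
    """Keep detections above the confidence threshold, highest confidence first.

    Each detection is assumed to be [x1, y1, x2, y2, class_name, confidence],
    so index 5 holds the confidence score.
    """
    filter_boxes = []
    for det in detections:
        if det[5] > conf_threshold:
            filter_boxes.append(det)
    # Sort on index 5 (confidence) in descending order.
    filter_boxes.sort(key=lambda det: det[5], reverse=True)
    return filter_boxes


# Hypothetical detections, for illustration only.
sample = [
    [100, 100, 200, 200, "dog", 0.60],
    [105, 98, 205, 198, "dog", 0.90],
    [300, 50, 380, 250, "person", 0.05],
    [302, 52, 382, 252, "person", 0.75],
]
for det in filter_and_sort(sample):
    print(det)
```

The detection with confidence 0.05 is dropped by the 0.10 threshold, and the remaining boxes come out ordered 0.90, 0.75, 0.60.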

385
00:27:31,090 --> 00:27:34,330
Okay, so I'm just, uh, doing the count over here.

386
00:27:34,330 --> 00:27:35,710
You can skip this as well.

387
00:27:35,710 --> 00:27:42,850
So if the length of this list is greater than zero, then we will just call

388
00:27:42,850 --> 00:27:43,480
.pop(0).

389
00:27:43,480 --> 00:27:45,790
So we will just pick the first index.

390
00:27:45,970 --> 00:27:50,140
Like you can say that we will take the first sub list from the list.

391
00:27:50,140 --> 00:27:54,640
So we will be just picking up, uh, the first sub list from the list okay.

392
00:27:54,640 --> 00:28:01,390
So the first sub list will be the sub list which has

393
00:28:01,390 --> 00:28:04,480
the bounding box coordinates with the highest confidence score.

394
00:28:04,480 --> 00:28:08,350
So I'm just picking up the first sub list from my list.

395
00:28:09,970 --> 00:28:16,600
Okay, so then I just pick the first sub list, and then I will be

396
00:28:16,600 --> 00:28:22,600
comparing that sub list with the other sub lists as well.

397
00:28:22,600 --> 00:28:24,970
So the first sub list will have the bounding box coordinates.

398
00:28:24,970 --> 00:28:28,180
And these are the bounding box coordinates which have the highest confidence score.

399
00:28:28,180 --> 00:28:31,990
So I am just taking the bounding box coordinates with the highest confidence score.

400
00:28:31,990 --> 00:28:36,130
And I am just comparing the bounding box coordinates with the highest confidence score

401
00:28:36,130 --> 00:28:41,950
with all the other bounding boxes which have lower confidence scores for the

402
00:28:41,950 --> 00:28:42,910
same class.

403
00:28:42,910 --> 00:28:46,240
So then we calculate the intersection over union.

404
00:28:46,240 --> 00:28:52,810
And if I just get the IOU value above this IOU threshold, then the bounding box coordinates with the

405
00:28:52,810 --> 00:28:55,210
lower confidence score will be removed.

406
00:28:55,210 --> 00:28:55,720
Okay.
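
Putting the steps together, the whole procedure can be sketched as below. This is an illustrative version under stated assumptions, not the notebook code from the video: the names iou and non_max_suppression, the [x1, y1, x2, y2, class_name, confidence] layout, and the sample values are assumptions based on the narration.

```python
def iou(box1, box2):
    """Intersection over union of two boxes whose first four items are x1, y1, x2, y2."""
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter_area = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = area1 + area2 - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def non_max_suppression(detections, conf_threshold=0.10, iou_threshold=0.10):
    """Keep one box per object: the highest-confidence box among overlapping ones.

    Each detection is assumed to be [x1, y1, x2, y2, class_name, confidence].
    """
    # Step 1: drop low-confidence boxes and sort the rest, highest confidence first.
    filter_boxes = [d for d in detections if d[5] > conf_threshold]
    filter_boxes.sort(key=lambda d: d[5], reverse=True)

    kept = []
    while len(filter_boxes) > 0:
        # Step 2: take the box with the highest remaining confidence.
        best = filter_boxes.pop(0)
        kept.append(best)
        # Step 3: remove same-class boxes whose IOU with it is above the threshold.
        filter_boxes = [
            d for d in filter_boxes
            if d[4] != best[4] or iou(best, d) <= iou_threshold
        ]
    return kept


# Hypothetical detections, for illustration only.
sample = [
    [100, 100, 200, 200, "dog", 0.60],
    [105, 98, 205, 198, "dog", 0.90],
    [300, 50, 380, 250, "person", 0.75],
    [302, 52, 382, 252, "person", 0.60],
    [298, 48, 378, 248, "person", 0.40],
]
for det in non_max_suppression(sample):
    print(det)
```

With these sample values, one dog box (confidence 0.90) and one person box (confidence 0.75) remain. Comparing only boxes of the same class means an overlapping dog and person do not suppress each other.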

407
00:28:57,280 --> 00:28:59,650
So let's run this cell over here.

408
00:29:02,690 --> 00:29:07,190
So now you can see that after applying non-maximum suppression, we get this.

409
00:29:08,840 --> 00:29:15,440
Okay, so now you can see that we are just getting the best fit bounding box coordinates for

410
00:29:15,440 --> 00:29:18,500
each of the objects, for the person as well as for the dog.

411
00:29:18,500 --> 00:29:23,180
So all the overlapping bounding box coordinates have been removed and we are just getting the best fit

412
00:29:23,180 --> 00:29:27,260
bounding box coordinates for the dog, as well as for the person.

413
00:29:27,260 --> 00:29:29,090
Like you can see over here.

414
00:29:29,090 --> 00:29:34,640
There is one bounding box for the dog, and there is one bounding box for the person as well.

415
00:29:34,640 --> 00:29:40,160
And these are the bounding box coordinates which have the highest confidence score among the overlapping

416
00:29:40,160 --> 00:29:41,090
bounding boxes.

417
00:29:41,090 --> 00:29:46,190
So these bounding boxes have the highest score among the overlapping bounding boxes.

418
00:29:46,340 --> 00:29:47,810
So that's all from this tutorial.

419
00:29:47,810 --> 00:29:48,890
Thank you for watching.
