1
00:00:00,180 --> 00:00:06,930
Hello, everyone, and welcome to this new and exciting session in which we shall focus on preparing

2
00:00:06,930 --> 00:00:11,850
our Pascal VOC dataset using the TensorFlow dataset pipeline.

3
00:00:12,450 --> 00:00:15,450
So here on Kaggle we have this Pascal VOC

4
00:00:15,450 --> 00:00:22,410
dataset, which is made available by huanghanchina, and it's made up of these five different directories

5
00:00:22,410 --> 00:00:28,350
that is, Annotations, ImageSets, JPEGImages, SegmentationClass and SegmentationObject.

6
00:00:28,380 --> 00:00:34,770
Nonetheless, we shall be making use of only JPEGImages and Annotations for our object detection problem.

7
00:00:34,890 --> 00:00:36,090
And now, getting into the code.

8
00:00:36,090 --> 00:00:38,510
We are going to start by installing Kaggle.

9
00:00:38,520 --> 00:00:44,460
We are going to copy this kaggle.json file into this directory which we just created.

10
00:00:44,490 --> 00:00:50,340
Now note that this kaggle.json file, as we've seen already, is obtained from our Kaggle account.

11
00:00:50,340 --> 00:00:58,290
So you get this from your Kaggle account and you copy it over, and then, after this copy, you change

12
00:00:58,290 --> 00:01:03,930
the access mode of the file, and then we can start downloading the dataset.

13
00:01:03,930 --> 00:01:10,710
Now to download this, or to get that command, you simply scroll like this, click "Copy API

14
00:01:10,710 --> 00:01:14,070
command", and paste it here.

15
00:01:14,070 --> 00:01:17,570
So what we have here is simply what we've copied.

16
00:01:17,580 --> 00:01:18,270
So that's it.

17
00:01:18,270 --> 00:01:20,430
That's how we download this dataset.

18
00:01:20,430 --> 00:01:22,650
So we run that and we download the dataset.

19
00:01:22,740 --> 00:01:31,650
Now, once we download that, we'll go ahead to unzip the content of that dataset into our dataset directory.

20
00:01:31,950 --> 00:01:35,070
Right now we have our dataset.

21
00:01:35,370 --> 00:01:43,200
There we go, we have our dataset. We pick out this one here and you see we have Annotations,

22
00:01:43,200 --> 00:01:46,860
ImageSets, JPEGImages, SegmentationClass and SegmentationObject.

23
00:01:46,860 --> 00:01:48,780
You open up this and this.

24
00:01:48,930 --> 00:01:50,050
Okay, so that's it.

25
00:01:50,070 --> 00:01:55,230
Now what we'll do is we are going to define some variables here.

26
00:01:55,230 --> 00:02:00,120
We have our train images, which is simply the path to these JPEG images.

27
00:02:00,120 --> 00:02:04,830
We have our train maps, which is simply the path to the annotations.

28
00:02:04,830 --> 00:02:06,090
So that's it.

29
00:02:06,090 --> 00:02:07,950
And then we have our classes.

30
00:02:07,950 --> 00:02:15,620
So the Pascal VOC dataset has 20 classes, from airplane right up to TV monitor.

31
00:02:15,630 --> 00:02:17,950
Then we have B set to 2.

32
00:02:17,970 --> 00:02:25,710
Now, to understand the significance of this B, remember from the paper that we saw that the

33
00:02:26,370 --> 00:02:34,860
image is divided into an S by S grid, and each grid cell predicts B bounding

34
00:02:34,860 --> 00:02:40,790
boxes. Now here they defined S to be 7 and B to be 2.

35
00:02:40,800 --> 00:02:47,790
So that's this number of bounding boxes which we are considering to be B and that's exactly what we

36
00:02:47,790 --> 00:02:48,580
have here.

37
00:02:48,600 --> 00:02:53,910
So that said, we have the number of classes, which from here is simply 20.

38
00:02:53,910 --> 00:02:59,460
We have the image height and width which is considered to be 224.

39
00:02:59,460 --> 00:03:03,900
We have the split size S, which is 224 divided by 32.

40
00:03:03,900 --> 00:03:04,920
That is equal to seven.

41
00:03:04,920 --> 00:03:07,770
We've just seen that this is actually S in the paper.

42
00:03:07,770 --> 00:03:15,870
We have a number of epochs of 100, a learning rate defined, although we have some sort of learning rate scheduling.

43
00:03:15,870 --> 00:03:23,400
So we could take this off and here we could say 135, then we have a batch size of 32.

44
00:03:23,400 --> 00:03:24,390
So that's it.
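
The configuration described above could be written roughly as follows. This is a sketch, not the notebook's exact cell: the variable names and the two directory paths are assumptions (the paths depend on where the archive was unzipped), and the class list uses the spellings found in the VOC annotation files ("aeroplane", "tvmonitor").

```python
# Hypothetical names and paths; adjust to wherever the dataset was unzipped.
TRAIN_IMAGES = "dataset/VOC2012/JPEGImages/"   # assumed location of the images
TRAIN_MAPS = "dataset/VOC2012/Annotations/"    # assumed location of the XML files

# The 20 Pascal VOC classes, in alphabetical order.
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

B = 2                      # bounding boxes predicted per grid cell
N_CLASSES = len(CLASSES)   # 20
H, W = 224, 224            # input image height and width
SPLIT_SIZE = H // 32       # S in the paper: 224 / 32 = 7
N_EPOCHS = 135
BATCH_SIZE = 32

print(N_CLASSES, SPLIT_SIZE)
```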

45
00:03:24,390 --> 00:03:31,030
We define all this and then we move to the processing of our annotations.

46
00:03:31,050 --> 00:03:37,590
Now, given that these annotations are essentially these XML files we have here, what we are going to

47
00:03:37,590 --> 00:03:46,590
be using is ElementTree from the xml package, which we'll use to parse this XML data right here.

48
00:03:46,590 --> 00:03:55,050
So diving into the code, we could see here we have the file name, which is passed into this parse method,

49
00:03:55,050 --> 00:03:57,090
from which we obtain a tree.

50
00:03:57,090 --> 00:04:01,320
Then from the tree we could get its root.

51
00:04:01,320 --> 00:04:04,440
Once we obtain the root, we can now get the tree size.

52
00:04:04,470 --> 00:04:08,130
See, we have root.find and we specify size.

53
00:04:08,130 --> 00:04:16,250
And then once we have this tree size, we could get the height of a specific image and its width.

54
00:04:16,260 --> 00:04:20,720
So here we have width, width and height.

55
00:04:20,730 --> 00:04:26,160
That is how we obtain these two from the size tag. So we could also obtain the depth from here.

56
00:04:26,400 --> 00:04:30,630
Let's copy this and paste it here.

57
00:04:30,630 --> 00:04:34,110
And then let's say we want to get the depth here.

58
00:04:34,110 --> 00:04:39,570
We specify depth and then, you see, we obtain the text.

59
00:04:39,570 --> 00:04:43,050
So let's run that and then get the depth.

60
00:04:43,080 --> 00:04:53,430
We run that and then let's call preprocess_xml with the file name; it's train maps.

61
00:04:53,430 --> 00:04:57,300
Actually, it's the file path: train maps.

62
00:04:57,300 --> 00:04:59,970
Plus we have this file.

63
00:05:00,120 --> 00:05:00,690
Or

64
00:05:00,690 --> 00:05:06,840
2007_000033

65
00:05:07,320 --> 00:05:07,940
.xml.

66
00:05:07,990 --> 00:05:10,320
Okay, so it's actually this exact file here.

67
00:05:10,350 --> 00:05:10,950
See this?

68
00:05:10,950 --> 00:05:16,080
If you look up here, or if you look at this here, the file name is actually in the filename tag.

69
00:05:16,080 --> 00:05:18,720
So let's run this and then see what we get.

70
00:05:20,220 --> 00:05:21,090
There we go.

71
00:05:21,090 --> 00:05:21,660
We should have.

72
00:05:21,660 --> 00:05:24,570
Okay, so we have 366 for the height.

73
00:05:24,610 --> 00:05:27,420
So here we have 500 for the width.

74
00:05:27,720 --> 00:05:29,780
And then we have three for the depth.

75
00:05:29,790 --> 00:05:32,040
We've converted all this into floats.
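
A minimal sketch of the size-reading step just described. To keep it self-contained, it parses a small inline XML string with `ET.fromstring` instead of calling `ET.parse` on a file path as in the lesson; the sample string and the function name `read_size` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written stand-in for an annotation file such as 2007_000033.xml.
SAMPLE_XML = """<annotation>
  <filename>2007_000033.jpg</filename>
  <size>
    <width>500</width>
    <height>366</height>
    <depth>3</depth>
  </size>
</annotation>"""

def read_size(xml_text):
    """Return (height, width, depth) of the annotated image as floats."""
    root = ET.fromstring(xml_text)      # with a file: ET.parse(path).getroot()
    size_tree = root.find("size")       # one <size> tag per image
    height = float(size_tree.find("height").text)
    width = float(size_tree.find("width").text)
    depth = float(size_tree.find("depth").text)
    return height, width, depth

print(read_size(SAMPLE_XML))  # (366.0, 500.0, 3.0)
```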

76
00:05:32,880 --> 00:05:42,630
Okay, so now we're done with obtaining the image's width and height, which are both in this size tag.

77
00:05:42,990 --> 00:05:47,910
Let's now move on to obtaining the different bounding boxes of the different objects.

78
00:05:47,910 --> 00:05:51,600
So you see here we had root.find with size.

79
00:05:51,610 --> 00:05:55,730
That's because we have a single size for that image.

80
00:05:55,740 --> 00:06:03,000
Now here you see we have root.findall with object, and that's because we will have many, or we could

81
00:06:03,000 --> 00:06:05,610
have many objects for a single image.

82
00:06:05,610 --> 00:06:12,120
Here again, a single size, but here there's a possibility of having more than one object.

83
00:06:12,120 --> 00:06:14,430
So that's why we specify findall.

84
00:06:14,430 --> 00:06:20,970
So we are going to find all objects, meaning that we are going to get into each and every object tag

85
00:06:20,970 --> 00:06:21,540
we have here.

86
00:06:21,540 --> 00:06:23,090
You see, this is an object.

87
00:06:23,100 --> 00:06:25,110
This is another object.

88
00:06:25,950 --> 00:06:29,010
If we scroll down, we'll see we have another object.

89
00:06:29,010 --> 00:06:34,890
So essentially in this image, we have one, two, three objects.

90
00:06:35,130 --> 00:06:37,920
Okay, so we have these three different objects.

91
00:06:38,580 --> 00:06:45,660
And now for each and every bounding box in this object, like this is our object here.

92
00:06:45,660 --> 00:06:48,330
If we pick out a specific object.

93
00:06:48,330 --> 00:06:49,770
So here we pick out this object.

94
00:06:49,770 --> 00:06:58,800
For example, we go through each and every bounding box in this object and we take its

95
00:06:58,800 --> 00:07:03,240
xmin, ymin, xmax and ymax.

96
00:07:03,240 --> 00:07:04,410
So that's exactly what we do here.

97
00:07:04,410 --> 00:07:09,450
You see we have the bounding box's find, now with xmin, and then we take its text.

98
00:07:09,450 --> 00:07:13,230
We have the ymin text, then xmax and ymax.

99
00:07:13,230 --> 00:07:16,770
And at the end of this we now convert this into a float.

100
00:07:16,770 --> 00:07:20,980
So that's how we obtain xmin, ymin, xmax and ymax.

101
00:07:21,000 --> 00:07:26,070
Now you'll notice that we have a break here, and the reason why we want to have this is because for

102
00:07:26,070 --> 00:07:30,270
a particular object we just need a single bounding box.

103
00:07:30,270 --> 00:07:32,730
So if we have other bounding boxes.

104
00:07:33,670 --> 00:07:36,130
We are not going to take those into consideration.

105
00:07:37,180 --> 00:07:37,370
Okay.

106
00:07:37,420 --> 00:07:48,490
So that said, now what we'll do is we're going to print out xmin, ymin, xmax, ymax and, yeah,

107
00:07:48,490 --> 00:07:49,300
xmax, ymax.

108
00:07:49,300 --> 00:07:53,920
Okay, so let's print this out for each and every object.

109
00:07:53,950 --> 00:07:58,120
Now let's run this and see what we get. As we expect,

110
00:07:58,120 --> 00:08:03,010
you see, we have 9, 107, 499, 263.

111
00:08:03,010 --> 00:08:04,420
And that's exactly what we have here.

112
00:08:04,420 --> 00:08:08,080
We have 421, 200, 482, 226.

113
00:08:08,080 --> 00:08:09,340
And then finally we have this.

114
00:08:09,340 --> 00:08:11,830
So here are our three different objects.
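
The object loop can be sketched like this. It is an illustration under assumptions: the inline XML is a trimmed stand-in that keeps only the two boxes read out above, and `obj.find("bndbox")` is used to take the first bounding box of each object, which has the same effect as looping over the boxes and breaking after the first one.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for 2007_000033.xml with two of its objects.
SAMPLE_XML = """<annotation>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>9</xmin><ymin>107</ymin><xmax>499</xmax><ymax>263</ymax></bndbox>
  </object>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>421</xmin><ymin>200</ymin><xmax>482</xmax><ymax>226</ymax></bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return one (xmin, ymin, xmax, ymax) tuple per <object> tag."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):   # an image may hold many objects
        bndbox = obj.find("bndbox")      # first box only, like the break
        if bndbox is None:
            continue                     # skip objects without a box
        coords = tuple(
            float(bndbox.find(tag).text)
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append(coords)
    return boxes

for box in read_boxes(SAMPLE_XML):
    print(box)
```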

115
00:08:12,040 --> 00:08:14,920
Now, what if we try out a different image?

116
00:08:14,920 --> 00:08:17,200
So let's change this to 32.

117
00:08:17,230 --> 00:08:19,570
We run that and see what we get.

118
00:08:19,570 --> 00:08:21,300
Yes, here's its XML file.

119
00:08:21,310 --> 00:08:24,010
Here's 32 instead of 33.

120
00:08:24,010 --> 00:08:26,920
You see, now we have actually four different objects.

121
00:08:26,920 --> 00:08:30,340
And so we have these four different bounding boxes, which you could see here.

122
00:08:30,340 --> 00:08:34,600
We have object, object, object and object.

123
00:08:35,700 --> 00:08:40,380
Now, if you consider this image here, you see, if we place the cursor on this point,

124
00:08:40,380 --> 00:08:45,480
you could read from here that we have 24, 188.

125
00:08:45,480 --> 00:08:55,330
So it matches up with this here, see this: 26, 189. And then here we have 46, 240, which matches up with 44,

126
00:08:55,330 --> 00:08:56,620
238.

127
00:08:56,640 --> 00:09:04,350
Now, in order for us to make it easier when working with the YOLO encoding, what we are going to

128
00:09:04,350 --> 00:09:08,610
do is we are going to get the center of this bounding box.

129
00:09:08,610 --> 00:09:11,170
So the center should be around this point here.

130
00:09:11,190 --> 00:09:15,810
Now that center is about 35, 215.

131
00:09:15,810 --> 00:09:18,090
So here we have 30.

132
00:09:18,120 --> 00:09:20,130
Oops, let's get a pen.

133
00:09:20,340 --> 00:09:27,630
So here is about 35, 215.

134
00:09:28,500 --> 00:09:32,700
And now that we've got the center, we could also get the width.

135
00:09:32,700 --> 00:09:46,470
The width is about 18, and then the height is about 238 - 189.

136
00:09:47,160 --> 00:09:48,420
That's 49.

137
00:09:48,420 --> 00:09:55,920
So we now have the center, which is this, we have the width and then we have the height.

138
00:09:55,920 --> 00:10:04,710
And then what we'll do is we'll divide all this by the total width and total height of the image.

139
00:10:04,710 --> 00:10:08,370
So for 35 we take 35 divided by the total width.

140
00:10:08,370 --> 00:10:11,790
The total width of this is 500.

141
00:10:11,790 --> 00:10:20,510
So we have 35 divided by 500 and then we'll have 215 divided by four.

142
00:10:20,580 --> 00:10:31,020
Well, this 215 is divided by the height, that is 281, because this here is our x coordinate and

143
00:10:31,020 --> 00:10:33,390
then this is our Y coordinate.

144
00:10:33,390 --> 00:10:36,540
So this is with respect to the width, and then this is with respect to the height.

145
00:10:36,540 --> 00:10:41,670
So this is divided by 281.

146
00:10:41,850 --> 00:10:42,400
Okay.

147
00:10:42,450 --> 00:10:50,190
So we take this divided by 500 and then this divided by 281, then we have the width, which is 18 divided

148
00:10:50,190 --> 00:10:56,400
by 500, and then we have the height, 49 divided by 281.

149
00:10:56,400 --> 00:11:03,810
And so instead of having xmin, ymin, xmax, ymax, we have the center, which is divided by, oh,

150
00:11:03,810 --> 00:11:05,160
which is normalized.

151
00:11:05,160 --> 00:11:08,790
And then we have the width and the height, which are also normalized.
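
The arithmetic above, written out for the person box (26, 189, 44, 238) in the 500 by 281 image. A small worked sketch: note that the exact vertical center is 213.5, slightly below the 215 estimated by eye on screen.

```python
xmin, ymin, xmax, ymax = 26.0, 189.0, 44.0, 238.0
img_width, img_height = 500.0, 281.0

# Center of the box, normalized by the image size.
x_center = (xmin + xmax) / 2 / img_width    # 35 / 500
y_center = (ymin + ymax) / 2 / img_height   # 213.5 / 281

# Width and height of the box, normalized the same way.
box_width = (xmax - xmin) / img_width       # 18 / 500
box_height = (ymax - ymin) / img_height     # 49 / 281

print(round(x_center, 3), round(y_center, 3),
      round(box_width, 3), round(box_height, 3))
```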

152
00:11:09,270 --> 00:11:14,670
Now, putting this in the form of code, once we're done with getting a specific bounding box, we could

153
00:11:14,670 --> 00:11:16,980
go ahead and obtain the class name.

154
00:11:16,980 --> 00:11:23,250
So you see, as usual, we make use of our object tree and then we find the name here.

155
00:11:23,250 --> 00:11:25,560
Here we have name and here airplane.

156
00:11:25,560 --> 00:11:28,500
Here's airplane, here's person, here's person.

157
00:11:28,500 --> 00:11:33,030
We're not going to be interested in the pose, or whether it's truncated or not, or whether it's difficult or not.

158
00:11:33,060 --> 00:11:37,440
We're just interested in the name and this bounding box, just as we had seen already.

159
00:11:38,280 --> 00:11:42,030
That said, we have our class name.

160
00:11:42,620 --> 00:11:49,620
From this class name, we could create this class dictionary, which we'll use to convert the different

161
00:11:49,620 --> 00:11:52,710
class names into a specific integer.

162
00:11:52,710 --> 00:12:00,340
So what this simply means is we're going to convert airplane to zero, bicycle to one, bird to two, boat

163
00:12:00,360 --> 00:12:02,430
to three, and so on and so forth.

164
00:12:02,430 --> 00:12:10,740
So this is zero one, two, three, four, five, six, seven, eight, nine, ten.

165
00:12:12,060 --> 00:12:13,470
Should be ten.

166
00:12:13,470 --> 00:12:16,290
Yeah, 11, 12, 13, 14.

167
00:12:16,290 --> 00:12:16,650
Okay.

168
00:12:16,680 --> 00:12:19,710
So we have airplane which is zero and person which is 14.

169
00:12:19,710 --> 00:12:21,090
Okay, let's take note of that.

170
00:12:21,120 --> 00:12:29,610
Now getting back here, you see we're going to make use of this dictionary where we simply have a class

171
00:12:29,610 --> 00:12:32,040
and that's converted into an integer.

172
00:12:32,040 --> 00:12:33,840
So that's quite straightforward.
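
One way to build such a class dictionary, as a sketch. The name `class_dict` is an assumption, and the class strings follow the spellings used inside the VOC XML files.

```python
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Map each class name to its position in the list.
class_dict = {name: index for index, name in enumerate(CLASSES)}

print(class_dict["aeroplane"], class_dict["person"])  # 0 14
```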

173
00:12:33,840 --> 00:12:40,170
And then now we have our bounding box, which is essentially xmin plus xmax divided by two.

174
00:12:40,170 --> 00:12:46,260
That is, we get the center, this is the center, and then we divide by the width.

175
00:12:46,920 --> 00:12:47,790
So that's it.

176
00:12:47,820 --> 00:12:53,850
We have xmin plus xmax divided by two times the width, which is essentially the center divided by the width.

177
00:12:53,850 --> 00:12:59,370
And then we have the y center, that's ymin plus ymax divided by two.

178
00:12:59,370 --> 00:13:02,370
That's the center and then divided by the height.

179
00:13:02,370 --> 00:13:08,490
And then for the width we want xmax minus xmin, because to obtain this width

180
00:13:08,490 --> 00:13:11,250
you simply take this, minus this.

181
00:13:11,250 --> 00:13:15,540
To obtain the height, we take this, minus this, and that's it.

182
00:13:15,540 --> 00:13:19,350
So that's how we obtain the width and the height.

183
00:13:19,350 --> 00:13:26,400
So here we have xmax minus xmin, then divided by the width, and then we have ymax minus

184
00:13:26,400 --> 00:13:27,570
ymin divided by the height.

185
00:13:27,570 --> 00:13:35,220
And then we have our class, which is going to be an integer instead of, say, person or airplane.

186
00:13:35,600 --> 00:13:40,240
Then once we are done with this bounding box, we now store it in the bounding boxes list.

187
00:13:40,250 --> 00:13:45,770
So let's create here our bounding boxes list.

188
00:13:45,800 --> 00:13:46,680
There we go.

189
00:13:46,700 --> 00:13:48,320
We have that bounding boxes list.

190
00:13:48,320 --> 00:13:51,950
And then now we will return.

191
00:13:52,610 --> 00:13:58,070
Oops, we'll return the bounding boxes.

192
00:13:58,370 --> 00:13:59,550
So that's it.

193
00:13:59,570 --> 00:14:01,130
Let's return that.

194
00:14:01,130 --> 00:14:02,530
And then there we go.

195
00:14:02,540 --> 00:14:05,090
So let's run this and then see what we get.

196
00:14:05,510 --> 00:14:09,320
Now, first thing you can notice is that we have our four bounding boxes.

197
00:14:09,350 --> 00:14:15,260
Now, take note of the fact that we have the classes 0, 0 and 14, meaning that we have airplane

198
00:14:15,260 --> 00:14:18,260
and person, which matches exactly what we expect.
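
Putting the pieces together, the whole preprocessing step might look like the sketch below. It is not the notebook's exact function: it takes an XML string rather than a file path so that it runs on its own, the sample XML keeps only the person box read out earlier, and the output layout `[x_center, y_center, width, height, class_id]` follows the order described in the narration.

```python
import xml.etree.ElementTree as ET

CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
class_dict = {name: index for index, name in enumerate(CLASSES)}

# Trimmed stand-in for 2007_000032.xml (one of its four objects).
SAMPLE_XML = """<annotation>
  <size><width>500</width><height>281</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>26</xmin><ymin>189</ymin><xmax>44</xmax><ymax>238</ymax></bndbox>
  </object>
</annotation>"""

def preprocess_xml(xml_text):
    """Return a list of [x_center, y_center, width, height, class_id]."""
    root = ET.fromstring(xml_text)
    size_tree = root.find("size")
    height = float(size_tree.find("height").text)
    width = float(size_tree.find("width").text)

    bounding_boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")          # first box of the object
        if bndbox is None:
            continue
        xmin = float(bndbox.find("xmin").text)
        ymin = float(bndbox.find("ymin").text)
        xmax = float(bndbox.find("xmax").text)
        ymax = float(bndbox.find("ymax").text)
        class_id = class_dict[obj.find("name").text]
        bounding_boxes.append([
            (xmin + xmax) / (2 * width),     # normalized x center
            (ymin + ymax) / (2 * height),    # normalized y center
            (xmax - xmin) / width,           # normalized box width
            (ymax - ymin) / height,          # normalized box height
            class_id,                        # integer class label
        ])
    return bounding_boxes

print(preprocess_xml(SAMPLE_XML))
```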

199
00:14:18,260 --> 00:14:26,690
And then when we get back to this image, you see, 35 divided by 500, 215 divided by 281; we should have

200
00:14:26,690 --> 00:14:34,850
35 divided by 500 and then 215 divided by 281.

201
00:14:35,870 --> 00:14:39,500
So you see we have 0.07 and 0.76.

202
00:14:39,500 --> 00:14:39,910
Okay.

203
00:14:39,980 --> 00:14:41,330
So that makes sense.

204
00:14:41,330 --> 00:14:42,710
And then for the width.

205
00:14:42,710 --> 00:14:47,480
For the width, we had 18 divided by 500, and for the height, 49 divided by 281.

206
00:14:47,480 --> 00:14:51,350
So yeah, we have 18 divided by 500.

207
00:14:51,620 --> 00:15:03,050
And then 200 divided by 281. We get 0.04, or 0.036, and instead of 0.17, 0.71.

208
00:15:03,050 --> 00:15:04,280
This should be seven one.

209
00:15:05,740 --> 00:15:06,830
Oh, let's get back here.

210
00:15:06,860 --> 00:15:10,750
It's actually 49 divided by 281 and not 200.

211
00:15:10,750 --> 00:15:14,320
So this would be 49 because the height is 49.

212
00:15:14,320 --> 00:15:17,710
So we divide that, and then you see we should have 0.17.

213
00:15:17,710 --> 00:15:19,390
Okay, So that makes sense.

214
00:15:19,390 --> 00:15:20,530
So that is it.

215
00:15:20,530 --> 00:15:30,670
We have our encoded bounding boxes and now we're ready to produce our outputs based on what was described

216
00:15:30,670 --> 00:15:31,680
in the paper.

217
00:15:31,690 --> 00:15:44,230
So in the paper we saw that our output will be this seven by seven by 30 tensor, where each and every

218
00:15:44,230 --> 00:15:49,510
cell we have here, each and every one of those 49 different cells, because we have seven times seven,

219
00:15:49,510 --> 00:15:55,720
that's 49, will take values depending on whether they have an object or not.

220
00:15:56,470 --> 00:16:03,130
Now, for a cell like this one, that's actually matching up with this one, where there is no object,

221
00:16:03,130 --> 00:16:09,700
it will take values like, well, it would have a value of zero for the objectness, meaning that there

222
00:16:09,700 --> 00:16:10,900
is no object.

223
00:16:10,900 --> 00:16:17,890
And then for the positioning, we have this here, that's four zeros.

224
00:16:17,890 --> 00:16:22,480
And then for the class, because there is no object, we will have all zeros.

225
00:16:22,480 --> 00:16:27,220
Now we have 20 classes, so we'll go from zero.

226
00:16:27,490 --> 00:16:30,340
We will have 20 of these zeros.

227
00:16:30,640 --> 00:16:31,600
And there we go.

228
00:16:31,600 --> 00:16:35,800
So you see, we have 20 of these zeros.

229
00:16:35,800 --> 00:16:41,390
Now we're going to have the same for each and every grid cell where we do not have an object.

230
00:16:41,410 --> 00:16:48,280
Now, it should be noted that a grid cell like this one, for example, this one here, let's change

231
00:16:48,280 --> 00:16:50,230
the color, this grid cell here.

232
00:16:50,740 --> 00:16:52,980
Let's take this well, let's get back.

233
00:16:54,040 --> 00:17:02,710
This grid cell here actually has no object, because we consider a grid cell to have an object

234
00:17:02,710 --> 00:17:05,660
if the center of that object is in the grid cell.

235
00:17:05,680 --> 00:17:13,540
Now, although we have the wing of the plane in this cell, given that the center of this plane isn't

236
00:17:13,540 --> 00:17:17,380
in this cell, we do not consider that this cell has an object.

237
00:17:17,380 --> 00:17:21,280
So here we would have the exact same values we have here.

238
00:17:21,280 --> 00:17:25,840
So there is no object like this is zero one, two, three.

239
00:17:25,840 --> 00:17:26,800
This is zero one.

240
00:17:26,800 --> 00:17:29,440
So we go 0, 1, 2, 3, then 0, 1.

241
00:17:29,440 --> 00:17:34,150
So here we would have these exact same values, all zeros.

242
00:17:34,150 --> 00:17:41,760
And the same with this, with this, with this, and all these other cells.

243
00:17:41,770 --> 00:17:48,670
Now, what we're left with will be this cell, which contains the center of this object, this cell which

244
00:17:48,670 --> 00:17:54,250
contains the center of this object, this cell which contains the center and this cell.

245
00:17:54,250 --> 00:18:00,940
So we have four cells which contain the centers of these four different objects, while

246
00:18:00,940 --> 00:18:05,890
the other cells contain no object.

247
00:18:05,890 --> 00:18:13,540
Now, we shall focus only on this one here so you understand how these outputs are generated based on

248
00:18:13,540 --> 00:18:14,950
the bounding boxes.

249
00:18:14,950 --> 00:18:16,060
And this length.

250
00:18:16,360 --> 00:18:19,600
This length here is essentially the number of bounding boxes.

251
00:18:19,600 --> 00:18:23,170
So it could be obtained from these bounding boxes.

252
00:18:23,170 --> 00:18:29,350
So here we have bounding boxes, which is in this case here for this object is actually what we have

253
00:18:29,350 --> 00:18:36,820
here, that is, with normalized values, such that we have the x center normalized, the y center normalized,

254
00:18:36,820 --> 00:18:39,220
we have the width normalized, the height normalized.

255
00:18:39,220 --> 00:18:47,380
And we have this class. Now, as we saw in the paper, this isn't exactly what we want.

256
00:18:47,380 --> 00:18:55,480
So what we want is a value which tells us the position of the object with respect to that specific grid

257
00:18:55,540 --> 00:18:56,140
cell.

258
00:18:56,140 --> 00:19:02,860
So if we take off this here, let's take this off, and then we specify the center.

259
00:19:03,520 --> 00:19:07,630
The center here is around this position.

260
00:19:07,630 --> 00:19:11,620
Here is our center. We'll find that, based

261
00:19:11,620 --> 00:19:21,130
on the paper, this position here has to be encoded so that we have this value based off this origin.

262
00:19:21,130 --> 00:19:26,890
So it's based off this origin because we have a grid cell here, which is this one.

263
00:19:26,890 --> 00:19:28,360
So we have this grid cell.

264
00:19:28,360 --> 00:19:30,130
It's actually the same grid cell we have here.

265
00:19:30,130 --> 00:19:34,960
So it's based off this, and not based off the origin of the whole image.

266
00:19:34,960 --> 00:19:39,550
Remember, the image has this origin and this grid cell has its own origin.

267
00:19:39,580 --> 00:19:40,450
Now, let's shift.

268
00:19:40,450 --> 00:19:42,580
This isn't very clear.

269
00:19:42,580 --> 00:19:44,290
Let's shift this so that we have this full.

270
00:19:44,290 --> 00:19:44,770
Okay.

271
00:19:44,770 --> 00:19:52,180
So you see clearly now the origin of the image, which is this and the origin of this actually this

272
00:19:52,180 --> 00:19:55,240
point here and the origin of the grid cell, which is this point.

273
00:19:55,240 --> 00:20:00,730
And then we have the center of the object, which is around here, so the center is around this.

274
00:20:00,730 --> 00:20:04,780
So the idea now is to obtain this distance from

275
00:20:04,920 --> 00:20:11,240
here, well, from this origin to this point, and that's it.

276
00:20:11,250 --> 00:20:17,010
So let's get this distance and this distance.

277
00:20:17,760 --> 00:20:18,480
So that's it.

278
00:20:18,480 --> 00:20:22,660
We need to get the distance from here to here and the distance from here to here.

279
00:20:22,680 --> 00:20:28,980
As of now, we have our center normalized with respect to the whole image.

280
00:20:28,980 --> 00:20:35,490
And if we want to normalize this now with respect to a specific grid cell, we need to take this value

281
00:20:35,490 --> 00:20:38,730
and multiply it by the number of grid cells we have.

282
00:20:38,730 --> 00:20:43,380
So given that we have 0.07, we'll take that and multiply by seven.

283
00:20:43,380 --> 00:20:57,870
So we have 0.07 times seven, which will give us 0.49, and then we'll take 0.75. 0.75 times seven, which

284
00:20:57,870 --> 00:21:01,200
will give us about 5.25.

285
00:21:01,620 --> 00:21:09,510
Now what this means is that the distance from here to this, that's in the horizontal direction, is 0.49.

286
00:21:09,510 --> 00:21:11,070
That's approximately 0.5.

287
00:21:11,070 --> 00:21:18,060
And this makes sense because the distance from here, that's this origin, to where we have the center here

288
00:21:18,060 --> 00:21:27,810
of this object is approximately half of the width of the full cell, and then the distance

289
00:21:27,810 --> 00:21:35,640
from here, this origin, going in the vertical direction to this center is approximately

290
00:21:35,640 --> 00:21:36,770
0.25.

291
00:21:36,780 --> 00:21:42,030
So we go about 0.5 and then here about 0.25.

292
00:21:43,170 --> 00:21:50,550
Nonetheless, after multiplying here, that's 0.07 times seven, we have 0.49, that's about 0.5.

293
00:21:50,580 --> 00:21:51,480
That makes sense.

294
00:21:51,480 --> 00:21:54,810
We also multiply 0.75 times seven.

295
00:21:54,810 --> 00:21:56,760
That gives us 5.25.

296
00:21:56,760 --> 00:22:00,330
But this distance is actually only 0.25.

297
00:22:00,330 --> 00:22:05,430
So what we'll do is we'll take 5.25 modulo

298
00:22:07,440 --> 00:22:15,330
one and we'll obtain 0.25, because the distance from this origin to this center, that is

299
00:22:15,330 --> 00:22:24,150
the center of the object, is 0.5 in the horizontal direction and 0.25 in the vertical direction.

300
00:22:25,020 --> 00:22:27,990
Now, let's make this bigger so you could see that even more clearly.

301
00:22:27,990 --> 00:22:34,980
What we're saying is we have a center which is around this, and then we have this origin here.

302
00:22:34,980 --> 00:22:42,600
The distance from here to here is about 0.5 of our cell.

303
00:22:42,720 --> 00:22:46,980
And then the distance from here downward up to the center.

304
00:22:46,980 --> 00:22:52,200
This distance is about 0.25 of our cell.

305
00:22:52,200 --> 00:22:59,940
And we've seen that to compute this automatically, what we need to do is get these already normalized

306
00:22:59,940 --> 00:23:01,050
values from here.

307
00:23:01,050 --> 00:23:02,970
We already have these normalized values.

308
00:23:02,970 --> 00:23:11,160
We multiply them by seven and then we compute the modulo of those, that is, we find the output from

309
00:23:11,160 --> 00:23:14,670
here modulo one, to obtain these distances.

310
00:23:14,670 --> 00:23:27,960
So let's do 0.49 modulo one, which should give you 0.49, and then 5.25 modulo one.

311
00:23:29,010 --> 00:23:30,870
It gives you 0.25.

312
00:23:30,870 --> 00:23:38,400
So you see that now we have this center with respect to this grid cell's origin, and that's exactly

313
00:23:38,400 --> 00:23:39,840
what we had in the paper.
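
The scale-then-modulo step can be checked with a few lines. This is a sketch using the rounded values from the narration (0.07 and 0.75); the variable names are illustrative.

```python
SPLIT_SIZE = 7                   # S: number of grid cells per side

x_center, y_center = 0.07, 0.75  # center normalized to the whole image

# Express the center in units of grid cells.
grid_x = x_center * SPLIT_SIZE   # about 0.49
grid_y = y_center * SPLIT_SIZE   # 5.25

# The fractional part is the offset from the origin of the containing cell.
x_offset = grid_x % 1
y_offset = grid_y % 1

print(round(x_offset, 2), round(y_offset, 2))  # 0.49 0.25
```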

314
00:23:40,770 --> 00:23:46,590
Now, diving into the code, you'll see that we are going to create this output label, which is essentially

315
00:23:46,590 --> 00:23:52,050
going to be a seven by seven by number of classes plus five.

316
00:23:52,050 --> 00:23:53,550
That's 25.

317
00:23:53,640 --> 00:23:57,570
The number of classes is 20, so it's a seven by seven by 25 tensor.

318
00:23:58,380 --> 00:24:05,640
And then we are going to go through each and every bounding box we have here and then put in the values

319
00:24:05,640 --> 00:24:08,240
corresponding to the specific cells.

320
00:24:08,250 --> 00:24:14,400
So again, we are seeing that for all these different cells here we have all zeros.

321
00:24:14,400 --> 00:24:24,900
But for this cell, this cell, this cell and this cell, we have non-zero values, or not all values are

322
00:24:24,900 --> 00:24:25,860
zeros.

323
00:24:25,860 --> 00:24:29,160
So let's concentrate on this one as we said already.

324
00:24:29,160 --> 00:24:32,430
So we have here: for b in range of length.

325
00:24:32,430 --> 00:24:36,270
That's for b in the range of the number of bounding boxes.

326
00:24:36,270 --> 00:24:38,490
We said that length is the number of bounding boxes.

327
00:24:38,490 --> 00:24:41,190
We have the bounding box.

328
00:24:41,190 --> 00:24:46,680
For that specific bounding box, this zero here is simply this.

329
00:24:47,070 --> 00:24:53,640
So this is the x center, which we multiply by the split size, that is, by seven.

330
00:24:53,640 --> 00:24:58,080
So it's just like taking 0.07 times seven, which will give you 0.49.

331
00:24:58,080 --> 00:25:00,000
So this is actually 0.49.

332
00:25:01,020 --> 00:25:03,630
If we are dealing with this bounding box here.

333
00:25:03,810 --> 00:25:08,430
And then for this next one, it's essentially 5.25.

334
00:25:09,460 --> 00:25:13,540
So this times seven will give you 5.25.

335
00:25:13,570 --> 00:25:20,680
Now, one other thing we need to do is pick the specific grid cell, that is, pick

336
00:25:20,680 --> 00:25:27,460
the specific grid cell out of all the 49 grid cells, because we have seven by seven here: this is one,

337
00:25:27,460 --> 00:25:28,750
two, three, four, five, six, seven.

338
00:25:28,750 --> 00:25:30,820
One, two, three, four, five, six, seven.

339
00:25:30,820 --> 00:25:32,410
So we have 49 options.

340
00:25:32,410 --> 00:25:37,750
We need to pick only one option, and that's what we are doing right here.

341
00:25:38,380 --> 00:25:43,540
So to do that, what we need to do here is we take this 0.49.

342
00:25:44,600 --> 00:25:48,170
And then we convert it into an integer.

343
00:25:48,950 --> 00:25:50,840
Or we simply round this down.

344
00:25:50,840 --> 00:25:54,410
So rounding 0.49 down will give you zero.

345
00:25:54,650 --> 00:25:58,310
And then rounding 5.25 down will give you five.
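A minimal sketch of this rounding step, assuming the box center is stored as fractions of the image width and height. The helper name grid_cell is made up for illustration, and the y center of 0.75 is simply 5.25 divided by seven.

```python
SPLIT_SIZE = 7  # the grid is 7 by 7, as in the narration


def grid_cell(x_center, y_center, split_size=SPLIT_SIZE):
    """Return the (x, y) grid indices of the cell that holds a normalized box center."""
    # Scale the normalized center to grid units, then round down with int().
    grid_x = int(x_center * split_size)
    grid_y = int(y_center * split_size)
    return grid_x, grid_y


# 0.07 * 7 is about 0.49 and 0.75 * 7 is 5.25, so the cell is (0, 5).
print(grid_cell(0.07, 0.75))  # (0, 5)
```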

346
00:25:58,730 --> 00:26:06,410
So it simply means that in the x direction we are at the zero position,

347
00:26:06,410 --> 00:26:08,600
but in the y direction we go to the fifth position.

348
00:26:08,600 --> 00:26:12,470
So this is zero one, two, three, four, five.

349
00:26:12,470 --> 00:26:15,740
And then in the x direction, we are still at zero.

350
00:26:15,740 --> 00:26:17,870
So that's how we get this here.

351
00:26:17,870 --> 00:26:25,160
And then, as we said, we have zero five, and then the output label zero five.

352
00:26:25,190 --> 00:26:35,840
Just like remember we had this output which is seven by seven by 25, this is one, two, three, four,

353
00:26:35,840 --> 00:26:40,700
five, six and this other one seven.

354
00:26:41,000 --> 00:26:44,000
And then here we have seven.

355
00:26:44,000 --> 00:26:47,810
So we have seven by seven by 25.

356
00:26:47,810 --> 00:26:56,960
Let's add this here: we have this, this, this, one, two, three, four, five, six.

357
00:26:57,080 --> 00:26:59,630
And then we have seven.

358
00:27:00,950 --> 00:27:08,210
So essentially what we're saying here is zero five, that's zero.

359
00:27:08,570 --> 00:27:14,330
And then zero, one, two, three, four, five, at this position right here.

360
00:27:15,970 --> 00:27:17,920
For the first five values.

361
00:27:17,920 --> 00:27:23,770
For its first five values we would have one; one signifies that we have an object.

362
00:27:23,920 --> 00:27:33,940
Then for the positions, you see, we have 0.49 modulo one, which is going to give us 0.49.

363
00:27:35,390 --> 00:27:40,840
Then we will have 5.25 modulo one, which will give us 0.25.

364
00:27:40,850 --> 00:27:45,910
So now we will find its position with respect to this cell.

365
00:27:45,920 --> 00:27:53,060
And then we have the width; we have the width here, which is this bounding box at index two, because this

366
00:27:53,060 --> 00:27:54,460
is zero, one, two.

367
00:27:54,470 --> 00:27:55,870
So this is the width.

368
00:27:55,880 --> 00:28:02,000
But remember from the paper, what we need to have here is the width of this bounding box with

369
00:28:02,000 --> 00:28:03,560
respect to the whole image.

370
00:28:03,560 --> 00:28:07,010
So what we have as value here will stay unmodified.

371
00:28:07,010 --> 00:28:08,090
So that's it.

372
00:28:08,990 --> 00:28:13,580
0.036, 0.036.

373
00:28:13,730 --> 00:28:18,830
And then the next value is 0.17, 0.17.

374
00:28:19,040 --> 00:28:22,880
So that's how we obtain the first five values.
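As a rough sketch of how those first five values could be built for one box, assuming the box is stored as x center, y center, width, height, class index, all normalized to the image. The function name first_five is hypothetical.

```python
def first_five(box, split_size=7):
    """Build [objectness, x offset, y offset, width, height] for one bounding box."""
    x_center, y_center, width, height = box[:4]
    # Modulo one keeps only the fractional part: the position inside the cell.
    x_offset = (x_center * split_size) % 1
    y_offset = (y_center * split_size) % 1
    # Width and height stay relative to the whole image, so they are copied as-is.
    return [1.0, x_offset, y_offset, width, height]


values = first_five([0.07, 0.75, 0.036, 0.17, 14])
print([round(v, 3) for v in values])  # [1.0, 0.49, 0.25, 0.036, 0.17]
```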

375
00:28:22,880 --> 00:28:32,720
You see, we assign these values we have right here. Now for the classes, we have from five right up to

376
00:28:32,990 --> 00:28:33,860
25.

377
00:28:33,860 --> 00:28:37,310
So we have from those five plus this.

378
00:28:37,310 --> 00:28:39,070
We assign a value of one.

379
00:28:39,080 --> 00:28:46,700
Now to understand what we are doing here, remember that this bounding box, or this bounding

380
00:28:46,700 --> 00:28:51,110
box value at index four, takes this value,

381
00:28:51,110 --> 00:28:52,010
14.

382
00:28:52,010 --> 00:28:58,730
So what we are saying is, at the position five plus 14, we want to assign a value of one.

383
00:28:58,970 --> 00:29:06,110
Now remember that we have these first five values, which tell us the objectness, or which give us the

384
00:29:06,110 --> 00:29:12,350
objectness score and the position, and then the remaining 20 values would tell us the class

385
00:29:12,350 --> 00:29:13,220
of the object.

386
00:29:13,220 --> 00:29:18,710
So after listing this, we also have the class, but we have 20 zeros, see?

387
00:29:19,310 --> 00:29:22,760
So again, we have this objectness right here.

388
00:29:23,450 --> 00:29:30,950
We have objectness, we have the bounding boxes, and then we have the 20 zeros.

389
00:29:31,190 --> 00:29:37,370
Now, when we say five plus 14, it means that we'll get to the 19th position.

390
00:29:38,510 --> 00:29:49,370
This is five plus zero, one, two, three, four, five, six, seven, eight, nine, ten, 11, 12, 13, 14.

391
00:29:49,370 --> 00:29:58,430
So at this position here, we are going to take this off and then replace that with a one.

392
00:29:58,670 --> 00:30:05,120
So here we have a one, meaning that this here has an object.

393
00:30:05,600 --> 00:30:11,120
This is its bounding box, and that object is of class person.

394
00:30:12,160 --> 00:30:16,180
So you see here we assign this value one and that's it.

395
00:30:16,180 --> 00:30:19,330
We output, or we return, the output label.

396
00:30:19,780 --> 00:30:24,880
Recall that we've just done this for this object, but we have four different objects.

397
00:30:24,880 --> 00:30:27,130
That's four, in fact, four different bounding boxes.

398
00:30:27,130 --> 00:30:32,260
So we'll do for this, we'll do for this, we do for this and we do for this.
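Putting the steps together, here is a minimal pure-Python sketch of the label generation, using nested lists in place of the tensor from the notebook. The indexing order, grid x first and then grid y, follows the narration. The second box in the demo is a made-up aeroplane box, chosen only so that it lands in cell two two.

```python
SPLIT_SIZE = 7
NUM_CLASSES = 20


def generate_output(bounding_boxes, split_size=SPLIT_SIZE, num_classes=NUM_CLASSES):
    """Turn boxes of [x, y, w, h, class] into a split_size x split_size x (5 + classes) label."""
    depth = num_classes + 5
    # Start with all zeros: every cell is empty until a box center falls in it.
    output_label = [[[0.0] * depth for _ in range(split_size)] for _ in range(split_size)]
    for x_center, y_center, width, height, class_index in bounding_boxes:
        grid_x = int(x_center * split_size)
        grid_y = int(y_center * split_size)
        cell = output_label[grid_x][grid_y]
        # Objectness, position inside the cell, then width and height unchanged.
        cell[0:5] = [1.0, (x_center * split_size) % 1, (y_center * split_size) % 1,
                     width, height]
        # One-hot class: the class slots start right after the first five values.
        cell[5 + int(class_index)] = 1.0
    return output_label


boxes = [
    [0.07, 0.75, 0.036, 0.17, 14],  # the person box from the walk-through
    [0.35, 0.35, 0.30, 0.20, 0],    # hypothetical aeroplane box, for illustration only
]
label = generate_output(boxes)
print(len(label), len(label[0]), len(label[0][0]))  # 7 7 25
print(label[0][5][19], label[2][2][5])              # 1.0 1.0
```

In this sketch, if two box centers fall in the same cell, the later box overwrites the first five values of the earlier one.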

399
00:30:32,830 --> 00:30:38,140
That said, at this point, we could take all this off and then carry out some testing.

400
00:30:38,140 --> 00:30:42,760
So we have this preprocess XML method, which outputs this here.

401
00:30:42,760 --> 00:30:54,730
So we could do generate output, which takes in... well, let's copy this here, just copy this

402
00:30:55,150 --> 00:31:02,560
and paste it here; the output of this is these bounding boxes. And then scroll so we could see that clearly,

403
00:31:02,560 --> 00:31:06,430
and then we have the length of that output.

404
00:31:06,880 --> 00:31:08,050
Okay, so that's it.

405
00:31:08,050 --> 00:31:10,510
We have this output and we have the length of the output.

406
00:31:10,510 --> 00:31:13,720
Let's run this and see what we get.

407
00:31:16,010 --> 00:31:17,200
This isn't defined.

408
00:31:17,210 --> 00:31:18,200
Let's run this.

409
00:31:18,200 --> 00:31:19,190
And there we go.

410
00:31:20,060 --> 00:31:22,630
So we should have something reasonable from this.

411
00:31:22,640 --> 00:31:24,670
You see, we have this output.

412
00:31:24,680 --> 00:31:29,150
And if you check this out, you see seven by seven by 25.

413
00:31:29,300 --> 00:31:32,690
And then now what we could do is we could say, okay, we want to get

414
00:31:33,710 --> 00:31:39,200
zero, zero.

415
00:31:39,650 --> 00:31:43,010
So zero zero, let's run that, see what we get.

416
00:31:43,760 --> 00:31:44,600
Zero zero.

417
00:31:44,600 --> 00:31:46,790
That's this cell right here.

418
00:31:47,240 --> 00:31:50,450
You see, all its values, all the values of this.

419
00:31:50,450 --> 00:31:52,370
So let's take all this off.

420
00:31:52,520 --> 00:31:57,390
All the values of this cell are zeros, which makes sense.

421
00:31:57,440 --> 00:32:02,480
Now let's do zero five, zero five.

422
00:32:02,480 --> 00:32:03,560
We run that.

423
00:32:03,830 --> 00:32:07,130
You see, we have exactly what we expect here.

424
00:32:07,130 --> 00:32:08,780
We have 0.49.

425
00:32:08,780 --> 00:32:15,440
We have 0.31, meaning that the distance from here to the center is about 0.31 here.

426
00:32:15,440 --> 00:32:20,740
We have 0.036, we have 0.174 and then we have a one at this position.

427
00:32:20,750 --> 00:32:21,980
And that is a person.

428
00:32:21,980 --> 00:32:30,620
Now if you go zero, one, two; well, this is zero, one, two in the horizontal direction and zero, one, two in the vertical direction.

429
00:32:30,620 --> 00:32:34,520
So this is two two, let's do two two and see what we have.

430
00:32:34,520 --> 00:32:35,690
We have two.

431
00:32:36,980 --> 00:32:40,010
Let's get back to two.

432
00:32:40,460 --> 00:32:42,290
We should have an object there.

433
00:32:42,350 --> 00:32:43,490
See, we have an object.

434
00:32:43,490 --> 00:32:46,340
So we have a one, we have its bounding box.

435
00:32:46,340 --> 00:32:49,280
And then notice how this is this class.

436
00:32:49,280 --> 00:32:50,540
Remember the aeroplane.

437
00:32:50,540 --> 00:32:54,320
The aeroplane, as we had here, was the very first class.

438
00:32:54,320 --> 00:32:59,120
So it makes sense that we have a one at this very first position.

439
00:32:59,120 --> 00:33:00,770
So that makes sense.

440
00:33:00,830 --> 00:33:01,640
So that's it.

441
00:33:01,640 --> 00:33:07,580
We see how we could generate our output from this data set.

442
00:33:07,580 --> 00:33:13,070
We've been given all these XML files we have for each and every image.

443
00:33:13,070 --> 00:33:16,820
Before we move on, we are going to do some slight modifications on the code.

444
00:33:16,820 --> 00:33:19,400
So here we no longer need this length.

445
00:33:19,400 --> 00:33:21,890
We have the bounding boxes.

446
00:33:21,890 --> 00:33:28,220
Then we are going to create this NumPy array instead of the TensorFlow variable we had before.

447
00:33:28,220 --> 00:33:31,240
So exact same shape as before.

448
00:33:31,250 --> 00:33:32,780
Still our output label.

449
00:33:32,780 --> 00:33:37,670
And then we'll get the length directly from these bounding boxes.

450
00:33:37,670 --> 00:33:44,390
So we have for b in range length of the bounding boxes. Then, to account for the fact that we are going

451
00:33:44,390 --> 00:33:47,450
to have these computations batched,

452
00:33:47,450 --> 00:33:50,420
we are going to add these three dots right here.

453
00:33:50,420 --> 00:33:56,690
So we have now our grid X computed as similar to what we've just seen, and then we have our grid y

454
00:33:56,690 --> 00:33:59,450
computed, still our i, j again.

455
00:33:59,450 --> 00:34:05,870
And then here again we add the three dots to account for the batch computations.

456
00:34:05,870 --> 00:34:08,510
And so that's it, this exact same code.

457
00:34:08,510 --> 00:34:13,280
And then now we're going to convert this NumPy array into a tensor.

458
00:34:13,280 --> 00:34:18,170
So we use the convert_to_tensor method in TensorFlow.

459
00:34:18,170 --> 00:34:18,980
So that's it.

460
00:34:18,980 --> 00:34:23,870
We run this and we still have our same output.

461
00:34:24,440 --> 00:34:30,440
Now to ensure that our validation set doesn't get mixed up with the training, we are going to define

462
00:34:30,440 --> 00:34:37,580
this set of 64 images, which will be our validation set, which will make up our validation set.

463
00:34:37,820 --> 00:34:38,300
There we go.

464
00:34:38,300 --> 00:34:44,780
We have this valid list and then the next thing we'll do is we are going to copy, as you see here,

465
00:34:44,780 --> 00:34:52,220
we're going to copy, or rather we are going to move, these files into these two directories.

466
00:34:52,220 --> 00:34:58,310
So if you open this up now, you'll see we've created this directory, valid JPEG images,

467
00:34:58,310 --> 00:35:05,150
which is this, and we've created a directory for our annotations, which is this other one, which contain

468
00:35:05,150 --> 00:35:10,400
the validation set images and annotations.

469
00:35:10,640 --> 00:35:11,540
And so that's it.
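A minimal sketch of such a split, assuming each image and its annotation share a file stem. The function name, the val_ directory names and the placeholder files are all made up for the demo, which runs on a throwaway temporary folder rather than the real dataset.

```python
import shutil
import tempfile
from pathlib import Path


def move_validation_files(val_stems, image_dir, xml_dir, val_image_dir, val_xml_dir):
    """Move each listed image and its XML annotation into the validation folders."""
    val_image_dir.mkdir(parents=True, exist_ok=True)
    val_xml_dir.mkdir(parents=True, exist_ok=True)
    for stem in val_stems:
        shutil.move(str(image_dir / f"{stem}.jpg"), str(val_image_dir / f"{stem}.jpg"))
        shutil.move(str(xml_dir / f"{stem}.xml"), str(val_xml_dir / f"{stem}.xml"))


# Throwaway demo tree with empty placeholder files (not the real dataset).
root = Path(tempfile.mkdtemp())
image_dir, xml_dir = root / "JPEGImages", root / "Annotations"
image_dir.mkdir()
xml_dir.mkdir()
for stem in ["img_a", "img_b", "img_c"]:
    (image_dir / f"{stem}.jpg").touch()
    (xml_dir / f"{stem}.xml").touch()

move_validation_files(["img_b"], image_dir, xml_dir,
                      root / "val_JPEGImages", root / "val_Annotations")

train_images = sorted(p.name for p in image_dir.iterdir())
val_images = sorted(p.name for p in (root / "val_JPEGImages").iterdir())
print(train_images, val_images)  # ['img_a.jpg', 'img_c.jpg'] ['img_b.jpg']
```

Moving rather than copying means a validation file can no longer be picked up when the training paths are listed afterwards.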

470
00:35:12,050 --> 00:35:14,600
So now we're going to create these different lists.

471
00:35:14,600 --> 00:35:17,480
Here we have the image paths and the XML paths.

472
00:35:17,690 --> 00:35:21,230
We have the validation image paths and the validation XML paths.

473
00:35:22,070 --> 00:35:23,720
Let's run this.

474
00:35:24,200 --> 00:35:33,260
You see, we have 17,061 files for the training and then 64 for the validation, and that's it.

475
00:35:33,890 --> 00:35:37,730
Again, you should know that these training images have already been defined here.

476
00:35:37,730 --> 00:35:40,160
We've defined training images right here.

477
00:35:40,280 --> 00:35:43,670
We've also defined val images and val maps.

478
00:35:43,670 --> 00:35:49,790
So that's where we get all this from and that's how we obtain all these different paths.

479
00:35:50,780 --> 00:35:56,720
Now from here, we're going to create the TensorFlow datasets, that is the train dataset and the

480
00:35:56,720 --> 00:36:03,830
validation data set, which is essentially going to be made of these different paths which we've just

481
00:36:03,830 --> 00:36:04,310
created.

482
00:36:04,310 --> 00:36:12,350
So we make use of this from_tensor_slices method, and we put in the image paths and the XML paths

483
00:36:12,350 --> 00:36:14,570
and the validation image paths and

484
00:36:14,770 --> 00:36:16,460
validation XML paths.

485
00:36:16,480 --> 00:36:20,950
So let's run this and then we could visualize our validation data.

486
00:36:20,980 --> 00:36:26,760
So here you see we have this path here that is our image path.

487
00:36:26,770 --> 00:36:31,270
And then you see we have this other path, which is our XML file.

488
00:36:31,270 --> 00:36:37,360
So we have the image and its corresponding XML file.

489
00:36:39,050 --> 00:36:48,950
Then, now that we have our image paths and our XML paths already making up our dataset, from these two

490
00:36:48,950 --> 00:36:55,390
we could obtain the image and the bounding boxes for the image.

491
00:36:55,400 --> 00:36:58,480
All we need to do is pass in the image path.

492
00:36:58,490 --> 00:37:10,820
In this read_file method, then decode that read file, and then we go ahead and resize and cast the image.

493
00:37:10,820 --> 00:37:12,470
So let's take this off.

494
00:37:12,550 --> 00:37:17,120
We actually do the casting here, so no need doing that before resizing.

495
00:37:17,120 --> 00:37:24,770
So as we're saying, we resize and then we carry out a casting and we obtain our image from this image

496
00:37:24,770 --> 00:37:25,410
path.

497
00:37:25,430 --> 00:37:34,220
Now for the bounding boxes, remember we looked at this preprocess XML method, which was already explained

498
00:37:34,220 --> 00:37:34,970
here.

499
00:37:34,970 --> 00:37:45,080
We looked at this method which takes in our XML path or our file or file name and then outputs the bounding

500
00:37:45,080 --> 00:37:45,740
boxes.

501
00:37:45,740 --> 00:37:47,660
So that's exactly what we're doing here.

502
00:37:47,660 --> 00:37:49,060
And so that's it.

503
00:37:49,070 --> 00:37:56,330
Now, given that this method, this preprocess XML method, isn't made of only TensorFlow operations,

504
00:37:56,330 --> 00:38:05,870
we'll need to make use of this numpy_function method, where we are going to pass in our function here.

505
00:38:07,020 --> 00:38:12,960
We specify the input, which is the path that's our XML path.

506
00:38:12,960 --> 00:38:21,090
And then we also specify the data type of our output tensor, which in this case is float32.

507
00:38:21,120 --> 00:38:26,970
So let's say this is XML path, and here we have XML path.

508
00:38:27,000 --> 00:38:31,430
Okay, so now we have the path and we have the method.

509
00:38:31,440 --> 00:38:33,810
We can now obtain the boxes.

510
00:38:35,180 --> 00:38:44,540
And then now we have our train dataset, which is going to be redesigned such that we have here,

511
00:38:44,540 --> 00:38:53,090
we have these paths, the image path and the XML path, which get in here and then output the images

512
00:38:53,090 --> 00:38:53,960
and the boxes.

513
00:38:53,960 --> 00:39:01,490
So you see train dataset, we map, and then we specify the method, which is get image and bounding

514
00:39:01,490 --> 00:39:02,000
boxes.

515
00:39:02,000 --> 00:39:07,550
Thus we get the images and the bounding boxes from the image path and the XML path.

516
00:39:07,550 --> 00:39:14,450
So now our training dataset is no longer going to give us the image path and the XML path, but

517
00:39:14,450 --> 00:39:19,250
it's going to give us the image itself and the bounding boxes.

518
00:39:19,250 --> 00:39:21,800
So let's run this and then see what we get.

519
00:39:22,910 --> 00:39:23,780
There we go.

520
00:39:23,780 --> 00:39:28,910
You can see we have this image and then we have this corresponding bounding box.

521
00:39:29,060 --> 00:39:31,280
In this case, we have just a single bounding box.

522
00:39:32,930 --> 00:39:35,180
Let's go ahead and write this.

523
00:39:35,900 --> 00:39:38,090
Then we check this out here.

524
00:39:39,530 --> 00:39:40,370
There we go.

525
00:39:40,370 --> 00:39:42,380
We have this output you could see.

526
00:39:42,380 --> 00:39:46,790
So you see we have this output here and it's showing that we have an aeroplane.

527
00:39:46,790 --> 00:39:54,110
So if you check this out, you see we have the bounding box here and then we have the class.

528
00:39:54,110 --> 00:39:55,280
The class is zero.

529
00:39:55,280 --> 00:40:00,860
So if we scroll to the top, you find that the class aeroplane was here.

530
00:40:00,860 --> 00:40:03,770
So this is the zero class, which makes sense.

531
00:40:03,770 --> 00:40:06,770
Now, let's try out with some others.

532
00:40:06,770 --> 00:40:14,780
Well, most of those ones here have only one object, but there is this one at the eighteenth position.

533
00:40:14,780 --> 00:40:19,550
Let's skip, let's skip to that, and then break.

534
00:40:19,550 --> 00:40:22,670
Then we run this again and then see what we get.

535
00:40:23,030 --> 00:40:33,260
So this, this particular image has several objects so we could check that out, run this.

536
00:40:33,260 --> 00:40:37,610
Okay, so you see here we have now many more objects.

537
00:40:38,000 --> 00:40:38,210
Here.

538
00:40:38,210 --> 00:40:39,530
We have many, many people.

539
00:40:39,530 --> 00:40:44,240
Actually you have 14 and this happens to be person.

540
00:40:44,240 --> 00:40:49,220
So if you have here, here we have this.

541
00:40:49,220 --> 00:40:50,660
Well, okay.

542
00:40:50,660 --> 00:40:52,370
So if you do classes.

543
00:40:52,370 --> 00:40:54,310
Well, the list is classes.

544
00:40:54,320 --> 00:41:03,830
So let's just do classes 14 and we get the exact class; we have classes 14, and you should see, you

545
00:41:03,830 --> 00:41:05,000
should have person.

546
00:41:05,980 --> 00:41:11,410
Okay, so you see we have one, two, three, four, five, six, seven, eight persons.

547
00:41:11,410 --> 00:41:18,370
And if you get here, you see, we should have one, two, three, four, five, six, seven, eight.

548
00:41:18,400 --> 00:41:19,780
Exactly what we expect.

549
00:41:19,780 --> 00:41:22,710
And then we have ten, eight, eight.

550
00:41:22,720 --> 00:41:29,230
Well, this eight, this eight here, and then after we check out ten. Eight is chair.

551
00:41:29,230 --> 00:41:30,670
So maybe it's this chair.

552
00:41:30,670 --> 00:41:31,600
Okay, it is twice.

553
00:41:31,600 --> 00:41:33,100
So we have a chair here.

554
00:41:33,100 --> 00:41:38,820
And we also have another chair here, and then ten is, let's check that out.

555
00:41:38,830 --> 00:41:39,870
Should be dining table.

556
00:41:39,880 --> 00:41:40,390
Okay.

557
00:41:40,390 --> 00:41:42,310
See, we have this dining table.

558
00:41:42,340 --> 00:41:46,660
We have dining table, we have chair, chair, and then eight persons.
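A small sketch of this lookup, assuming the classes list is the alphabetically sorted set of the 20 Pascal VOC class names. That ordering agrees with the indices mentioned so far: zero for aeroplane, eight for chair, ten for dining table and 14 for person. The class_ids list below simply mirrors the values read out for this image.

```python
from collections import Counter

# The 20 Pascal VOC class names in alphabetical order (assumed to match the notebook's list).
classes = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Class indices as read out for this image: eight 14s, then ten, eight, eight.
class_ids = [14] * 8 + [10, 8, 8]

counts = Counter(classes[i] for i in class_ids)
print(classes[14])   # person
print(dict(counts))  # {'person': 8, 'diningtable': 1, 'chair': 2}
```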

559
00:41:47,140 --> 00:41:51,050
So now we have the image and its bounding boxes.

560
00:41:51,070 --> 00:41:55,180
The next thing we want to do is have the image and its output labels.

561
00:41:55,180 --> 00:42:00,040
Remember, with our generate output method, which we had seen already here, it takes in the bounding

562
00:42:00,040 --> 00:42:02,410
boxes and outputs the labels.

563
00:42:02,410 --> 00:42:12,220
So let's get right here, and then you see we have this preprocess method, which takes in again this

564
00:42:13,400 --> 00:42:15,980
image and its bounding boxes.

565
00:42:15,980 --> 00:42:20,500
And then right here we are going to output the image.

566
00:42:20,660 --> 00:42:22,670
Here, we just simply output an image.

567
00:42:22,670 --> 00:42:26,390
But for the bounding boxes, we need to convert these into output labels.

568
00:42:26,390 --> 00:42:32,180
So we make use of the generate output method, which itself takes in the bounding boxes, and then we specify

569
00:42:32,180 --> 00:42:35,390
the data type of the output tensor.

570
00:42:36,260 --> 00:42:43,790
Again, we're using this numpy_function method right here because this generate output isn't made of

571
00:42:43,790 --> 00:42:45,830
only TensorFlow operations.

572
00:42:45,830 --> 00:42:47,240
So that's it.

573
00:42:47,270 --> 00:42:48,650
We'll run this.

574
00:42:48,650 --> 00:42:51,380
And then there we go.

575
00:42:52,250 --> 00:42:58,760
Our final steps now will be to batch our dataset and then implement prefetching.

576
00:42:58,760 --> 00:43:06,530
So let's run that, and then we have our training and validation datasets, which have been prepared.
