1
00:00:00,300 --> 00:00:06,180
So far in this course, we've been preparing our data with TensorFlow datasets.

2
00:00:06,330 --> 00:00:13,080
Now we are going to look at how to carry out data preparation with data generators.

3
00:00:13,680 --> 00:00:21,450
And this data generator class we have here, which inherits from the Sequence class, is made of these

4
00:00:21,450 --> 00:00:26,360
three main methods, though we could also have other methods like on_epoch_end.

5
00:00:26,370 --> 00:00:34,050
And so let's focus on this one and understand how this could be used in preparing our data.

6
00:00:34,740 --> 00:00:41,880
Now, one thing we should note is the fact that here we have an object detection problem and so we have

7
00:00:41,880 --> 00:00:46,620
the image path and then the XML path.

8
00:00:47,640 --> 00:00:55,620
With this image path, we expect to obtain an output tensor of shape 224 by 224 by 3.

9
00:00:56,490 --> 00:00:58,170
And then with the XML path.

10
00:00:58,170 --> 00:01:06,750
So with this XML path, we expect to have an output tensor of shape 7 by 7 by 25.

11
00:01:06,750 --> 00:01:09,870
So let's have this by 25.

12
00:01:11,340 --> 00:01:13,260
Or let's just write it out explicitly here.

13
00:01:13,260 --> 00:01:16,920
So let's take this off, take this off.

14
00:01:16,920 --> 00:01:18,240
And here we have.

15
00:01:18,570 --> 00:01:21,780
And now we will also include the batch size.

16
00:01:22,170 --> 00:01:24,360
So here we have the batch.

17
00:01:25,500 --> 00:01:31,440
Now, here in this init method, we have the train image list, which is made of the list of all train

18
00:01:31,440 --> 00:01:35,040
images from our image paths.

19
00:01:35,040 --> 00:01:40,800
And then we have the train map list, which is made of the list of all XML files.

20
00:01:40,800 --> 00:01:44,900
So now you see how we obtain the image path and the XML path.

21
00:01:44,910 --> 00:01:52,920
Then we specify the split size, number of boxes, number of classes, and the batch size. For the length

22
00:01:52,920 --> 00:01:53,460
here,

23
00:01:53,460 --> 00:02:00,770
We are going to define how many times or how many batches of data we are going to work with.

24
00:02:00,780 --> 00:02:07,230
So if the batch size is one, then you work with the length of this image list.

25
00:02:07,230 --> 00:02:16,770
So what this means essentially is if you have 20,000 images, obviously 20,000 images and 20,000 XML

26
00:02:17,580 --> 00:02:25,590
files, then if the batch size is equal to one, the length you return here will be 20,000.

27
00:02:26,280 --> 00:02:34,740
But if the batch size, for example, is 20, then you have 1000.

28
00:02:37,460 --> 00:02:47,060
As output length, since you would have to complete one epoch by going through the dataset 1000 times since

29
00:02:47,060 --> 00:02:48,300
the batch size is 20.

30
00:02:48,320 --> 00:02:54,200
As you've broken down the dataset into 1000 different parts.
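
The arithmetic behind the length method can be sketched as follows. This is a minimal illustration, not the lecture's exact code: the function name `num_batches` is hypothetical, and rounding up with `math.ceil` is an assumption made so that a final, smaller batch is still counted when the dataset size is not a multiple of the batch size.

```python
import math


def num_batches(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover every sample once (one epoch)."""
    # Round up so a partial last batch is not dropped (assumption).
    return math.ceil(num_samples / batch_size)


# The two cases discussed above, with 20,000 image/XML pairs.
print(num_batches(20_000, 1))   # batch size of 1
print(num_batches(20_000, 20))  # batch size of 20
```

With 20,000 samples this gives 20,000 batches for a batch size of 1 and 1000 batches for a batch size of 20, matching the numbers in the walkthrough.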

31
00:02:55,040 --> 00:03:03,110
Now finally we have this __getitem__ method right here, where we are actually going to get this output.

32
00:03:04,310 --> 00:03:06,230
Now, in this __getitem__ method,

33
00:03:06,230 --> 00:03:12,620
We call this data generation method, which is going to be in charge of obtaining the output.

34
00:03:12,620 --> 00:03:19,790
So here, well, we define X and Y, and then you see we have this image; we read or we load the

35
00:03:19,790 --> 00:03:20,510
image.

36
00:03:20,660 --> 00:03:26,720
But because each and every time we are having to load this image, we actually load in a batch.

37
00:03:27,380 --> 00:03:30,890
You will notice that we have an index which is passed here.

38
00:03:31,040 --> 00:03:39,070
Now these indices go from zero right up to 1000, or rather 1000 minus one.

39
00:03:39,080 --> 00:03:48,140
Because remember, if we have a batch size of 20 and 20,000 different images, then we are going to

40
00:03:48,140 --> 00:03:50,870
have 1000 batches.

41
00:03:52,340 --> 00:03:56,330
And so for the first batch, for example, we will take the index zero.

42
00:03:56,330 --> 00:03:57,950
The index zero is going to be passed here.

43
00:03:57,950 --> 00:04:01,250
Then we'll go from zero because this is zero times batch size.

44
00:04:01,250 --> 00:04:05,120
So we'll go from 0 to 0 plus one.

45
00:04:05,120 --> 00:04:05,780
That's one.

46
00:04:05,780 --> 00:04:07,250
One times batch size is 20.

47
00:04:07,250 --> 00:04:08,930
So we'll go from 0 to 20.

48
00:04:10,370 --> 00:04:20,480
And then with this for loop, we'll be able to get the first 20 images and put them in this X variable.
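
The index arithmetic described above can be sketched like this. The helper `batch_bounds` and the file names are hypothetical, used only to illustrate how a batch index maps to a slice of the path list.

```python
def batch_bounds(index: int, batch_size: int) -> tuple[int, int]:
    """Start (inclusive) and stop (exclusive) positions for one batch."""
    start = index * batch_size
    stop = (index + 1) * batch_size
    return start, stop


# Hypothetical list of image paths, standing in for the real training list.
image_paths = [f"img_{i}.jpg" for i in range(100)]

# Batch index 0 with a batch size of 20 covers positions 0 up to (not including) 20.
start, stop = batch_bounds(0, 20)
first_batch = image_paths[start:stop]
print(start, stop, len(first_batch))
```

Python slicing simply stops at the end of the list, so a last batch that runs past the final element comes back shorter instead of raising an error.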

49
00:04:21,230 --> 00:04:28,970
Now for the output, that's for Y, we are going to call on this generate output method right here, which

50
00:04:28,970 --> 00:04:35,960
takes in the bounding boxes which we've gotten after preprocessing the XML file, and then coming out

51
00:04:35,960 --> 00:04:40,700
of this we have X and Y, which is what we expect.
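
To show how the three methods fit together, here is a framework-free sketch. It is an assumption-laden stand-in, not the class from the lecture: the real class inherits from the Keras Sequence class and returns tensors, whereas `SketchGenerator`, `load_image`, and `make_target` are hypothetical names, and the two loader callables are placeholders for the actual image loading and XML-to-target steps.

```python
import math


class SketchGenerator:
    """Plain-Python stand-in for a Sequence-style data generator."""

    def __init__(self, image_paths, xml_paths, batch_size, load_image, make_target):
        self.image_paths = image_paths
        self.xml_paths = xml_paths
        self.batch_size = batch_size
        self.load_image = load_image    # placeholder: path -> image
        self.make_target = make_target  # placeholder: XML path -> target

    def __len__(self):
        # Number of batches per epoch (rounding up is an assumption).
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, index):
        # Slice out the paths that belong to batch number `index`.
        start = index * self.batch_size
        stop = (index + 1) * self.batch_size
        x = [self.load_image(p) for p in self.image_paths[start:stop]]
        y = [self.make_target(p) for p in self.xml_paths[start:stop]]
        return x, y


# Usage with dummy loaders: 45 samples and a batch size of 20.
gen = SketchGenerator(
    image_paths=[f"img_{i}.jpg" for i in range(45)],
    xml_paths=[f"img_{i}.xml" for i in range(45)],
    batch_size=20,
    load_image=lambda path: path.upper(),  # dummy "image"
    make_target=lambda path: len(path),    # dummy "target"
)
x, y = gen[2]  # the last, partial batch
print(len(gen), len(x), len(y))
```

Passing the loaders in as arguments keeps the sketch runnable without any image or XML libraries; in a real generator those steps would live inside the class and return arrays of the shapes mentioned earlier.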

52
00:04:40,700 --> 00:04:47,870
So now we define train_gen and val_gen, which are essentially the training and validation sets, where

53
00:04:47,870 --> 00:04:55,820
we have our training images and our training XML paths sent in; we specify the split size, number of boxes,

54
00:04:55,820 --> 00:04:57,830
classes and the batch size.

55
00:04:58,550 --> 00:05:02,540
And then for the training you just have to put your train generator.

56
00:05:03,170 --> 00:05:04,610
And so that's it for this section.

57
00:05:04,610 --> 00:05:12,470
We've just seen how to create our data set or prepare our data set using TensorFlow data generators.
