1
00:00:00,540 --> 00:00:06,870
Hi, and welcome back to the course. In this lesson, we'll build a very cool OCR model for CAPTCHAs.

2
00:00:08,010 --> 00:00:12,570
So you know those images you'll see, well, with characters, that they put up to prove that

3
00:00:12,570 --> 00:00:17,760
you're a human, and they're all messed up and have obscurities and occlusions on them?

4
00:00:18,360 --> 00:00:19,430
Well, that's a CAPTCHA,

5
00:00:19,530 --> 00:00:25,440
in case you didn't realize, and we're going to make a computer vision model or algorithm that can crack

6
00:00:25,440 --> 00:00:29,670
those. By cracking, I mean figuring out what the text in them is.

7
00:00:30,270 --> 00:00:31,360
So it's pretty cool.

8
00:00:31,380 --> 00:00:32,820
So let's get started.

9
00:00:32,850 --> 00:00:35,580
So open notebook 72 and we'll begin.

10
00:00:35,580 --> 00:00:41,650
And I will say this is, again, a notebook from the official Keras tutorial. It's prepared by

11
00:00:41,670 --> 00:00:44,100
A_K_Nain, and we will begin.

12
00:00:44,280 --> 00:00:49,530
So the first thing is to import our libraries, and now we load the CAPTCHA image dataset.

13
00:00:49,530 --> 00:00:52,470
So the CAPTCHA image dataset is actually quite simple.

14
00:00:52,980 --> 00:00:56,430
So if you take a look at it here, it's already loaded.

15
00:00:57,060 --> 00:01:04,290
You can see (let's wait for this to load the file names) that the file name itself has the

16
00:01:04,290 --> 00:01:05,550
CAPTCHA decoded already.

17
00:01:05,970 --> 00:01:06,450
So that's sort of how

18
00:01:06,450 --> 00:01:15,420
the labels are encoded here, so you can see 05:40, and this one is twenty to five, and so on.

19
00:01:15,420 --> 00:01:16,470
You can see that's it.

20
00:01:16,590 --> 00:01:17,550
That's how it's encoded.

21
00:01:18,030 --> 00:01:24,870
So it's a pretty easy and good way to attach OCR labels to images without having a separate labels file.

22
00:01:24,870 --> 00:01:26,010
They just use the file names.
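As a rough sketch of that idea, and not the notebook's actual code: if each image is stored as `<label>.png`, the label can be recovered from the file name alone. The directory and file names below are hypothetical examples, and `labels_from_filenames` is a helper name invented for illustration.

```python
from pathlib import Path


def labels_from_filenames(paths):
    """Return one label per image path: the file name without its extension."""
    return [Path(p).stem for p in paths]


# Hypothetical file names in the "<label>.png" style described in the lesson.
paths = ["captcha_images/226md.png", "captcha_images/2b827.png"]
labels = labels_from_filenames(paths)

# The character vocabulary is every distinct character seen in the labels.
vocabulary = sorted(set("".join(labels)))
```

With this layout there is nothing to keep in sync: renaming or deleting an image automatically updates its label.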

23
00:01:26,970 --> 00:01:28,560
So we loaded the dataset there.

24
00:01:29,100 --> 00:01:33,510
We just display some things about the images and the characters that are there.

25
00:01:33,510 --> 00:01:36,570
So we have 19 unique characters, and these are the characters present.

26
00:01:37,140 --> 00:01:41,040
These are the number of images and labels found, so everything seems to check out.

27
00:01:41,430 --> 00:01:44,790
These are the image dimensions that we'll be using.

28
00:01:45,450 --> 00:01:48,690
So next we move on to the preprocessing part of it.

29
00:01:49,200 --> 00:01:55,250
So in this part here, you basically have a mapping called character-to-number and one called number-to-character.

30
00:01:55,260 --> 00:01:58,200
They map the characters to integers, and the integers back to the characters.
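A minimal pure-Python sketch of such a two-way mapping is shown here. The Keras tutorial builds its mappings with `StringLookup` layers; the dictionaries below are a simplified stand-in, and reserving index 0 for unknown characters is an assumption made for this example. The names `build_vocab`, `encode`, and `decode` are illustrative.

```python
def build_vocab(labels):
    """Build character->integer and integer->character lookup tables."""
    chars = sorted(set("".join(labels)))
    # Index 0 is reserved for characters that are not in the vocabulary.
    char_to_num = {c: i + 1 for i, c in enumerate(chars)}
    num_to_char = {i: c for c, i in char_to_num.items()}
    return char_to_num, num_to_char


def encode(label, char_to_num):
    """Turn a label string into a list of integer ids."""
    return [char_to_num.get(c, 0) for c in label]


def decode(ids, num_to_char):
    """Turn a list of integer ids back into a string ('?' for unknown ids)."""
    return "".join(num_to_char.get(i, "?") for i in ids)


char_to_num, num_to_char = build_vocab(["226md", "2b827"])
ids = encode("2b8", char_to_num)
text = decode(ids, num_to_char)
```

The network only ever sees the integer ids; the reverse table is what turns its predictions back into readable text.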

31
00:01:58,650 --> 00:02:05,400
We next split the data here, and we get the train and validation datasets right there.
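The split can be sketched as a shuffle followed by a cut, as below. This is an illustration rather than the notebook's code: the 90/10 ratio, the fixed seed, and the placeholder file names are all assumptions chosen for the example.

```python
import random


def split_data(images, labels, train_size=0.9, seed=0):
    """Shuffle image/label pairs together and cut them into train and validation parts."""
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(len(indices) * train_size)
    train_idx, valid_idx = indices[:n_train], indices[n_train:]
    x_train = [images[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_valid = [images[i] for i in valid_idx]
    y_valid = [labels[i] for i in valid_idx]
    return x_train, x_valid, y_train, y_valid


# Placeholder data: ten image paths with matching labels.
images = [f"img{i}.png" for i in range(10)]
labels = [f"img{i}" for i in range(10)]
x_train, x_valid, y_train, y_valid = split_data(images, labels)
```

Shuffling indices rather than the lists themselves keeps every image paired with its own label.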

32
00:02:05,400 --> 00:02:07,370
And now there's a function here

33
00:02:07,380 --> 00:02:13,770
where we just preprocess the input and encode a single sample, so we can pass that image

34
00:02:13,770 --> 00:02:14,730
to the CNN.

35
00:02:15,660 --> 00:02:21,090
That's actually just a CNN with CTC loss, but we'll get to that. We'll talk about that shortly.

36
00:02:21,600 --> 00:02:24,870
So now let's create our dataset fully here.

37
00:02:24,870 --> 00:02:29,010
So we just have our dataset pipeline functions here.

38
00:02:29,010 --> 00:02:36,830
So we just load the data, and our validation and training datasets are created.

39
00:02:37,020 --> 00:02:42,030
So now we can just visualize some of the data, which you've seen previously when I

40
00:02:42,030 --> 00:02:46,450
opened up the files individually, so you can see what some of them look like.

41
00:02:47,110 --> 00:02:48,540
And now we create a model.

42
00:02:48,990 --> 00:02:55,740
So the model has something that we call a CTC layer, and that's a CTC loss layer, and that basically considers

43
00:02:55,740 --> 00:02:57,990
the temporal, and by temporal,

44
00:02:57,990 --> 00:03:00,300
I mean this left-to-right

45
00:03:03,370 --> 00:03:05,000
sequence in this image here.

46
00:03:05,590 --> 00:03:11,830
So it basically looks at the different image features at each step of the sequence as it goes left to right, and

47
00:03:12,190 --> 00:03:18,430
it tries to predict if a character is there or not, and then if it knows a character's there,

48
00:03:18,430 --> 00:03:20,110
it tries to predict what the character is

49
00:03:20,110 --> 00:03:27,160
as it passes over the entire sequence here. So the CTC layer needs to see some type of sequential,

50
00:03:27,310 --> 00:03:28,180
temporal input.

51
00:03:28,180 --> 00:03:31,360
You can see it here: they use LSTMs to create that.

52
00:03:31,370 --> 00:03:35,130
So CTC loss is connectionist

53
00:03:35,130 --> 00:03:38,140
temporal classification loss, I believe it stands for.

54
00:03:38,260 --> 00:03:45,550
I'm not entirely sure I got that right, but it's a very, very cool way to encapsulate that relationship,

55
00:03:45,550 --> 00:03:52,180
that temporal relationship in a left-to-right written sequence that is natural for OCR tasks.
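To make the left-to-right idea concrete, here is a small sketch of greedy CTC decoding, which is the inference-time counterpart of the loss and not the loss itself. At every step the model scores each character plus a special "blank" class; decoding takes the best class per step, merges consecutive repeats, and drops blanks. The three-class alphabet and the score values are made up for illustration.

```python
def ctc_greedy_decode(frame_scores, blank):
    """Greedy CTC decoding: best class per step, merge repeats, drop blanks."""
    # Pick the highest-scoring class index at every time step.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded = []
    previous = None
    for idx in best:
        # Keep a class only when it differs from the previous step and is not blank.
        if idx != previous and idx != blank:
            decoded.append(idx)
        previous = idx
    return decoded


# Made-up scores over 6 steps for classes 0='a', 1='b', 2=blank.
alphabet = {0: "a", 1: "b"}
frame_scores = [
    [0.90, 0.05, 0.05],  # a
    [0.80, 0.10, 0.10],  # a (repeat, merged)
    [0.10, 0.10, 0.80],  # blank separates the two a's
    [0.70, 0.20, 0.10],  # a
    [0.20, 0.70, 0.10],  # b
    [0.10, 0.80, 0.10],  # b (repeat, merged)
]
decoded_ids = ctc_greedy_decode(frame_scores, blank=2)
decoded_text = "".join(alphabet[i] for i in decoded_ids)
```

The blank class is what lets a doubled character such as "aa" survive the merge step, and it is also why the model can emit more time steps than there are characters in the label.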

56
00:03:52,450 --> 00:03:57,250
So the model is quite small, as you can see, just 400,000 parameters, but it's quite powerful.

57
00:03:57,850 --> 00:04:04,120
So now we train the network here, and it doesn't take too long to train, just about four or five minutes.

58
00:04:04,780 --> 00:04:09,070
And then once it finishes training, you can actually run some inferences on it here.

59
00:04:09,520 --> 00:04:16,000
So this is a block of code that allows us to run inference and get predictions, so we can just take a look

60
00:04:16,000 --> 00:04:17,380
at some predictions here.

61
00:04:17,830 --> 00:04:25,780
So you can see this gets it totally right, and an x twenty five to twenty seven, six and four eight,

62
00:04:25,780 --> 00:04:28,570
and you can see this one works as well.

63
00:04:29,170 --> 00:04:30,610
This one is 44 x.

64
00:04:30,610 --> 00:04:33,460
They all seem to be right, which is actually quite good.

65
00:04:33,460 --> 00:04:34,300
It's remarkable.

66
00:04:34,810 --> 00:04:40,510
I was trying to find one that isn't right right now, but I haven't actually found any.

67
00:04:41,230 --> 00:04:46,240
If you do... maybe when you run the notebook, you'll get a different sample, hopefully.

68
00:04:46,270 --> 00:04:47,530
I'm not sure if it's random or not.

69
00:04:47,530 --> 00:04:53,680
I haven't checked in detail in the code, but it most likely is random.

70
00:04:53,770 --> 00:04:55,420
So you might get different samples here.

71
00:04:55,930 --> 00:05:02,230
But these are all 100 percent correct, which is quite good for such a simple network that we've created.

72
00:05:02,440 --> 00:05:03,550
So it's genius.

73
00:05:04,060 --> 00:05:05,950
So that's it for this lesson.

74
00:05:05,950 --> 00:05:12,940
In the next section, we'll take a look at Flask, using Flask to create a computer vision API.

75
00:05:13,420 --> 00:05:15,490
So that's actually a very valuable lesson.

76
00:05:15,490 --> 00:05:20,950
And this is a nice demo project to get you started with understanding how you can create your own computer

77
00:05:20,950 --> 00:05:24,220
vision API, so I'll see you in the next section.

78
00:05:24,280 --> 00:05:24,730
Thank you.