1
00:00:00,360 --> 00:00:07,200
Hi, and welcome back. This lesson is on feature detectors, and we'll take a look at how each of these detectors helps

2
00:00:07,200 --> 00:00:08,640
us classify images.

3
00:00:09,030 --> 00:00:10,410
So let's go into the slides.

4
00:00:11,100 --> 00:00:12,720
So take a look at this image here.

5
00:00:12,840 --> 00:00:13,640
This is a cat.

6
00:00:13,980 --> 00:00:19,410
As I discussed earlier, we know this is a cat, but how do we know? Because we instinctively see

7
00:00:19,410 --> 00:00:24,810
the patterns and our brain probably combines, unbeknownst to us, all of those little features we're

8
00:00:24,810 --> 00:00:28,620
seeing in the cat and automatically associates that with a cat.

9
00:00:29,130 --> 00:00:33,750
Just so you know, our brain is absolutely incredible at visual processing.

10
00:00:34,230 --> 00:00:40,920
I think roughly two thirds or two fourths of our brain is dedicated to vision, and being a computer

11
00:00:40,920 --> 00:00:48,030
vision engineer, I can definitely see why vision is such a complicated task for a computer to do well.

12
00:00:48,660 --> 00:00:50,070
So back to this lesson.

13
00:00:50,670 --> 00:00:56,850
So imagine we have a convolutional filter, and as you saw previously, we slide these filters looking for

14
00:00:56,850 --> 00:00:57,810
specific features.

15
00:00:57,820 --> 00:01:01,770
So imagine we have a filter that's looking for a cat eye.

16
00:01:02,430 --> 00:01:06,390
That's going to be a conv filter that is looking specifically for that.

17
00:01:07,440 --> 00:01:10,470
So let's take a look as it's dragged across there.

18
00:01:10,620 --> 00:01:16,320
And this is just to illustrate what the conv filters do, and it's going back and forth like that over

19
00:01:16,320 --> 00:01:17,310
and over with that image.

20
00:01:17,880 --> 00:01:23,220
So what do the convolutional filters look for?

21
00:01:23,760 --> 00:01:27,600
So imagine we had a trained CNN that was able to detect cats.

22
00:01:28,020 --> 00:01:29,040
You would have filters.

23
00:01:29,500 --> 00:01:30,660
It would actually.

24
00:01:30,900 --> 00:01:32,370
And this is true.

25
00:01:32,850 --> 00:01:38,220
It actually creates filters that correspond to the features in images of cats.

26
00:01:38,280 --> 00:01:40,350
So it learns what a cat looks like.

27
00:01:40,920 --> 00:01:41,820
Bit by bit.

28
00:01:41,940 --> 00:01:46,860
So you get features that look like whiskers, one that looks like eyes, one that looks

29
00:01:46,860 --> 00:01:48,150
like the ears,

30
00:01:48,150 --> 00:01:48,550
I believe.

31
00:01:48,570 --> 00:01:53,940
I'm not sure what I drew there, but generally you can see that's how it works.

32
00:01:54,270 --> 00:01:59,340
So it's going to go through this image, sliding the window back and forth here looking for these features,

33
00:01:59,340 --> 00:02:03,420
and it's going to trigger the feature maps, which are basically activated

34
00:02:03,420 --> 00:02:09,810
when those features are seen, and those are basically going to be stored. There's a lot more to CNNs,

35
00:02:09,810 --> 00:02:10,500
so don't worry.

36
00:02:10,980 --> 00:02:16,680
But for now, just think of it this way: the hidden layers will now take these features and work through them.

37
00:02:16,680 --> 00:02:18,270
So imagine it like this,

38
00:02:18,270 --> 00:02:19,560
like a decision tree here.

39
00:02:19,560 --> 00:02:24,150
If this is this and this is here and all of those things, then it's a cat.

40
00:02:24,300 --> 00:02:27,660
If not, it's a dog or whatever class it's trained to recognize.

41
00:02:29,860 --> 00:02:31,840
OK, so we're doing this sliding window again.

42
00:02:33,340 --> 00:02:41,970
So this is what makes CNNs so amazing. In the past,

43
00:02:41,980 --> 00:02:51,700
computer vision involved us using hand-crafted features, which were basically painful because it involved

44
00:02:51,700 --> 00:02:58,480
so much trial and error and so many different feature extraction methods.

45
00:02:58,750 --> 00:03:02,410
I mean, it worked when there wasn't much variation in images.

46
00:03:02,920 --> 00:03:09,070
So you can imagine it works well for things like OCR and other types of activities where

47
00:03:09,070 --> 00:03:13,810
there isn't a huge variation in lighting and angles and class deformation.

48
00:03:14,440 --> 00:03:22,060
However, it's hard to scale those types of methods to other types of images or other types

49
00:03:22,060 --> 00:03:23,530
of data sets as well.

50
00:03:23,940 --> 00:03:31,420
So as I said, handcrafting was hard and messy, and in most cases it led to poor results.

51
00:03:31,960 --> 00:03:38,860
CNNs solved this problem by having the ability to learn the features themselves, which was a godsend

52
00:03:38,860 --> 00:03:43,690
for us, and it basically created this revolution in computer vision.

53
00:03:44,710 --> 00:03:49,480
So these here are the layers I spoke about, the hidden layers that CNNs are comprised of.

54
00:03:49,570 --> 00:03:55,870
So in the next chapters, we're going to take a look at ReLU layers, pooling layers, fully connected

55
00:03:55,870 --> 00:03:57,570
dense layers.

56
00:03:57,580 --> 00:04:02,920
And we're going to take a look at a lot of other things like max pooling, which is technically a layer, but we're

57
00:04:02,920 --> 00:04:08,710
going to take a look at stride and depth and kernel size and softmax and all of those things.

58
00:04:09,760 --> 00:04:10,750
So starting.

59
00:04:10,750 --> 00:04:17,200
But before we dive into the building blocks of CNNs, I just want to explain how convolutions work

60
00:04:17,200 --> 00:04:23,500
on color images, because this is something that is rarely talked about in most online tutorials and guides.

61
00:04:23,950 --> 00:04:30,550
And it often confuses beginners, because I just illustrated how convolutions work

62
00:04:30,550 --> 00:04:31,930
on grayscale images.

63
00:04:32,290 --> 00:04:33,820
However,

64
00:04:33,850 --> 00:04:39,040
it doesn't look exactly the same on color images, and you'll see why in the next section.

65
00:04:39,370 --> 00:04:39,790
Thank you.
