1
00:00:11,630 --> 00:00:16,910
In this lecture, we are going to continue our discussion of convolution by looking at a totally different

2
00:00:16,910 --> 00:00:18,530
perspective on how it works.

3
00:00:19,160 --> 00:00:24,050
This is very helpful for understanding convolution, but it doesn't teach you anything new mechanically.

4
00:00:24,710 --> 00:00:26,180
So this lecture is optional

5
00:00:26,480 --> 00:00:29,780
if you want to move on and just learn about how to make CNNs.

6
00:00:34,930 --> 00:00:37,720
Something I mention very often is vectorization.

7
00:00:38,410 --> 00:00:44,290
We like to vectorize operations in code because using NumPy functions is a lot more efficient than

8
00:00:44,290 --> 00:00:46,150
writing your own Python for loops.

9
00:00:46,900 --> 00:00:52,930
The pattern you're looking for when you want to vectorize an operation is usually of the form of a dot product.

10
00:00:53,680 --> 00:00:58,840
A dot product is an element-wise multiplication and then the summation of those results.

11
00:00:59,620 --> 00:01:04,209
So whenever you see the sum over i of a_i times b_i, that's a dot product.

12
00:01:05,790 --> 00:01:11,580
This also applies to matrix multiplication, since if A and B are matrices, then the matrix multiplication

13
00:01:11,580 --> 00:01:13,730
of A and B is just A dot B.
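To make the vectorization point concrete, here is a minimal sketch, assuming NumPy and made-up random data. The names `a`, `b`, `A`, `B`, `loop_result`, and `vec_result` are illustrative only and do not come from the lecture. It compares a hand-written Python loop with the vectorized dot product, and shows that one entry of a matrix product is itself a dot product.

```python
import numpy as np

# Hypothetical example data; the values themselves do not matter.
rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)

# Explicit Python loop: multiply element-wise, then sum the results.
loop_result = 0.0
for i in range(len(a)):
    loop_result += a[i] * b[i]

# Vectorized equivalent: a single NumPy call.
vec_result = a.dot(b)

# Matrix multiplication: each output entry is the dot product
# of a row of A with a column of B.
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = A.dot(B)
c_01 = A[0].dot(B[:, 1])  # should match C[0, 1]

print(np.isclose(loop_result, vec_result), np.isclose(C[0, 1], c_01))
```

On arrays this small the speed difference is negligible; the gap usually only becomes noticeable as the arrays grow.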

14
00:01:15,660 --> 00:01:18,930
You'll notice that for convolution, we yet again have something similar.

15
00:01:19,530 --> 00:01:22,020
The only difference is that there are two summations.

16
00:01:22,800 --> 00:01:28,260
However, this is not really a relevant detail, since the outcome is the same. Instead of summing over

17
00:01:28,260 --> 00:01:30,480
one axis, we're summing over two axes.

18
00:01:30,930 --> 00:01:33,000
But it's still an element-wise multiply and add.
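As a rough illustration of why the second summation does not change anything, the sketch below computes a single output value for one image patch and one filter in three ways. The 3x3 `patch` and `kernel` arrays are made-up assumptions, and the filter is applied without flipping, that is, in the cross-correlation form.

```python
import numpy as np

# Hypothetical 3x3 image patch and 3x3 filter.
rng = np.random.default_rng(1)
patch = rng.standard_normal((3, 3))
kernel = rng.standard_normal((3, 3))

# Two explicit summations, one per axis.
double_sum = 0.0
for i in range(3):
    for j in range(3):
        double_sum += patch[i, j] * kernel[i, j]

# Same thing vectorized: element-wise multiply, then sum over both axes.
elementwise = (patch * kernel).sum()

# Same thing again as an ordinary dot product of the flattened arrays.
flat_dot = patch.ravel().dot(kernel.ravel())

print(np.allclose([double_sum, elementwise], flat_dot))
```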

19
00:01:38,070 --> 00:01:41,070
The question is, why is the dot product important?

20
00:01:42,000 --> 00:01:47,520
One definition of the dot product, other than the element-wise multiply and add, is that it's the magnitude

21
00:01:47,520 --> 00:01:54,060
of A multiplied by the magnitude of B multiplied by the cosine of the angle between A and B.

22
00:01:55,110 --> 00:02:00,780
We sometimes call this the cosine similarity or cosine distance, depending on the sign that you use.

23
00:02:05,930 --> 00:02:12,380
So how does this work geometrically? Consider just the angle for a moment. If the angle between the

24
00:02:12,380 --> 00:02:17,990
two vectors is zero, then the cosine of that angle is one. That's the maximum value of the cosine.

25
00:02:23,120 --> 00:02:29,390
Now, imagine that the angle between two vectors is 90 degrees. Then the cosine of that angle is zero.

26
00:02:34,490 --> 00:02:38,690
Finally, imagine that the angle between the two vectors is 180 degrees.

27
00:02:39,170 --> 00:02:45,260
This is basically as far apart as possible. Then the cosine of that angle is minus one, which is the

28
00:02:45,260 --> 00:02:46,790
minimum value of the cosine.

29
00:02:51,850 --> 00:02:54,940
So if you're just using raw cosine, then it's a similarity.

30
00:02:55,450 --> 00:03:00,550
The larger the number, the closer the two vectors are. The smaller the number, the further away they

31
00:03:00,550 --> 00:03:02,640
are. The maximum value,

32
00:03:02,650 --> 00:03:09,280
when A and B are parallel, is one. The minimum value, when A and B are antiparallel, is minus one.

33
00:03:09,820 --> 00:03:14,140
And if A and B are orthogonal, then the cosine similarity is just zero.

34
00:03:19,220 --> 00:03:20,600
OK, so why is that important?

35
00:03:21,500 --> 00:03:25,700
Consider now how you would find the cosine of the angle between two vectors.

36
00:03:26,390 --> 00:03:30,920
That's just A dot B divided by the magnitude of A and the magnitude of B.

37
00:03:31,430 --> 00:03:33,740
So I just rearranged the equation that we had before.
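The rearranged equation can be sketched in a few lines. This is only an illustration, assuming NumPy. The helper name `cosine_similarity` and the example vectors are made up for this sketch and are not from the lecture. It checks the three geometric cases discussed earlier: parallel, orthogonal, and antiparallel.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|), the dot product formula rearranged.
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0])

same = cosine_similarity(a, 3.0 * a)                 # parallel, angle 0
orth = cosine_similarity(a, np.array([-2.0, 1.0]))   # orthogonal, angle 90 degrees
opp = cosine_similarity(a, -0.5 * a)                 # antiparallel, angle 180 degrees

print(same, orth, opp)  # approximately 1, 0, and -1 (up to floating-point error)
```

Note that scaling a vector changes the dot product but not the cosine, which is why the division by the two magnitudes is there.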

38
00:03:38,870 --> 00:03:44,870
Now, let's compare this to another popular measurement, the Pearson correlation. The Pearson correlation

39
00:03:44,870 --> 00:03:46,340
is defined as what you see here.

40
00:03:47,180 --> 00:03:49,970
But notice how similar this is to cosine similarity.

41
00:03:50,600 --> 00:03:54,620
The only difference is that the Pearson correlation uses mean subtraction.
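One way to check this claim numerically is to subtract the mean from each vector and then compute the cosine similarity. The sketch below does that on made-up random data and compares the result with NumPy's `np.corrcoef`. The helper `cosine_similarity` and the variables `x` and `y` are assumptions for illustration, not something shown in the lecture.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|)
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical data: y is a noisy, scaled copy of x.
rng = np.random.default_rng(2)
x = rng.standard_normal(50)
y = 0.5 * x + rng.standard_normal(50)

# Cosine similarity after mean subtraction.
centered_cosine = cosine_similarity(x - x.mean(), y - y.mean())

# Pearson correlation as computed by NumPy.
pearson = np.corrcoef(x, y)[0, 1]

print(np.isclose(centered_cosine, pearson))
```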

42
00:03:55,770 --> 00:03:59,160
And so now you have two hints that convolution is really correlation.

43
00:03:59,580 --> 00:04:04,470
The first one, from before, was that we were actually doing what is called the cross-correlation.

44
00:04:05,100 --> 00:04:11,010
And second, now you have that the dot product is actually very similar to the Pearson correlation.

45
00:04:16,170 --> 00:04:21,089
So I hope that by this point, you are convinced that the dot product, while it seems like an abstract

46
00:04:21,089 --> 00:04:27,240
concept, can be thought of as a correlation measure. It tells me how correlated is the first thing

47
00:04:27,240 --> 00:04:28,050
with the second thing.

48
00:04:28,710 --> 00:04:33,270
If they are highly positively correlated, then the dot product should be large and positive.

49
00:04:33,750 --> 00:04:36,810
That means the two vectors are pointing in nearly the same direction.

50
00:04:38,520 --> 00:04:43,020
If they are highly negatively correlated, then the dot product should be large and negative.

51
00:04:43,470 --> 00:04:46,560
That means the two vectors are pointing in nearly opposite directions.

52
00:04:47,730 --> 00:04:53,040
Finally, if the two vectors are orthogonal or at right angles, then the dot product should be zero.

53
00:04:58,080 --> 00:05:03,150
The reason why this is important is you don't have to think of a filter as an abstract concept.

54
00:05:03,630 --> 00:05:05,430
In fact, it's just a pattern finder.

55
00:05:06,090 --> 00:05:08,970
This actually makes the term filter make a lot more sense.

56
00:05:09,420 --> 00:05:15,060
It filters out everything not related to the pattern contained in the filter by sending them to zero

57
00:05:15,450 --> 00:05:17,730
and keeps everything that is related to the pattern.

58
00:05:18,750 --> 00:05:24,990
So what convolution is doing is it's passing this filter along each point on the original input image

59
00:05:24,990 --> 00:05:27,900
and sliding it along. At each point,

60
00:05:27,990 --> 00:05:32,640
it asks: Is the pattern here? Is the pattern here? Is the pattern here? And so forth.

61
00:05:33,270 --> 00:05:37,290
Then it gives us a high number in the positions where the pattern is found.

62
00:05:37,500 --> 00:05:40,230
And then a small number where the pattern is not found.

63
00:05:40,860 --> 00:05:44,880
And thus, this is your first alternative perspective on convolution.

64
00:05:45,420 --> 00:05:51,690
It's just a sliding pattern finder that passes through an entire image looking for a particular pattern.
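The sliding pattern finder can be sketched with plain loops. This is a toy example under stated assumptions, not a production implementation: the function name `slide_filter`, the 3x3 checkerboard `pattern`, and the 8x8 all-zero `image` with the pattern pasted in at row 2, column 4 are all made up for illustration. The filter is applied without flipping (cross-correlation form) and without padding.

```python
import numpy as np

def slide_filter(image, kernel):
    # At every position, element-wise multiply the window by the
    # filter and sum the result (no padding, no kernel flip).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# Hypothetical pattern: a small checkerboard of +1 and -1.
pattern = np.array([[1.0, -1.0, 1.0],
                    [-1.0, 1.0, -1.0],
                    [1.0, -1.0, 1.0]])

# Hypothetical image: all zeros, with the pattern hidden at row 2, column 4.
image = np.zeros((8, 8))
image[2:5, 4:7] = pattern

response = slide_filter(image, pattern)

# The largest response marks where the pattern was found.
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak, response.max())
```

In this toy image the response is exactly zero wherever the window does not overlap the pattern. On real images, raw dot products also grow with overall brightness, so the "pattern found" reading is cleaner after normalization, as in the cosine similarity discussed earlier.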