1
00:00:11,090 --> 00:00:15,660
So at this point, you now understand how convolution is applied to images.

2
00:00:16,100 --> 00:00:18,260
While we could use this approach for time series,

3
00:00:18,500 --> 00:00:23,780
the next step is to look more closely at 1D convolution, which handles time series directly.

4
00:00:24,590 --> 00:00:29,400
So this lecture will look at 1D convolution specifically for use on time series.

5
00:00:29,960 --> 00:00:35,240
Now, we discussed the mathematical operation in the previous lectures, so we don't really need to discuss

6
00:00:35,240 --> 00:00:35,870
that again.

7
00:00:36,320 --> 00:00:40,310
But for completeness' sake, let's just review what convolution is doing.

8
00:00:42,160 --> 00:00:48,640
So suppose we have some multivariate input time series of shape T by D. If you had multiple samples,

9
00:00:48,640 --> 00:00:52,540
this would be N by T by D, but let's ignore that extra dimension for now.

10
00:00:53,770 --> 00:00:58,870
Now, let's also suppose that we've chosen to have M output features and we are doing convolution

11
00:00:58,870 --> 00:01:03,880
in same mode. Thus, our output time series will have the shape T by M.

12
00:01:04,720 --> 00:01:11,230
So if we choose a kernel length of K, then our kernel or filter will have the shape K by D by M, and the

13
00:01:11,230 --> 00:01:17,110
actual convolution would be done using the same multiply-and-sum that you've seen many times.
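
To make the shapes concrete, here is a minimal NumPy sketch of same-mode 1D convolution in the deep learning sense (no kernel flip). The function name `conv1d_same`, the zero padding, and the sizes T=10, D=3, K=5, M=4 are assumptions chosen for illustration, not something from the lecture slides.

```python
import numpy as np

def conv1d_same(x, w):
    """Deep-learning-style 1D convolution (no kernel flip), same mode.

    x: input time series, shape (T, D)
    w: kernel, shape (K, D, M)
    returns: output time series, shape (T, M)
    """
    T, D = x.shape
    K, D_w, M = w.shape
    assert D == D_w, "kernel and input must agree on the feature dimension D"

    # Zero-pad along time so the output keeps length T.
    left = (K - 1) // 2
    right = K - 1 - left
    x_padded = np.pad(x, ((left, right), (0, 0)))

    y = np.zeros((T, M))
    for t in range(T):
        window = x_padded[t:t + K]  # shape (K, D)
        # Multiply and sum over both the kernel length K and the input features D.
        y[t] = np.tensordot(window, w, axes=([0, 1], [0, 1]))
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))    # T=10, D=3
w = rng.normal(size=(5, 3, 4))  # K=5, D=3, M=4
y = conv1d_same(x, w)
print(y.shape)
```

The explicit loop is slow but keeps the multiply-and-sum visible; a deep learning library would do the same computation in a vectorized way.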

14
00:01:21,620 --> 00:01:27,020
Now, the point of this lecture is not to do more math, but instead to more deeply understand the math

15
00:01:27,030 --> 00:01:28,000
we've already seen.

16
00:01:28,760 --> 00:01:33,590
So one of the first examples I use to explain convolution is the Gaussian blur.

17
00:01:34,040 --> 00:01:38,140
I think it's pretty obvious what kind of effect this has when we use it on images.

18
00:01:38,600 --> 00:01:41,460
But clearly this can also be applied to time series.

19
00:01:41,960 --> 00:01:47,240
So if we convolve a Gaussian with a time series, what we get is a blurred version of that time series.

20
00:01:47,690 --> 00:01:51,980
Of course, when we talk about time series, we don't call it blurring, but we call it smoothing.

21
00:01:52,670 --> 00:01:56,830
So this is another way to implement something you've seen many times before.

22
00:01:57,500 --> 00:02:02,600
As you recall, the simple moving average and exponential smoothing work in a similar way.
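
As a rough illustration of smoothing as convolution, the sketch below convolves a noisy sine wave with a Gaussian filter and with a flat filter (a centered moving average). The synthetic series, the helper name `gaussian_kernel`, and the window length of 11 are assumptions made up for this example. The moving average here is centered, whereas a forecasting-style moving average would only look at past values.

```python
import numpy as np

def gaussian_kernel(K, sigma):
    """Length-K Gaussian filter, normalized so the weights sum to 1."""
    k = np.arange(K) - (K - 1) / 2
    g = np.exp(-0.5 * (k / sigma) ** 2)
    return g / g.sum()

rng = np.random.default_rng(0)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 50) + 0.5 * rng.normal(size=t.size)

# Gaussian smoothing: convolve the series with a Gaussian filter.
smooth_gauss = np.convolve(series, gaussian_kernel(11, 2.0), mode="same")

# Simple moving average: the same operation with equal weights.
smooth_sma = np.convolve(series, np.ones(11) / 11, mode="same")

# The smoothed series should change much less from step to step.
print(np.std(np.diff(series)), np.std(np.diff(smooth_gauss)))
```

Both filters are symmetric, so it makes no difference here whether the kernel is flipped or not.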

23
00:02:07,280 --> 00:02:12,770
So you've seen that in deep learning, what we call convolution is really just what everyone else calls cross-

24
00:02:12,770 --> 00:02:13,500
correlation.

25
00:02:14,000 --> 00:02:19,410
So if you ever talk to a statistician or an engineer, they will think you are doing convolution backwards.

26
00:02:19,880 --> 00:02:24,980
That's why another way to think of convolutional neural networks is that they are cross-correlation

27
00:02:25,070 --> 00:02:26,120
neural networks.

28
00:02:27,880 --> 00:02:34,120
Well, let's recall that this is not the first time that we've seen correlation in this course.

29
00:02:34,120 --> 00:02:36,360
Autocorrelation plays a huge role in ARIMA.

30
00:02:36,760 --> 00:02:40,520
We use the autocorrelation to help us determine the ARIMA orders.

31
00:02:41,020 --> 00:02:46,450
So at this point, we can kind of combine our knowledge of deep learning and ARIMA to more deeply understand

32
00:02:46,450 --> 00:02:48,010
what correlation is doing.

33
00:02:48,700 --> 00:02:53,710
As mentioned earlier, a convolutional filter in deep learning is like a pattern finder.

34
00:02:54,280 --> 00:02:59,680
By doing convolution with a filter, you will get a spike whenever the filter matches the pattern in

35
00:02:59,680 --> 00:03:02,590
the signal, whether that's an image or a time series.

36
00:03:03,190 --> 00:03:05,330
So what is autocorrelation doing?

37
00:03:05,950 --> 00:03:12,160
Well, it's going to give us a spike whenever the time series matches itself, specifically a lagged version

38
00:03:12,160 --> 00:03:12,850
of itself.

39
00:03:13,330 --> 00:03:18,580
In other words, it's pattern matching different parts of the time series with each other and that can

40
00:03:18,580 --> 00:03:22,740
tell you which past data points are useful for making future predictions.

41
00:03:23,020 --> 00:03:26,200
And that's because if they match, then they are predictive.

42
00:03:27,850 --> 00:03:34,270
Another thing you can do is cross-correlation with different time series. If you see a spike at a lag

43
00:03:34,270 --> 00:03:38,590
other than zero, then that means one of these time series is predictive of the other.
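
A small sketch of that idea, under made-up assumptions: `y` is built as a noisy copy of `x` delayed by 3 steps, and the hypothetical helper `cross_corr` estimates the correlation between `x[t]` and `y[t + lag]` over a range of lags. The spike should land at the lag we planted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_lag = 500, 3
x = rng.normal(size=n)
# y[t] is approximately x[t - 3], so x leads y by 3 steps.
y = np.roll(x, true_lag) + 0.1 * rng.normal(size=n)

def cross_corr(x, y, max_lag):
    """Sample correlation between x[t] and y[t + lag] for each lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    vals = []
    for lag in lags:
        if lag >= 0:
            vals.append(np.mean(x[:len(x) - lag] * y[lag:]))
        else:
            vals.append(np.mean(x[-lag:] * y[:len(y) + lag]))
    return lags, np.array(vals)

lags, cc = cross_corr(x, y, max_lag=10)
best_lag = lags[np.argmax(cc)]
print(best_lag)
```

Passing the same series for both arguments gives the autocorrelation, which always peaks at lag zero.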

44
00:03:43,210 --> 00:03:49,510
Now, in my view, the most interesting way to relate convolution to ARIMA is this: let's write down

45
00:03:49,510 --> 00:03:51,580
the equation for an AR(p) process.

46
00:03:52,570 --> 00:03:55,000
At this point, I shouldn't have to say anything else.

47
00:03:55,360 --> 00:03:59,730
After watching the previous lectures, you should immediately recognize what this is.

48
00:04:00,190 --> 00:04:02,800
In fact, this is nothing but convolution.

49
00:04:03,820 --> 00:04:08,080
And actually this is real convolution, since we have minus K and not plus K.

50
00:04:08,800 --> 00:04:12,520
That is to say, an AR(p) model is like a mini CNN.

51
00:04:13,330 --> 00:04:16,870
It's a CNN with only one layer and identity activation.

52
00:04:17,380 --> 00:04:20,050
In other words, it's linear. Nonetheless,

53
00:04:20,050 --> 00:04:21,460
it still is a CNN.
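
To check this claim numerically, the sketch below computes AR(p) one-step predictions two ways: directly from the sum over past values, and with `np.convolve`, which performs true convolution (it flips the kernel). The coefficients in `phi` are hypothetical values picked for illustration, and the bias and noise terms are left out to keep the comparison simple.

```python
import numpy as np

phi = np.array([0.6, -0.3, 0.1])  # hypothetical AR(3) coefficients phi_1..phi_p
p = len(phi)

rng = np.random.default_rng(0)
y = rng.normal(size=50)

# Direct AR(p) prediction: yhat[t] = sum over k = 1..p of phi_k * y[t - k].
# y[t - p:t][::-1] is (y[t-1], y[t-2], ..., y[t-p]).
yhat_direct = np.array([np.dot(phi, y[t - p:t][::-1]) for t in range(p, len(y))])

# The same predictions via true convolution.
# The 'valid' output at index i equals sum_j phi[j] * y[i + p - 1 - j],
# which is the prediction for time i + p. The last output predicts time
# len(y), which is beyond the data, so it is dropped for the comparison.
yhat_conv = np.convolve(y, phi, mode="valid")[:-1]

print(np.allclose(yhat_direct, yhat_conv))
```

Because `np.convolve` flips the kernel, `phi` can be passed in its natural order; a deep learning layer, which does cross-correlation, would learn the same weights in reversed order.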

54
00:04:22,300 --> 00:04:25,240
So that to me is a very interesting connection to make.

55
00:04:25,540 --> 00:04:32,500
You can think of autoregressive CNNs like an extension of AR(p), or you can think of AR(p) as a special

56
00:04:32,500 --> 00:04:33,730
case of CNNs.

57
00:04:35,420 --> 00:04:40,280
And again, note that there is so much theory behind ARIMA that we just don't have time for in this

58
00:04:40,280 --> 00:04:43,310
course and that most students wouldn't be too keen on learning.

59
00:04:43,910 --> 00:04:49,400
But to give you some idea, once you realize that this is just convolution, there is a lot of analysis

60
00:04:49,400 --> 00:04:54,800
that you can do, especially involving the frequency domain and Fourier transforms.
