1
00:00:11,060 --> 00:00:16,220
OK, so in this lecture, we will be looking at a new notebook that will explore a few different details

2
00:00:16,220 --> 00:00:17,720
that we didn't previously cover.

3
00:00:18,620 --> 00:00:24,410
This notebook will look at multiplicative seasonality, outliers, and non-daily time intervals.

4
00:00:24,980 --> 00:00:29,650
As you recall, the Rossmann dataset contained daily data with additive seasonality.

5
00:00:29,900 --> 00:00:32,900
So we wouldn't have been able to explore these other features.

6
00:00:34,340 --> 00:00:36,830
So we'll begin again by installing Prophet.

7
00:00:48,230 --> 00:00:50,390
The next step is to download our data set.

8
00:00:57,530 --> 00:01:02,450
The next step is to import Prophet, as well as other useful libraries from the NumPy stack.

9
00:01:07,860 --> 00:01:11,660
The next step is to load our data using pd.read_csv.

10
00:01:15,240 --> 00:01:19,860
The next step is to call head(), just in case you forgot what this data looks like.

11
00:01:25,420 --> 00:01:30,740
So as you can see, we have one column for passengers, and the index is the timestamp.

12
00:01:31,300 --> 00:01:34,930
Notice that the sampling rate of our time series is once per month.

13
00:01:38,020 --> 00:01:42,730
The next step is to rename our passengers column to y, as Prophet expects.

14
00:01:46,780 --> 00:01:52,360
The next step is to assign the index to a column called ds, again as Prophet expects.
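
Prophet expects exactly two columns, ds (the timestamp) and y (the value). In pandas the two steps above would typically be a df.rename(columns=...) call plus an assignment like df['ds'] = df.index; the exact CSV column name isn't shown here. As a minimal, library-free sketch of the same reshaping, assuming month strings of the form "YYYY-MM" (the sample rows are illustrative):

```python
from datetime import date

# Illustrative rows shaped like a monthly passengers CSV: ("YYYY-MM", count).
raw_rows = [("1949-01", 112), ("1949-02", 118), ("1949-03", 132)]


def to_prophet_records(rows):
    """Convert (month, count) rows into records with Prophet's 'ds' and 'y' keys."""
    records = []
    for month, passengers in rows:
        year, mon = (int(part) for part in month.split("-"))
        # Each monthly observation is stamped on the first day of its month.
        records.append({"ds": date(year, mon, 1), "y": float(passengers)})
    return records


records = to_prophet_records(raw_rows)
print(records[0])
```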

15
00:01:56,890 --> 00:02:01,930
The next step is to call head() again to make sure our data frame is in the right format.

16
00:02:07,820 --> 00:02:11,920
The next step is to call tail() to check where our time series ends.

17
00:02:16,610 --> 00:02:20,480
OK, so you can see that our time series ends at the end of 1960.

18
00:02:23,750 --> 00:02:26,360
The next step is to instantiate our Prophet model.

19
00:02:31,000 --> 00:02:32,740
The next step is to call fit.

20
00:02:36,810 --> 00:02:42,990
Notice that we now get even more messages from the fit function; this time we see that both

21
00:02:42,990 --> 00:02:45,870
daily and weekly seasonality are turned off.

22
00:02:46,410 --> 00:02:52,230
This makes sense since our data is monthly and therefore it wouldn't be possible to infer seasonality

23
00:02:52,230 --> 00:02:53,340
at a smaller scale.
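
The decision to switch off daily and weekly seasonality depends on how long the history is and how far apart the observations are. The sketch below paraphrases Prophet's automatic rule as I understand it (yearly needs about two years of history; weekly and daily need observations spaced more finely than a week or a day). It is an approximation for illustration, not the library's code:

```python
from datetime import date, timedelta


def auto_seasonalities(dates):
    """Approximate which seasonalities an 'auto' setting would enable (assumed rule)."""
    dates = sorted(dates)
    span = dates[-1] - dates[0]
    # Smallest gap between consecutive observations.
    min_spacing = min(b - a for a, b in zip(dates, dates[1:]))
    return {
        "yearly": span >= timedelta(days=730),
        "weekly": span >= timedelta(weeks=2) and min_spacing < timedelta(weeks=1),
        "daily": span >= timedelta(days=2) and min_spacing < timedelta(days=1),
    }


# 144 month-start dates from January 1949 through December 1960.
monthly = [date(1949 + i // 12, i % 12 + 1, 1) for i in range(144)]
print(auto_seasonalities(monthly))
```

With monthly data the smallest gap is 28 days, so only yearly seasonality survives, which matches the messages described above.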

24
00:02:56,020 --> 00:02:58,520
The next step is to call make_future_dataframe.

25
00:02:59,410 --> 00:03:04,790
Note that we now pass an extra argument, freq, to specify that the periods are in months.

26
00:03:05,350 --> 00:03:08,380
So in this case, we'll be making a one-year forecast.
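
The call is presumably something like m.make_future_dataframe(periods=12, freq='MS'); the exact freq string isn't visible here, but 'MS' is the pandas alias for month start, which matches data stamped on the first of each month. Note that make_future_dataframe keeps the history by default (include_history=True), so the result has 144 + 12 rows. A stdlib sketch of just the new dates it should append:

```python
from datetime import date


def future_month_starts(last, periods):
    """Return the next `periods` month-start dates after `last` (like freq='MS')."""
    out = []
    year, month = last.year, last.month
    for _ in range(periods):
        # Advance one calendar month, rolling December over into January.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        out.append(date(year, month, 1))
    return out


future = future_month_starts(date(1960, 12, 1), 12)
print(future[0], future[-1])
```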

27
00:03:12,650 --> 00:03:17,480
The next step is to call future.tail() to confirm that it contains the months we expect.

28
00:03:20,890 --> 00:03:26,710
So as you can see, the future data frame contains dates up to 1961, which is one year

29
00:03:26,710 --> 00:03:28,660
after the end of our time series.

30
00:03:31,120 --> 00:03:34,470
The next step is to call predict to obtain our predictions.

31
00:03:38,880 --> 00:03:42,360
The next step is to call the plot function to plot our predictions.

32
00:03:42,930 --> 00:03:46,130
We'll also use this opportunity to plot the change points as well.

33
00:03:51,760 --> 00:03:54,550
OK, so this looks reasonable, but it's not perfect.

34
00:03:55,090 --> 00:04:01,540
Notice how our model overestimates at the beginning and underestimates at the end; it's unable to model

35
00:04:01,540 --> 00:04:05,660
the fact that the seasonal component increases in magnitude over time.

36
00:04:06,460 --> 00:04:10,000
This is because Prophet uses additive seasonality by default.
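
In Prophet the switch is Prophet(seasonality_mode='multiplicative'). In simplified form, additive seasonality means y = trend + seasonal, while multiplicative seasonality means y = trend * (1 + seasonal), so the seasonal swing scales with the trend. A toy comparison (the +/-20 units and +/-20% effects are made-up numbers, and real Prophet models include more terms):

```python
def additive(trend, seasonal):
    # Seasonal effect is in the units of y and does not depend on the trend level.
    return trend + seasonal


def multiplicative(trend, seasonal):
    # Seasonal effect is a fraction of the trend, so its swing grows with the trend.
    return trend * (1 + seasonal)


def swings(trend):
    """Peak-to-trough swing for a +/-20 unit additive vs a +/-20% multiplicative effect."""
    add_swing = additive(trend, 20.0) - additive(trend, -20.0)
    mul_swing = multiplicative(trend, 0.2) - multiplicative(trend, -0.2)
    return add_swing, mul_swing


for level in (100.0, 400.0):
    print(level, swings(level))
```

The additive swing stays at 40 no matter the level, which is why the additive model overestimates the early peaks and underestimates the later ones.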

37
00:04:14,370 --> 00:04:16,860
The next step is to plot the components of our model.

38
00:04:20,910 --> 00:04:26,100
So here we can see that the trend increases somewhere in the middle and we seem to have an increase

39
00:04:26,100 --> 00:04:28,490
in the summer months, which makes sense.

40
00:04:32,120 --> 00:04:36,060
The next step is to create a second model with multiplicative seasonality.

41
00:04:36,590 --> 00:04:38,780
We'll see whether or not this does a better job.

42
00:04:43,330 --> 00:04:44,920
The next step is to call fit.

43
00:04:49,470 --> 00:04:53,730
The next step is to call make_future_dataframe with the same arguments as before.

44
00:04:57,730 --> 00:05:00,700
The next step is to call predict, as we did before.

45
00:05:06,240 --> 00:05:09,720
The next step is to plot the forecast along with the change points.

46
00:05:16,660 --> 00:05:22,390
OK, so as you can see, the model now does a much better job of matching the peaks and troughs, both

47
00:05:22,390 --> 00:05:25,960
when the seasonal component is small and when it is large.

48
00:05:26,650 --> 00:05:31,360
Note that the change points are not really that sensible because we're still using a linear model of

49
00:05:31,360 --> 00:05:32,010
growth.

50
00:05:33,160 --> 00:05:37,220
So as an exercise, you may want to try experimenting with logistic growth.

51
00:05:37,660 --> 00:05:42,340
Note that I've tried this myself, but the result didn't look particularly good, at least with the

52
00:05:42,340 --> 00:05:43,570
settings I chose.
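
For the logistic-growth exercise: Prophet's option is Prophet(growth='logistic'), and it requires a cap column (the saturation level) in both the training data frame and the future data frame, with a value you choose yourself. The basic saturating curve behind it is sketched below; Prophet's version also lets the rate and offset shift at change points, which this sketch omits, and the parameters are hypothetical:

```python
import math


def logistic_trend(t, cap, k, m):
    """Saturating growth: rises toward `cap`; k sets the rate, m the midpoint in time."""
    return cap / (1 + math.exp(-k * (t - m)))


# Hypothetical parameters: a ceiling of 700 passengers, midpoint at month 72.
for t in (0, 72, 200):
    print(t, round(logistic_trend(t, cap=700.0, k=0.05, m=72), 1))
```

Choosing the cap is the hard part: set it too low and the forecast flattens prematurely, which may explain why a quick attempt doesn't look good.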

53
00:05:47,870 --> 00:05:50,150
The next step is to plot the components.

54
00:05:56,130 --> 00:06:00,090
As you can see, we again get a peak in the summer, which is to be expected.

55
00:06:03,760 --> 00:06:09,490
Now, as you recall, one way we can avoid having to use multiplicative seasonality is to simply take

56
00:06:09,490 --> 00:06:10,570
the log of the data.
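
The log transform works because log(T * (1 + s)) = log(T) + log(1 + s): a multiplicative seasonal effect becomes an additive one on the log scale. In pandas this would usually be np.log(df['y']), and forecasts on the log scale need np.exp applied to yhat to get back to passenger counts. A stdlib illustration with hypothetical numbers:

```python
import math

# Hypothetical multiplicative series: three trend levels, same +20% seasonal bump.
trend = [100.0, 200.0, 400.0]
factor = 1.2

# On the raw scale the seasonal bump grows with the trend level...
raw_gaps = [level * factor - level for level in trend]
# ...but on the log scale it is the constant log(1.2), i.e. an additive effect.
log_gaps = [math.log(level * factor) - math.log(level) for level in trend]
print(raw_gaps)
print(log_gaps)

# A forecast made on the log scale has to be exponentiated to get back to passengers.
yhat_log = math.log(480.0)
print(math.exp(yhat_log))
```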

57
00:06:11,440 --> 00:06:16,360
So in this next step, we're going to take the log of the y column and store the result in a new

58
00:06:16,360 --> 00:06:17,170
data frame.

59
00:06:21,400 --> 00:06:24,130
The next step is to create our third model, m3.

60
00:06:28,150 --> 00:06:30,250
The next step is to call fit once again.

61
00:06:35,390 --> 00:06:38,720
The next step is to call make_future_dataframe once again.

62
00:06:42,720 --> 00:06:45,210
The next step is to call predict once again.

63
00:06:50,540 --> 00:06:54,590
The next step is to plot the forecast along with the change points once again.

64
00:07:00,650 --> 00:07:06,950
OK, so as you can see, this model is probably a bit too sensitive in its change points; however,

65
00:07:06,950 --> 00:07:11,360
note that it has no issue with the seasonal component thanks to the log transform.

66
00:07:15,390 --> 00:07:18,030
The next step is to plot the seasonal components.

67
00:07:23,780 --> 00:07:27,800
As you can see, we still get a peak in the summer, which is what we would expect.

68
00:07:31,610 --> 00:07:36,590
OK, so in the next part of this notebook, we are going to investigate what happens when you have

69
00:07:36,590 --> 00:07:37,280
outliers.

70
00:07:39,230 --> 00:07:41,670
We'll begin by making some fake outliers.

71
00:07:42,050 --> 00:07:48,560
So on January 1, 1955, I've set the value to 600, which is much larger than normal.

72
00:07:49,320 --> 00:07:54,740
On June 1, 1957, I've set the value to 1, which is much smaller than normal.
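
It is worth injecting the outliers into a copy so the clean series stays available for comparison; in pandas that would typically be df.copy() followed by .loc assignments. A stdlib sketch on a {date: value} mapping, where the clean values are placeholders:

```python
from datetime import date

# Hypothetical {month: passengers} series; only a few placeholder months are shown.
series = {
    date(1954, 12, 1): 229.0,
    date(1955, 1, 1): 242.0,
    date(1957, 6, 1): 422.0,
}


def with_outliers(series, overrides):
    """Return a copy of `series` with some values overwritten; the input is left unchanged."""
    corrupted = dict(series)
    corrupted.update(overrides)
    return corrupted


corrupted = with_outliers(series, {date(1955, 1, 1): 600.0, date(1957, 6, 1): 1.0})
print(corrupted)
```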

73
00:07:58,540 --> 00:08:02,050
The next step is to plot our time series with the new outliers.

74
00:08:06,310 --> 00:08:11,950
So as you can see, there is one point where the time series is uncharacteristically high and another

75
00:08:11,950 --> 00:08:13,990
where it is uncharacteristically low.

76
00:08:17,430 --> 00:08:22,890
The next step is to create a model with multiplicative seasonality; we'll do the fitting and predicting

77
00:08:22,890 --> 00:08:26,070
as well, since you're already familiar with these steps.

78
00:08:34,070 --> 00:08:39,050
OK, so notice that our model still appears to fit quite well, but there is one big difference.

79
00:08:39,590 --> 00:08:44,330
The prediction interval is now much wider than it was before.

80
00:08:44,900 --> 00:08:46,520
This is due to the outliers.

81
00:08:46,880 --> 00:08:52,670
And this makes sense since if the model has seen very large or very small values, it may expect to

82
00:08:52,670 --> 00:08:57,160
see them again in the future, making it less confident in its own predictions.
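
One rough way to see the effect: Prophet's intervals include an observation-noise component estimated from how far the data sits from the fit, and a couple of gross errors inflate any such spread estimate. The sketch below uses a plain standard deviation of hypothetical residuals as a stand-in; it is not how Prophet computes its intervals:

```python
import statistics

# Hypothetical residuals (actual minus fitted) from a well-behaved fit.
residuals = [3.0, -2.0, 1.5, -1.0, 2.5, -3.0, 0.5, -1.5]
# The same residuals plus two gross errors, like the 600 and 1 injected earlier.
noisy = residuals + [250.0, -300.0]

clean_sd = statistics.pstdev(residuals)
noisy_sd = statistics.pstdev(noisy)
print(round(clean_sd, 2), round(noisy_sd, 2))
```

Two points out of ten are enough to increase the spread estimate by well over an order of magnitude.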

83
00:09:02,070 --> 00:09:04,350
OK, so how do we deal with outliers?

84
00:09:04,920 --> 00:09:08,350
Well, recall that Prophet is essentially a continuous-time model.

85
00:09:08,850 --> 00:09:13,110
Time is the only regressor, so missing data is essentially a non-issue.

86
00:09:14,180 --> 00:09:19,230
Thus, the recommended method of dealing with outliers is to simply remove them.
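
Why removal is safe: a model of y as a function of t is fitted only to the rows that exist, and it can still be evaluated at times with no observation. The toy below uses an ordinary least-squares line as a stand-in; Prophet's model is far richer, but it shares this property:

```python
def fit_line(ts, ys):
    """Least-squares fit of y = a + b*t; the t values do not need to be evenly spaced."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    b = cov / var
    a = y_mean - b * t_mean
    return a, b


# Month indices with a gap: the observations at t = 3 and t = 4 were removed.
ts = [0, 1, 2, 5, 6, 7]
ys = [10.0 + 2.0 * t for t in ts]
a, b = fit_line(ts, ys)
# The fitted function can still be evaluated at the missing times.
print(a, b, a + b * 3)
```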

87
00:09:20,100 --> 00:09:25,500
We're going to do this by simply creating a new data frame called data that does not include the two

88
00:09:25,500 --> 00:09:27,810
outlier dates we previously chose.
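
In pandas the filtering would typically look like df[~df['ds'].isin(outlier_dates)], where outlier_dates is a hypothetical list of the two dates. Prophet's documentation also describes an alternative: set y to missing for those dates instead of dropping the rows, so the dates remain and the model still produces a prediction for them. A stdlib sketch of the drop approach:

```python
from datetime import date


def drop_dates(series, bad_dates):
    """Return a copy of a {date: value} series with the given dates removed."""
    bad = set(bad_dates)
    return {d: v for d, v in series.items() if d not in bad}


# Hypothetical corrupted series containing the two injected outliers.
corrupted = {date(1954, 12, 1): 229.0, date(1955, 1, 1): 600.0, date(1957, 6, 1): 1.0}
cleaned = drop_dates(corrupted, [date(1955, 1, 1), date(1957, 6, 1)])
print(cleaned)
```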

89
00:09:32,160 --> 00:09:35,940
The next step is to plot our new time series with the outliers removed.

90
00:09:40,730 --> 00:09:43,730
As you can see, it looks essentially the same as the original series.

91
00:09:47,420 --> 00:09:52,480
The next step is to build a new model and fit it on our time series with the outliers removed.

92
00:10:00,700 --> 00:10:06,820
So as you can see, we now get a much better fit. In addition, notice how the prediction intervals

93
00:10:06,820 --> 00:10:11,410
are once again narrow, since there is no need to account for large deviations.