1
00:00:00,080 --> 00:00:02,630
Welcome to the section on error sanctioning.

2
00:00:02,630 --> 00:00:07,490
In the previous section we saw how to build a simple linear regression model.

3
00:00:07,490 --> 00:00:15,680
In this section, we'll see how to calculate the model's error so that the model can update

4
00:00:15,680 --> 00:00:20,900
its parameters such that this error is reduced.

5
00:00:20,900 --> 00:00:22,790
So let's get back here.

6
00:00:22,790 --> 00:00:26,240
We had this y equals mx plus c and then we had this line.

7
00:00:26,240 --> 00:00:34,700
But again, this line could be just any other line, depending on how we initialize m and c.

8
00:00:34,700 --> 00:00:36,590
So we could have this line.

9
00:00:36,980 --> 00:00:40,010
It could be this line, it could be this line.

10
00:00:40,010 --> 00:00:44,150
It could be this line, or it could even be this line.

11
00:00:44,150 --> 00:00:51,020
Now, supposing that it is this line right here, then for each and every point, the way we are going

12
00:00:51,020 --> 00:00:57,890
to sanction the model's error is by, first of all, measuring it.

13
00:00:57,890 --> 00:01:01,700
So let's start with this point.

14
00:01:01,700 --> 00:01:10,400
For example, if we have this point here, to find the price the model predicts, you use

15
00:01:10,400 --> 00:01:15,560
the model's line: you link this point up with the model's prediction, that

16
00:01:15,560 --> 00:01:16,700
is, with the straight line.

17
00:01:16,730 --> 00:01:24,650
Then you would find that what the model predicts is actually this value.

18
00:01:24,650 --> 00:01:32,060
So the model predicts, let's say, a value of... well, these prices go from 8 to 10.

19
00:01:32,060 --> 00:01:40,220
So let's suppose that this is a value of about five, whereas it's supposed to predict ten, because

20
00:01:40,220 --> 00:01:42,290
here we have the true value.

21
00:01:42,290 --> 00:01:48,920
Let's copy this and just drag it right here.

22
00:01:48,920 --> 00:01:52,550
So, as we were saying, here we have the true value for this point.

23
00:01:52,550 --> 00:01:54,230
The true value is actually this ten.

24
00:01:54,230 --> 00:01:57,140
But what the model predicts is five.

25
00:01:57,140 --> 00:02:03,650
And so now this difference of five is actually what we would call the error.

26
00:02:03,650 --> 00:02:07,040
So this difference, that's the error.

27
00:02:07,040 --> 00:02:11,960
But it happens that this error is not always this large for all other values.

28
00:02:11,960 --> 00:02:15,320
Let's zoom in.

29
00:02:15,320 --> 00:02:16,850
You will find that for this value.

30
00:02:16,850 --> 00:02:23,810
For example, for this specific value we have here, what the model predicts and its actual value

31
00:02:23,810 --> 00:02:24,680
are the same.

32
00:02:24,680 --> 00:02:27,410
Let's go back, copy this, and paste it here.

33
00:02:27,410 --> 00:02:29,120
So we have that here.

34
00:02:29,120 --> 00:02:36,170
So let's say the model here predicts 5.5, or rather 5.2.

35
00:02:36,170 --> 00:02:39,020
So here we have 5.2.

36
00:02:39,050 --> 00:02:40,820
Well this is five here.

37
00:02:41,360 --> 00:02:42,920
Let's drag this down.

38
00:02:42,920 --> 00:02:45,290
So here we have this five.

39
00:02:46,220 --> 00:02:50,180
And then just right beside it we have 5.2.

40
00:02:50,210 --> 00:02:51,650
So this is 5.2.

41
00:02:51,650 --> 00:02:52,940
And then this is five.

42
00:02:52,970 --> 00:02:57,890
So what we're saying for this point is this: take what the model predicts.

43
00:02:57,890 --> 00:03:03,470
That's when you link the point with the line, which is our function y = mx + c.

44
00:03:03,560 --> 00:03:07,040
You will find that what it predicts is 5.2.

45
00:03:07,040 --> 00:03:10,430
And the actual value itself is 5.2.

46
00:03:10,460 --> 00:03:17,120
So what this tells us now is that our error is equal to zero.

47
00:03:17,630 --> 00:03:18,320
So that's it.

48
00:03:18,320 --> 00:03:20,120
We have an error of zero.

49
00:03:20,390 --> 00:03:28,190
Whereas just above here, at the level of this point, we have an error which is

50
00:03:28,190 --> 00:03:30,230
quite large, about five.

51
00:03:30,260 --> 00:03:34,010
Let's see, there we have an error of about five.

52
00:03:34,040 --> 00:03:34,610
Okay.

53
00:03:34,610 --> 00:03:36,260
So that's it.

54
00:03:36,260 --> 00:03:43,490
We see that, depending on the point, we could have different error values.

55
00:03:43,490 --> 00:03:49,370
Now generally what we want to do is we want to get an average of all these values.

56
00:03:49,370 --> 00:03:53,480
And then that will be our overall error.

57
00:03:53,480 --> 00:03:58,520
So for our overall error, we will get the error for each and every point we have here.

58
00:03:58,520 --> 00:04:01,670
So let's say we take this point.

59
00:04:01,730 --> 00:04:07,040
You see, when you link this up with the line, you find that what the model predicts

60
00:04:07,040 --> 00:04:12,290
here is this value, whereas the actual value is right here.

61
00:04:12,650 --> 00:04:15,950
And so the error is going to be this for this point.

62
00:04:15,950 --> 00:04:18,830
And you could repeat the same process for all other points.

63
00:04:18,830 --> 00:04:25,670
But what's important to note is the fact that the error of the model is going to be

64
00:04:25,670 --> 00:04:30,980
the mean, that is, the sum total of the errors at all these different points, averaged.

65
00:04:30,980 --> 00:04:35,630
Let's suppose that we had this line, something like this.

66
00:04:35,630 --> 00:04:37,610
So let's suppose that this line was like this.

67
00:04:37,610 --> 00:04:44,030
In that case, look at the error for all these values.

68
00:04:44,030 --> 00:04:50,600
Well, let's go back. You would find that when you compare this model here

69
00:04:50,780 --> 00:04:58,220
with this other model, the error here will be much larger than the error for A.

70
00:04:58,220 --> 00:05:01,220
So let's call this model A and let's call this model B.

71
00:05:01,430 --> 00:05:02,960
Our error...

72
00:05:03,560 --> 00:05:04,670
Our error.

73
00:05:05,780 --> 00:05:13,760
for model B will be much larger, or let's say greater, than the error for model A.

74
00:05:13,760 --> 00:05:20,090
And this makes sense because for most of the points we have here, their individual errors

75
00:05:20,090 --> 00:05:26,180
when we're using model B are going to be much larger than the individual errors

76
00:05:26,180 --> 00:05:31,820
when we're using model A. Diving into some code, we get to the documentation.

77
00:05:31,820 --> 00:05:34,220
You have TensorFlow losses.

78
00:05:34,220 --> 00:05:35,210
Let's reduce this.

79
00:05:35,210 --> 00:05:39,410
We have TensorFlow, then Keras.

80
00:05:39,410 --> 00:05:41,450
And then we have losses.

81
00:05:41,450 --> 00:05:45,320
Here we have this mean squared error.

82
00:05:45,320 --> 00:05:47,150
Let's click on this mean squared error.

83
00:05:48,080 --> 00:05:54,560
And it's defined here: computes the mean of squares of errors between labels and predictions.

84
00:05:54,560 --> 00:05:56,330
So that's it.

85
00:05:56,330 --> 00:05:57,590
Simple definition.

86
00:05:57,590 --> 00:05:59,780
And then you have some examples.

87
00:05:59,780 --> 00:06:03,590
So right here we have y_true.

88
00:06:03,590 --> 00:06:10,130
And so what that means is, if you consider this example here, its y_true

89
00:06:10,160 --> 00:06:11,120
is ten.

90
00:06:11,120 --> 00:06:14,000
Whereas what the model predicts is five.

91
00:06:14,120 --> 00:06:14,720
You see.

92
00:06:14,720 --> 00:06:20,210
So here y_true is going to be ten, while the y_pred from the model is going to be five.

93
00:06:20,210 --> 00:06:28,340
And then with this mean squared error loss, we'll be able to compute the loss very easily.

94
00:06:28,340 --> 00:06:30,110
So let's copy this.

95
00:06:30,650 --> 00:06:34,910
Get back to our notebook, space it out.

96
00:06:34,910 --> 00:06:38,090
And you see here we have y_true: zero, one.

97
00:06:38,090 --> 00:06:45,200
Let's just rearrange this: y_true is zero, one and zero, zero; y_pred is one, one and one, zero.

98
00:06:45,200 --> 00:06:47,750
So let's run this and then see what we get.

99
00:06:47,750 --> 00:06:48,890
Well let's print this out.

100
00:06:48,890 --> 00:06:53,600
So let's print that out and see what we get.

101
00:06:53,750 --> 00:06:55,760
So you see we have 0.5.

102
00:06:55,760 --> 00:06:57,560
And the way we could get this is simple.

103
00:06:57,560 --> 00:07:01,970
For each and every position, you compare the values: here, zero and one.

104
00:07:01,970 --> 00:07:04,190
That gives you zero minus one.

105
00:07:04,190 --> 00:07:08,240
And for the mean squared error, as the name suggests, the difference is squared.

106
00:07:08,240 --> 00:07:13,250
So we have zero minus one all of that squared.

107
00:07:13,250 --> 00:07:14,960
So we have that squared.

108
00:07:15,290 --> 00:07:17,030
Plus... and it's a mean.

109
00:07:17,030 --> 00:07:17,990
So it's an average.

110
00:07:17,990 --> 00:07:23,000
So next we have plus the term for one here and one.

111
00:07:23,000 --> 00:07:27,170
So let's just copy this and paste it here.

112
00:07:27,200 --> 00:07:28,040
Take that off.

113
00:07:28,040 --> 00:07:33,110
We have one and one, so one minus one, all that squared.

114
00:07:33,290 --> 00:07:39,890
Plus, for the next pair, zero and one: that is zero minus one, all that squared.

115
00:07:39,890 --> 00:07:42,650
And then here we have zero minus zero, all that squared.

116
00:07:42,650 --> 00:07:44,840
So zero minus zero, all that squared.

117
00:07:45,080 --> 00:07:46,700
Take that off and we have zero.

118
00:07:46,700 --> 00:07:52,970
So all this is going to give us one, since this is one squared: one plus zero plus one plus zero.

119
00:07:52,970 --> 00:07:53,660
That's two.

120
00:07:53,660 --> 00:07:58,820
And you divide this by the total number of elements, which is four.

121
00:07:58,820 --> 00:08:01,850
So this gives us 0.5.

122
00:08:01,850 --> 00:08:04,940
Now let's modify this slightly and say we will have one.

123
00:08:04,940 --> 00:08:10,160
So if we have one then we will have one divided by four which will give us 0.25.

124
00:08:10,760 --> 00:08:13,580
Here one divided by four should give us 0.25.

125
00:08:13,580 --> 00:08:16,760
Let's change this from 0 to 1.

126
00:08:16,760 --> 00:08:18,620
Now we have one minus.

127
00:08:18,650 --> 00:08:20,030
We have zero minus one.

128
00:08:20,030 --> 00:08:22,190
So here is zero minus one.

129
00:08:22,190 --> 00:08:24,770
And then this should be one.

130
00:08:24,770 --> 00:08:28,460
This is one plus zero plus one plus one.

131
00:08:28,460 --> 00:08:30,080
So that's three divided by four.

132
00:08:30,080 --> 00:08:32,210
Now we should have 0.75.

133
00:08:32,210 --> 00:08:34,550
So let's run that and see what we get.

134
00:08:35,430 --> 00:08:36,240
There we go.

135
00:08:36,240 --> 00:08:37,890
You see we have 0.75.
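The hand calculation above can be checked with a few lines of plain Python. This is only a sketch: the helper name mean_squared_error is made up for illustration, and it assumes the loss averages over every element, which is what the arithmetic in this walkthrough does. In the notebook itself the value comes from the TensorFlow loss, not from this function.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences over every element (illustrative helper)."""
    squared = [
        (t - p) ** 2
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    ]
    return sum(squared) / len(squared)


# Values from the walkthrough: labels and predictions, two rows of two.
y_true = [[0.0, 1.0], [0.0, 0.0]]
y_pred = [[1.0, 1.0], [1.0, 0.0]]
first = mean_squared_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4

# Same labels, but the last prediction changed from 0 to 1.
y_pred_changed = [[1.0, 1.0], [1.0, 1.0]]
second = mean_squared_error(y_true, y_pred_changed)  # (1 + 0 + 1 + 1) / 4

print(first, second)  # 0.5 0.75
```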

136
00:08:37,890 --> 00:08:45,960
So essentially we have this error, which is, for each and every point here, what the model predicts, which

137
00:08:45,960 --> 00:08:48,570
in this case is five, actually,

138
00:08:48,570 --> 00:08:53,400
because when you make use of the line, you see that five is what the model predicts,

139
00:08:53,400 --> 00:09:01,590
so that's five, what the model predicts, minus the actual value.

140
00:09:01,590 --> 00:09:03,750
So that's y actual.

141
00:09:03,750 --> 00:09:08,730
And you take all this and square it. Then you repeat this for all the different points,

142
00:09:08,730 --> 00:09:09,870
and then you look for the average.

143
00:09:09,870 --> 00:09:17,730
So simply, it's the sum of all these y predicted minus y actual terms, each of them squared.

144
00:09:17,730 --> 00:09:23,940
So you sum all those up, and then you divide by the total number of points or total number of samples

145
00:09:23,940 --> 00:09:26,580
which in this case is, say, a thousand.
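Written out as a formula, the description above might look like the following. The symbols are assumptions chosen for illustration: N is the number of samples, y_i is the actual value for sample i, and y'_i is the model's prediction for it.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - y'_i \right)^2
```

Because each difference is squared, the order of the subtraction does not change the result.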

146
00:09:26,580 --> 00:09:32,220
Now, one problem we have with this mean squared error is that it doesn't do well when it comes

147
00:09:32,220 --> 00:09:34,530
to dealing with outliers.

148
00:09:34,560 --> 00:09:38,580
A simple outlier we could detect by looking at this plot

149
00:09:38,580 --> 00:09:40,170
is this one right here.

150
00:09:40,260 --> 00:09:41,790
Why is this an outlier?

151
00:09:41,790 --> 00:09:42,870
The answer is simple.

152
00:09:42,870 --> 00:09:47,940
We have high horsepower, but we have low price.

153
00:09:47,940 --> 00:09:51,090
We would expect that the higher the horsepower,

154
00:09:51,090 --> 00:09:56,940
like most others here, the higher the price should be, just as we've seen with all

155
00:09:56,940 --> 00:09:57,660
these others.

156
00:09:57,660 --> 00:10:06,750
So with these outliers, given that they deviate a lot from the usual pattern of the whole data

157
00:10:06,750 --> 00:10:16,830
set of different samples, using y minus y prime, all squared,

158
00:10:16,830 --> 00:10:23,520
that is, using this kind of formula where we have y minus y prime squared,

159
00:10:23,520 --> 00:10:23,850
isn't the best.

160
00:10:24,780 --> 00:10:25,410
Here,

161
00:10:25,410 --> 00:10:30,480
the error is much larger as compared to all the others.

162
00:10:30,480 --> 00:10:32,430
So here we would have an error.

163
00:10:32,790 --> 00:10:39,480
If we take this and paste it here, let's try to link this up with our plot.

164
00:10:39,480 --> 00:10:42,570
So let's keep dragging this.

165
00:10:42,570 --> 00:10:46,500
See, we link this up and then we drag this up.

166
00:10:46,500 --> 00:10:47,310
There we go.

167
00:10:47,310 --> 00:10:51,450
You see that this error is much larger as compared to all the other errors.

168
00:10:51,450 --> 00:10:57,150
So it's much larger than this, much larger than maybe the error we would have here,

169
00:10:57,150 --> 00:11:00,630
much larger than this error and much larger than this error.

170
00:11:00,630 --> 00:11:10,350
And so because of that, the model will pay more attention to this outlier as compared to all these

171
00:11:10,350 --> 00:11:12,420
other points.

172
00:11:12,420 --> 00:11:19,260
And this is for the simple reason that the loss is going to be much higher, especially as we're going

173
00:11:19,260 --> 00:11:19,890
to be squaring it.

174
00:11:19,890 --> 00:11:27,000
So remember, if you have a square, you would have a plot something like this.

175
00:11:27,000 --> 00:11:32,370
So if you have, say, an error of three.

176
00:11:33,530 --> 00:11:34,490
An error of three.

177
00:11:34,520 --> 00:11:41,180
Well, you would have a square of nine, but if you have an error of ten, you have

178
00:11:41,180 --> 00:11:43,520
a square of 100.
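To make the effect of squaring concrete, here is a small plain-Python sketch. The list of errors is invented for illustration (three ordinary points with an error of 1 and one outlier with an error of 10); it is not taken from the horsepower data.

```python
# Hypothetical per-point errors: three ordinary points and one outlier.
errors = [1.0, 1.0, 1.0, 10.0]

squared = [e ** 2 for e in errors]      # penalties under the squared error
absolute = [abs(e) for e in errors]     # penalties under the absolute error

mse = sum(squared) / len(errors)        # (1 + 1 + 1 + 100) / 4
mae = sum(absolute) / len(errors)       # (1 + 1 + 1 + 10) / 4

# Fraction of the total penalty that comes from the outlier alone.
outlier_share_squared = squared[-1] / sum(squared)
outlier_share_absolute = absolute[-1] / sum(absolute)

print(mse, mae)  # 25.75 3.25
print(round(outlier_share_squared, 2), round(outlier_share_absolute, 2))  # 0.97 0.77
```

With squaring, almost all of the total comes from the single outlier, which is why it pulls so strongly on the parameter updates.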

179
00:11:43,520 --> 00:11:48,920
So the gap you have here will be much larger than the gap you have here.

180
00:11:48,920 --> 00:11:57,470
So these kinds of outliers tend to trick the model into updating its parameters m and c based on

181
00:11:57,470 --> 00:12:01,670
their own particular loss, when the point is actually an outlier.

182
00:12:01,670 --> 00:12:07,970
So in these kinds of cases, we make use of the mean absolute error.

183
00:12:07,970 --> 00:12:14,780
So instead of squaring we just simply say y minus y prime and then we take the absolute value.

184
00:12:14,780 --> 00:12:19,880
So instead of this kind of plot, we would have a plot like this instead.

185
00:12:19,880 --> 00:12:23,960
And the difference now is that we still have that error margin.

186
00:12:23,960 --> 00:12:31,670
So although the error, or the difference between the y predicted and the y actual, is still going to

187
00:12:31,670 --> 00:12:38,030
be large, we don't have to square it, so it doesn't aggravate the situation.

188
00:12:38,030 --> 00:12:39,200
So that's it.

189
00:12:39,200 --> 00:12:42,200
We have this mean absolute error. Again, with TensorFlow,

190
00:12:42,200 --> 00:12:46,190
you could use that without writing any extra lines of code.

191
00:12:46,190 --> 00:12:51,710
Now, if you want the best of both worlds, that is, if you want to use the mean squared error and the

192
00:12:51,740 --> 00:12:53,480
mean absolute error.

193
00:12:53,510 --> 00:12:56,510
Then you could go for the Huber loss.

194
00:12:56,510 --> 00:13:03,050
Now, the Huber loss is actually a mixture of the two.

195
00:13:03,080 --> 00:13:12,560
So what we have here is: if the error is too large, then we'll use the mean absolute error.

196
00:13:12,560 --> 00:13:21,500
So we just check whether our error is too large.

197
00:13:21,500 --> 00:13:26,900
So if error is large then we use our mean absolute error.

198
00:13:26,900 --> 00:13:35,270
And then if we have an error that is small, or that is usual, like in the case of

199
00:13:35,270 --> 00:13:41,330
all these other points, if you have the usual error margins, then you would go

200
00:13:41,330 --> 00:13:42,890
for the mean squared error.

201
00:13:42,890 --> 00:13:53,450
So in that case, if your error is small, then you will go for the mean

202
00:13:53,450 --> 00:13:54,380
squared error.

203
00:13:54,590 --> 00:13:59,150
Diving back to the documentation, you have the mean absolute error just right here.

204
00:13:59,630 --> 00:14:06,410
This is the mean squared error: the square of y minus y prime.

205
00:14:06,410 --> 00:14:10,730
And then for the mean absolute error: computes the mean of absolute difference between the labels and

206
00:14:10,730 --> 00:14:11,540
the predictions.

207
00:14:11,540 --> 00:14:18,440
Now, remember what the absolute value of x actually is.

208
00:14:18,530 --> 00:14:20,630
Let's write that out.

209
00:14:20,630 --> 00:14:29,000
In case you don't know, when you talk about the absolute value function, the absolute value is simply x if x

210
00:14:29,000 --> 00:14:30,260
is greater than zero.

211
00:14:30,260 --> 00:14:33,320
So if x is already positive then it remains the same.

212
00:14:33,320 --> 00:14:38,840
But if x is negative, then you put a negative sign in front so that it turns into a positive value.

213
00:14:38,840 --> 00:14:46,160
So in that case, the absolute value of minus seven is seven and the absolute value of seven is seven.
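The absolute value and the mean absolute error described here can be sketched in plain Python as follows. The function names absolute_value and mean_absolute_error are illustrative, not library calls, and the sample numbers reuse the two points discussed earlier (true value 10 against a prediction of 5, and a point where both are 5.2).

```python
def absolute_value(x):
    """Return x unchanged if it is non-negative, otherwise flip its sign."""
    return x if x >= 0 else -x


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between labels and predictions."""
    differences = [absolute_value(t - p) for t, p in zip(y_true, y_pred)]
    return sum(differences) / len(differences)


print(absolute_value(-7), absolute_value(7))  # 7 7

# Errors of 5 and 0, so the mean absolute error is 5 / 2.
example = mean_absolute_error([10.0, 5.2], [5.0, 5.2])
print(example)  # 2.5
```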

214
00:14:46,610 --> 00:14:47,450
So that's it.

215
00:14:47,450 --> 00:14:48,380
We get back here.

216
00:14:48,380 --> 00:14:50,480
That's it for the mean absolute error.

217
00:14:50,480 --> 00:14:51,620
So let's get here.

218
00:14:51,620 --> 00:14:52,760
We have Huber.

219
00:14:53,450 --> 00:14:54,560
There we go.

220
00:14:54,590 --> 00:14:55,910
We have this Huber loss.

221
00:14:55,940 --> 00:14:57,200
Let's scroll up.

222
00:14:57,440 --> 00:15:00,800
Computes the Huber loss between y_true and y_pred.

223
00:15:00,800 --> 00:15:03,140
So we have this parameter delta.

224
00:15:03,140 --> 00:15:05,870
And the role of this parameter delta is simple.

225
00:15:06,140 --> 00:15:11,030
The way we know that the error is large is by comparing it with our parameter delta.

226
00:15:11,030 --> 00:15:16,130
So if the error is greater than the parameter delta, then we consider that it's large and we use the

227
00:15:16,130 --> 00:15:17,180
mean absolute error.

228
00:15:17,180 --> 00:15:22,970
If the error is less than the parameter delta, we consider that the error is small and we use the mean

229
00:15:22,970 --> 00:15:23,720
squared error.

230
00:15:23,720 --> 00:15:25,910
So getting back here we have that.

231
00:15:25,910 --> 00:15:29,150
We should have an example showing that. Okay, this is it.

232
00:15:29,150 --> 00:15:31,940
You see we have 0.5 times x squared,

233
00:15:31,970 --> 00:15:34,070
where x is y minus y prime,

234
00:15:34,070 --> 00:15:36,500
if the absolute value of x is at most that parameter delta,

235
00:15:36,500 --> 00:15:37,970
here denoted as d.

236
00:15:37,970 --> 00:15:43,400
And if it's not, then we're going to use the absolute value.

237
00:15:43,400 --> 00:15:49,430
Well, looking at this definition here, it turns out that we made an error on the board.

238
00:15:49,700 --> 00:15:52,520
It's actually not just the absolute value.

239
00:15:52,520 --> 00:15:57,650
So with the Huber loss, if it's less than delta, that is, if the error is small, then

240
00:15:57,650 --> 00:16:00,920
we use the squared error, y minus y prime squared.

241
00:16:00,920 --> 00:16:03,440
But for large errors, the Huber loss still uses the absolute value.

242
00:16:03,440 --> 00:16:08,420
But now we subtract half of delta and then multiply by delta.

243
00:16:08,420 --> 00:16:10,040
That's the parameter delta.

244
00:16:10,040 --> 00:16:14,450
So this delta comes in again in the formula.

245
00:16:14,450 --> 00:16:16,700
So it's not just the absolute value of y minus y prime

246
00:16:16,700 --> 00:16:19,580
times one over n.

247
00:16:19,580 --> 00:16:24,710
Anyway, the good thing with TensorFlow is you don't really need to dive into all these details before

248
00:16:24,710 --> 00:16:27,680
making use of the Huber loss in practice.
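For anyone who does want to see the details, the piecewise rule read from the documentation can be sketched in plain Python. This is a sketch under assumptions: the names huber_term and huber_loss are made up, delta defaults to 1.0 here, and the result is a simple average over the points. It illustrates the rule and is not TensorFlow's own implementation.

```python
def huber_term(error, delta=1.0):
    """Huber penalty for a single error x = y_true - y_pred."""
    size = abs(error)
    if size <= delta:
        # Small error: half of the squared error.
        return 0.5 * error ** 2
    # Large error: absolute value minus half of delta, multiplied by delta.
    return delta * (size - 0.5 * delta)


def huber_loss(y_true, y_pred, delta=1.0):
    """Average Huber penalty over all points."""
    terms = [huber_term(t - p, delta) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


print(huber_term(0.5), huber_term(3.0))  # 0.125 2.5

# The two points from earlier: errors of 5 and 0 give (4.5 + 0) / 2.
example = huber_loss([10.0, 5.2], [5.0, 5.2], delta=1.0)
print(example)  # 2.25
```

At an error exactly equal to delta, both branches give half of delta squared, so the two pieces join without a jump.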

249
00:16:27,680 --> 00:16:30,500
So here you see how to make use of it.

250
00:16:30,500 --> 00:16:32,840
Usage with the compile API.

251
00:16:33,060 --> 00:16:36,390
So after you've created your model, you now compile that model.

252
00:16:36,390 --> 00:16:41,670
You specify the loss and you see you specify the optimizer, which we shall look at in the next section.

253
00:16:41,670 --> 00:16:45,300
So let's just copy this and then get back to the code.

254
00:16:45,870 --> 00:16:46,620
Take this off.

255
00:16:46,620 --> 00:16:48,300
We now have model.compile.

256
00:16:48,300 --> 00:16:52,950
We have the optimizer, SGD, and the loss, which is the Huber loss.
