1
00:00:00,080 --> 00:00:03,950
Hello everyone, and welcome to this session on error sanctioning.

2
00:00:04,040 --> 00:00:11,900
The error sanctioning method, or loss function, that we'll use in this section is the

3
00:00:11,900 --> 00:00:14,510
binary cross-entropy loss.

4
00:00:14,510 --> 00:00:19,850
And right here we have the binary cross-entropy loss formula.

5
00:00:20,330 --> 00:00:32,120
So here we have negative y times log of ŷ (y hat), minus (1 - y) times log of (1 - ŷ).
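The formula above can be sketched for a single example in plain Python. This is an illustrative helper (`bce_single` is a hypothetical name, not a TensorFlow function), and it assumes y is 0 or 1 and ŷ lies strictly between 0 and 1 so that both logarithms are finite.

```python
import math

def bce_single(y, y_hat):
    """Binary cross-entropy for one example: -y*log(y_hat) - (1 - y)*log(1 - y_hat).

    Assumes 0 < y_hat < 1 so both logarithms are finite.
    """
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

# A confident, correct prediction gives a small loss...
print(bce_single(1, 0.99))  # about 0.01
# ...while a confident, wrong prediction gives a large one.
print(bce_single(1, 0.01))  # about 4.6
```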

6
00:00:32,150 --> 00:00:39,770
To understand this formula, we'll start by taking this log plot into consideration.

7
00:00:39,770 --> 00:00:43,970
So our log function is this plot we have right here.

8
00:00:43,970 --> 00:00:50,840
You can see from this plot right here that as we approach zero,

9
00:00:50,840 --> 00:00:55,820
as we go towards zero, the log approaches negative infinity. In this section,

10
00:00:55,820 --> 00:01:00,290
our values for y will fall in the range zero to one.
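A quick way to see the shape of that log plot numerically, in plain Python. The constant `EPS = 1e-7` is an assumption for illustration, chosen to match Keras' default fuzz factor (`keras.backend.epsilon()`); clipping predictions into [EPS, 1 - EPS] is a common way to keep the loss finite, and the `clip` helper here is hypothetical.

```python
import math

EPS = 1e-7  # assumed clipping constant, matching Keras' default epsilon

# log(x) heads toward negative infinity as x shrinks toward 0, and log(1) = 0.
for x in [1.0, 0.5, 0.1, 1e-3, EPS]:
    print(f"log({x:g}) = {math.log(x):.3f}")

# math.log(0.0) raises ValueError, so predictions are clipped
# into [EPS, 1 - EPS] before the logarithm is taken.
def clip(p, eps=EPS):
    return min(max(p, eps), 1 - eps)

print(-math.log(clip(0.0)))  # a large but finite penalty, about 16.12
```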

11
00:01:00,500 --> 00:01:08,210
With that in mind, let's consider our y and our ŷ. Here we can consider that

12
00:01:08,390 --> 00:01:10,820
ŷ is what the model predicts.

13
00:01:10,820 --> 00:01:14,360
So ŷ is the model's prediction,

14
00:01:14,360 --> 00:01:20,180
and y itself is the actual, or expected, label.

15
00:01:20,180 --> 00:01:23,450
So here we have the model's prediction.

16
00:01:24,110 --> 00:01:24,980
Let me take that off.

17
00:01:24,980 --> 00:01:29,780
We have the model's prediction.

18
00:01:30,230 --> 00:01:31,580
There we go.

19
00:01:31,580 --> 00:01:37,310
And then just below, we have the actual label.

20
00:01:37,310 --> 00:01:41,660
So here we have the actual label.

21
00:01:41,660 --> 00:01:44,810
This is the value we expect the model to predict.

22
00:01:44,810 --> 00:01:45,710
So that's it.

23
00:01:45,710 --> 00:01:47,900
We have ŷ and we have y.

24
00:01:47,930 --> 00:01:54,710
Now let's suppose that this is what the model

25
00:01:54,710 --> 00:01:55,310
predicts.

26
00:01:55,310 --> 00:01:56,810
And then this is the actual label.

27
00:01:56,810 --> 00:02:00,050
So here we have the model's prediction and then the actual label.

28
00:02:00,050 --> 00:02:03,920
So let's suppose that the model predicts an output of zero

29
00:02:03,920 --> 00:02:07,370
and the actual label is also zero.

30
00:02:07,370 --> 00:02:13,100
In that case we would have y... let me take this off.

31
00:02:13,100 --> 00:02:19,100
Let's take this off here and label these y and ŷ.

32
00:02:19,100 --> 00:02:24,020
So here we have ŷ, which is what the model predicts.

33
00:02:24,020 --> 00:02:27,770
And then this is what we expected it to predict.

34
00:02:27,770 --> 00:02:30,350
So here we have zero and zero.

35
00:02:30,350 --> 00:02:32,090
So we have zero.

36
00:02:32,090 --> 00:02:38,990
Because we're multiplying this zero by log of zero, which is a very large negative number,

37
00:02:38,990 --> 00:02:43,160
this first term gives zero (by convention, 0 · log 0 is taken to be 0).

38
00:02:43,160 --> 00:02:54,440
And then on the other side we have (1 - 0), which is one, times log of (1 - 0), which is the log of one.

39
00:02:54,440 --> 00:03:00,050
And if we go back to this plot right here, we see that the log

40
00:03:00,050 --> 00:03:01,730
of one is zero.

41
00:03:01,880 --> 00:03:05,690
And since log of one is zero, we have one times zero.

42
00:03:05,690 --> 00:03:07,880
And so overall we have zero.

43
00:03:07,880 --> 00:03:13,460
So the final error, or loss, is zero.

44
00:03:13,460 --> 00:03:21,350
And this makes sense, because we want a loss function such that when our model predicts

45
00:03:21,350 --> 00:03:29,510
exactly the same output as the actual label, we get an error of zero,

46
00:03:29,510 --> 00:03:30,620
or a loss of zero.
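Written out as a worked equation, the case y = 0, ŷ = 0 looks like this, using the standard convention that 0 · log 0 is taken to be 0 (the limit of p log p as p approaches 0 from above):

```latex
\begin{aligned}
\mathcal{L}(y, \hat{y}) &= -\,y \log \hat{y} - (1 - y)\log(1 - \hat{y}) \\
\mathcal{L}(0, 0) &= -\,0 \cdot \log 0 - (1 - 0)\log(1 - 0) = 0 - \log 1 = 0
\end{aligned}
```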

47
00:03:30,620 --> 00:03:34,820
Now let's suppose that the model predicts zero and the actual label is one.

48
00:03:34,820 --> 00:03:38,960
In that case, y is one right here.

49
00:03:38,960 --> 00:03:40,310
So this is going to be one.

50
00:03:40,340 --> 00:03:49,610
This is one times log of ŷ, which is log of zero.

51
00:03:49,610 --> 00:03:51,950
That's a large negative number, as we've already seen.

52
00:03:52,370 --> 00:03:54,170
So one times a large negative number.

53
00:03:54,320 --> 00:03:56,720
So we get a large negative number.

54
00:03:56,720 --> 00:04:04,250
Let's call its size L, so here we have negative L. And then

55
00:04:04,250 --> 00:04:11,750
here we have (1 - y), which is 1 - 1 = 0, times the log of (1 - ŷ), which is log of (1 - 0), the log of one.

56
00:04:11,750 --> 00:04:20,060
Since log of one is zero, this second term is zero times zero, which is zero. So inside the brackets

57
00:04:20,060 --> 00:04:28,040
we're left with just negative L.

58
00:04:28,040 --> 00:04:30,380
On its own, that's a very large negative number.

59
00:04:30,380 --> 00:04:31,220
But remember,

60
00:04:32,060 --> 00:04:37,070
we want to penalize the model for predicting the wrong output.

61
00:04:37,100 --> 00:04:41,120
And in fact, the negative sign in front of the formula times this negative gives a plus here.

62
00:04:41,120 --> 00:04:43,250
So this negative times this gives a plus.

63
00:04:43,250 --> 00:04:44,540
The second term, with its minus sign,

64
00:04:44,540 --> 00:04:46,310
contributes nothing, since it's zero.

65
00:04:46,310 --> 00:04:51,140
So we're left with minus one times log of zero, that is, minus negative L.

66
00:04:51,140 --> 00:04:56,600
And minus negative L is L.

67
00:04:56,600 --> 00:04:58,730
So we have a large positive number.

68
00:04:58,730 --> 00:05:05,000
But what's important to note is that we're penalizing the model for the wrong prediction.
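The same calculation for y = 1 and ŷ = 0, where ε stands for a tiny positive value that a prediction of zero is clipped to in practice (an assumption for illustration), shows that only the first term survives and the loss is a single large positive number:

```latex
\begin{aligned}
\mathcal{L}(1, \hat{y}) &= -\,1 \cdot \log \hat{y} - (1 - 1)\log(1 - \hat{y}) = -\log \hat{y} \\
\mathcal{L}(1, \varepsilon) &= -\log \varepsilon \;\longrightarrow\; +\infty \quad \text{as } \varepsilon \to 0^{+}
\end{aligned}
```

For ε = 10⁻⁷, for example, -log ε is about 16.1.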

69
00:05:05,000 --> 00:05:08,630
Now if the model predicts one and the label is zero, we would get something similar: a large loss.

70
00:05:08,630 --> 00:05:15,080
And if the model predicts one and the label is one, we should get a loss of zero. We can now try out this binary

71
00:05:15,080 --> 00:05:19,850
cross-entropy loss in TensorFlow with a simple example.
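The four hard-label combinations can be checked with a small plain-Python sketch. `bce_clipped` and `EPS` are hypothetical names, and clipping to [EPS, 1 - EPS] is an assumption that keeps log(0) from ever being evaluated; TensorFlow's exact numbers may differ slightly.

```python
import math

EPS = 1e-7  # assumed clipping constant so log(0) is never evaluated

def bce_clipped(y, y_hat, eps=EPS):
    """Single-example binary cross-entropy with the prediction clipped to [eps, 1 - eps]."""
    p = min(max(y_hat, eps), 1 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# Matching prediction and label give (almost) zero; mismatches give a large loss.
for y_hat in (0.0, 1.0):
    for y in (0, 1):
        print(f"prediction={y_hat:.0f} label={y} loss={bce_clipped(y, y_hat):.4f}")
```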

72
00:05:19,850 --> 00:05:23,870
So let's define y_true, and there we go.

73
00:05:23,870 --> 00:05:24,920
We have y_true.

74
00:05:24,920 --> 00:05:26,090
Let's give it some values.

75
00:05:26,090 --> 00:05:27,470
Let's say we have zero.

76
00:05:27,470 --> 00:05:30,950
We have one, we have zero, and then we have zero.

77
00:05:30,950 --> 00:05:34,730
And then we have y_pred.

78
00:05:34,730 --> 00:05:37,010
Let's say we have zero.

79
00:05:37,010 --> 00:05:40,430
We have one, we have zero and then we have zero.

80
00:05:40,430 --> 00:05:41,270
There we go.

81
00:05:41,270 --> 00:05:44,630
So we define our binary cross-entropy loss, which is equal to

82
00:05:44,630 --> 00:05:52,550
TensorFlow Keras' binary cross-entropy, tf.keras.losses.BinaryCrossentropy().

83
00:05:52,550 --> 00:05:53,540
There we go.

84
00:05:53,540 --> 00:05:55,430
And now we can simply call it.

85
00:05:55,430 --> 00:06:00,260
So let's print bce, which takes in y_true

86
00:06:00,260 --> 00:06:02,180
and then y_pred.

87
00:06:02,390 --> 00:06:05,450
We run that and let's see what we obtain.

88
00:06:05,870 --> 00:06:11,150
We get an error: cannot convert 0.0 to an integer tensor of type int32.

89
00:06:11,540 --> 00:06:14,750
Well, our y_pred should be a float.

90
00:06:14,750 --> 00:06:17,840
So let's add a decimal point and then run that again.

91
00:06:17,840 --> 00:06:18,740
And there we go.

92
00:06:18,740 --> 00:06:21,080
You see we have a loss of zero,

93
00:06:21,080 --> 00:06:24,050
which makes sense because y_pred is the same as

94
00:06:24,080 --> 00:06:24,620
y_true.
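A plain-Python sketch of what the `bce(y_true, y_pred)` call computes for this example: the per-element loss from the formula, averaged over the batch (the default reduction of `tf.keras.losses.BinaryCrossentropy` averages over the batch). `mean_bce` and `EPS` are hypothetical, and TensorFlow works in float32 with its own epsilon handling, so the value it prints can differ in the last digits.

```python
import math

EPS = 1e-7  # assumed clipping constant

def mean_bce(y_true, y_pred, eps=EPS):
    """Average binary cross-entropy over a batch of scalar predictions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total / len(y_true)

y_true = [0.0, 1.0, 0.0, 0.0]
y_pred = [0.0, 1.0, 0.0, 0.0]  # floats, matching the fix made in the lesson
print(mean_bce(y_true, y_pred))  # essentially zero (on the order of 1e-7)
```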

95
00:06:24,620 --> 00:06:28,070
Now let's change the predictions.

96
00:06:28,070 --> 00:06:33,920
Instead of zero we have 0.8, instead of one we have 0.2, and instead of zero

97
00:06:33,920 --> 00:06:37,370
we have, say, 0.9.

98
00:06:37,370 --> 00:06:43,820
And then instead of zero we have, let's say, one. We run that again, and we find that the loss

99
00:06:43,820 --> 00:06:49,670
has increased compared to when we predicted all the values exactly.
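Running these worse predictions through the same kind of sketch shows where a loss of roughly five comes from: most of it is the single confident mistake of predicting 1 where the label is 0. As before, `EPS` and the helper name are assumptions, and TensorFlow's printed value may differ somewhat because of float32 arithmetic and its own clipping.

```python
import math

EPS = 1e-7  # assumed clipping constant

def per_element_bce(y_true, y_pred, eps=EPS):
    """List of binary cross-entropy terms, one per element."""
    out = []
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        out.append(-y * math.log(p) - (1 - y) * math.log(1 - p))
    return out

y_true = [0.0, 1.0, 0.0, 0.0]
y_pred = [0.8, 0.2, 0.9, 1.0]  # the worse predictions tried in the lesson

terms = per_element_bce(y_true, y_pred)
mean_loss = sum(terms) / len(terms)
print([round(t, 2) for t in terms])  # the last term dominates
print(mean_loss)                     # roughly 5.4
```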

100
00:06:49,670 --> 00:06:52,610
Now let's go closer to the actual values.

101
00:06:52,610 --> 00:06:57,020
Let's reduce this, let's increase this and then let's reduce this.

102
00:06:57,020 --> 00:06:59,270
And then let's say this is zero.

103
00:06:59,300 --> 00:07:00,920
We run that again.

104
00:07:01,040 --> 00:07:07,040
And you see that compared to the roughly five we just had, we now have a smaller value: not zero,

105
00:07:07,040 --> 00:07:10,070
but a value approaching zero.

106
00:07:10,070 --> 00:07:17,360
You see we have about 0.1, which makes sense because our predictions are now much closer to the y_true values.
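The exact values used for the closer predictions aren't stated here, so this sketch uses hypothetical ones to show the same trend: as y_pred moves toward y_true, the mean loss shrinks toward zero. `mean_bce` and `EPS` are assumptions, as in the earlier sketches.

```python
import math

EPS = 1e-7  # assumed clipping constant

def mean_bce(y_true, y_pred, eps=EPS):
    """Average binary cross-entropy over a batch of scalar predictions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total / len(y_true)

y_true = [0.0, 1.0, 0.0, 0.0]
# Hypothetical predictions, each set closer to y_true than the one before.
candidates = [
    [0.8, 0.2, 0.9, 1.0],
    [0.4, 0.6, 0.4, 0.2],
    [0.1, 0.9, 0.1, 0.0],
]
losses = [mean_bce(y_true, y_pred) for y_pred in candidates]
for y_pred, loss in zip(candidates, losses):
    print(y_pred, round(loss, 3))
```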

107
00:07:18,110 --> 00:07:19,100
And so that's it.

108
00:07:19,130 --> 00:07:22,550
Now we understand the binary cross-entropy loss.

109
00:07:22,580 --> 00:07:25,040
Now we will dive into compiling our model.

110
00:07:25,040 --> 00:07:27,350
So we just need to call model.compile.

111
00:07:27,350 --> 00:07:31,460
And this compilation consists of specifying the optimizer,

112
00:07:31,760 --> 00:07:34,820
which is the Adam optimizer,

113
00:07:35,360 --> 00:07:42,470
with a learning rate of, say, 0.01.

114
00:07:42,710 --> 00:07:46,640
Then we have the loss we just defined.

115
00:07:46,910 --> 00:07:50,900
This loss is the binary cross-entropy loss,

116
00:07:50,900 --> 00:07:54,830
so we pass BinaryCrossentropy.

117
00:07:55,280 --> 00:08:01,970
Going back to the documentation, you'll find this from_logits parameter right

118
00:08:01,970 --> 00:08:02,390
here.

119
00:08:02,390 --> 00:08:08,990
If the output values range between negative infinity and positive infinity, then we want to

120
00:08:08,990 --> 00:08:09,860
set

121
00:08:09,860 --> 00:08:12,080
from_logits to True.

122
00:08:12,080 --> 00:08:18,200
But if the output is a probability, lying between 0 and 1, then we set from_logits to

123
00:08:18,200 --> 00:08:18,950
False.

124
00:08:19,100 --> 00:08:25,070
Now if we go back to our model, we find that it uses a sigmoid activation.

125
00:08:25,070 --> 00:08:32,450
And the sigmoid activation ensures that our outputs range between 0 and 1.

126
00:08:32,450 --> 00:08:38,600
Because of that, we set from_logits to False, which is also the default value of the from_logits

127
00:08:38,600 --> 00:08:39,860
argument.
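To make the `from_logits` choice concrete, here is a plain-Python sketch (not TensorFlow's code) of two routes to the same loss: squashing a raw score z with a sigmoid and then using the probability formula, which is the `from_logits=False` situation, versus working from the logit directly with the numerically stable form max(z, 0) - z·y + log(1 + e^(-|z|)), which is the expression documented for `tf.nn.sigmoid_cross_entropy_with_logits`. The helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    """Expects p in (0, 1), e.g. the output of a sigmoid layer (from_logits=False)."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def bce_from_logit(y, z):
    """Expects a raw score z in (-inf, inf) (from_logits=True); numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

pairs = [(1, 2.0), (0, 2.0), (1, -3.5), (0, 0.0)]
for y, z in pairs:
    a = bce_from_probability(y, sigmoid(z))
    b = bce_from_logit(y, z)
    print(f"y={y} z={z:+.1f}  via sigmoid: {a:.6f}  from logit: {b:.6f}")
```

The two columns agree, which is why the setting has to match what the last layer outputs: a model that already ends in a sigmoid produces probabilities, so `from_logits=False` is the matching choice.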

128
00:08:39,860 --> 00:08:41,600
We also have label_smoothing.

129
00:08:41,600 --> 00:08:46,880
As the documentation explains, it's a float in the range [0, 1], and when it's zero there's

130
00:08:46,880 --> 00:08:47,840
no smoothing.

131
00:08:48,260 --> 00:08:50,750
By default it's zero,

132
00:08:50,750 --> 00:08:52,070
so there's no smoothing.

133
00:08:52,070 --> 00:08:57,230
And when it's greater than zero, we compute the loss between the predicted labels and a smoothed version

134
00:08:57,230 --> 00:08:58,220
of the true labels,

135
00:08:58,220 --> 00:09:02,330
where the smoothing squeezes the labels towards 0.5.

136
00:09:02,330 --> 00:09:05,540
So a larger value of label_smoothing corresponds to heavier smoothing.
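For binary targets, the smoothing rule replaces each label y with y · (1 - s) + 0.5 · s, where s is `label_smoothing`. A minimal sketch of just that mapping (`smooth_labels` is a hypothetical helper, not a Keras function):

```python
def smooth_labels(y_true, label_smoothing):
    """Squeeze hard 0/1 targets toward 0.5: y * (1 - s) + 0.5 * s."""
    s = label_smoothing
    return [y * (1 - s) + 0.5 * s for y in y_true]

print(smooth_labels([0.0, 1.0, 0.0, 0.0], 0.0))  # unchanged
print(smooth_labels([0.0, 1.0, 0.0, 0.0], 0.1))  # zeros become 0.05, ones become 0.95 (up to float rounding)
print(smooth_labels([0.0, 1.0, 0.0, 0.0], 1.0))  # everything becomes 0.5
```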

137
00:09:05,540 --> 00:09:11,600
Now if we go back to the code, here's what this means in practice. Suppose we have

138
00:09:11,600 --> 00:09:23,780
label_smoothing set to, say, 0.1, and we set y_pred to 0.1 here, 0.9 here, 0.1 here, and then

139
00:09:23,780 --> 00:09:29,870
let's say zero here. Running that, we see a loss of about 0.35.

140
00:09:29,870 --> 00:09:31,310
So this is

141
00:09:31,510 --> 00:09:31,720
0.1,

142
00:09:31,720 --> 00:09:32,590
this is 0.9,

143
00:09:32,590 --> 00:09:34,000
this is 0.1,

144
00:09:34,030 --> 00:09:36,070
and now let's make this last one 0.1 as well.

145
00:09:37,050 --> 00:09:38,910
We should now get

146
00:09:38,910 --> 00:09:41,160
an even smaller loss.

147
00:09:42,020 --> 00:09:48,680
This is because, with smoothing, the target for that last label is 0.05 rather than a hard zero. Instead of comparing a prediction of zero with it,

148
00:09:48,680 --> 00:09:52,430
we're now comparing this 0.1 with 0.05, which is much closer.

149
00:09:52,430 --> 00:09:56,780
And that's because we set label_smoothing to 0.1, which turns each 0 into 0.05 and each 1 into 0.95.
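The two losses from this experiment can be reproduced approximately with a plain-Python sketch that applies the smoothing rule and then the clipped formula. `mean_bce` and `EPS` are assumptions; this version gives about 0.363 for the first case, close to but not exactly the roughly 0.35 reported in the lesson, since TensorFlow's float32 arithmetic and clipping details differ.

```python
import math

EPS = 1e-7  # assumed clipping constant

def mean_bce(y_true, y_pred, label_smoothing=0.0, eps=EPS):
    """Mean binary cross-entropy with optional binary label smoothing."""
    s = label_smoothing
    total = 0.0
    for y, p in zip(y_true, y_pred):
        y = y * (1 - s) + 0.5 * s      # 0 -> 0.05 and 1 -> 0.95 when s = 0.1
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total / len(y_true)

y_true = [0.0, 1.0, 0.0, 0.0]
first = mean_bce(y_true, [0.1, 0.9, 0.1, 0.0], label_smoothing=0.1)
second = mean_bce(y_true, [0.1, 0.9, 0.1, 0.1], label_smoothing=0.1)
print(round(first, 3), round(second, 3))  # about 0.363 and 0.215
```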

150
00:09:57,470 --> 00:10:04,970
So you can see that with label smoothing on, having y_pred exactly equal to y_true no longer yields a loss of

151
00:10:04,970 --> 00:10:06,380
zero.

152
00:10:06,380 --> 00:10:10,400
If you set label_smoothing back to zero and run this, you should get zero.

153
00:10:10,430 --> 00:10:14,330
Let's go back to 0.1 and then modify y_pred.

154
00:10:14,330 --> 00:10:19,640
Without smoothing, exact zeros and ones gave us a loss of zero.

155
00:10:19,730 --> 00:10:25,640
But with label_smoothing back at 0.1, you'll find that when

156
00:10:25,640 --> 00:10:31,280
you move 0 to 0.1 and 1 to 0.9,

157
00:10:31,280 --> 00:10:36,890
and here 0 to 0.1, you get a smaller value for the loss.
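A last sketch ties these observations together. With smoothing on, the targets become 0.05 and 0.95, so copying the hard labels is no longer the best prediction; the lowest loss comes from predicting the smoothed targets themselves, and even then it equals their binary entropy (about 0.1985) rather than zero. The helper name and `EPS` are assumptions, as before.

```python
import math

EPS = 1e-7  # assumed clipping constant

def mean_bce(y_true, y_pred, label_smoothing=0.0, eps=EPS):
    """Mean binary cross-entropy with optional binary label smoothing."""
    s = label_smoothing
    total = 0.0
    for y, p in zip(y_true, y_pred):
        y = y * (1 - s) + 0.5 * s      # smooth the target
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total / len(y_true)

y_true = [0.0, 1.0, 0.0, 0.0]
exact = [0.0, 1.0, 0.0, 0.0]
softened = [0.1, 0.9, 0.1, 0.1]
matched = [0.05, 0.95, 0.05, 0.05]  # equal to the smoothed targets

no_smoothing = mean_bce(y_true, exact)
exact_smoothed = mean_bce(y_true, exact, label_smoothing=0.1)
softened_smoothed = mean_bce(y_true, softened, label_smoothing=0.1)
matched_smoothed = mean_bce(y_true, matched, label_smoothing=0.1)

print(no_smoothing)       # essentially zero
print(exact_smoothed)     # about 0.806: hard 0/1 predictions are now penalized
print(softened_smoothed)  # about 0.215
print(matched_smoothed)   # about 0.1985, the smallest achievable value for these targets
```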

158
00:10:37,760 --> 00:10:39,470
And so that's it for this section.

159
00:10:39,470 --> 00:10:41,960
In the next section we'll dive into training our model.