1
00:00:00,060 --> 00:00:06,190
We said before that a binomial random variable was a special type of discrete random variable.

2
00:00:06,210 --> 00:00:12,660
And what we want to say now is that this Bernoulli random variable is a special kind of binomial random

3
00:00:12,660 --> 00:00:13,290
variable.

4
00:00:13,290 --> 00:00:19,590
And the difference is just that with the Bernoulli random variable, we're modeling exactly one trial,

5
00:00:19,590 --> 00:00:23,400
whereas with the binomial random variable, we were modeling many trials.

6
00:00:23,400 --> 00:00:29,610
So before if we used a binomial random variable to model the probability of flipping a certain number

7
00:00:29,610 --> 00:00:35,520
of heads on a larger number of coin flips, we might have been flipping the coin five times or ten times

8
00:00:35,520 --> 00:00:36,420
or 1000 times.

9
00:00:36,420 --> 00:00:38,250
We had many trials.

10
00:00:38,250 --> 00:00:43,170
When we use a Bernoulli random variable, we drill that down to just one trial.

11
00:00:43,170 --> 00:00:49,890
So all we're doing here is flipping a coin one time, and in that scenario we can use a Bernoulli random

12
00:00:49,890 --> 00:00:52,050
variable to model probability.

13
00:00:52,050 --> 00:00:57,780
We can also, of course, build our Bernoulli distribution, which will be a visual representation of

14
00:00:57,780 --> 00:01:01,020
the probability associated with that Bernoulli random variable.

15
00:01:01,050 --> 00:01:09,510
The way that we use this Bernoulli random variable is by defining success as a one and

16
00:01:09,510 --> 00:01:11,100
failure as a zero.

17
00:01:11,100 --> 00:01:18,360
In other words, we assign numeric mathematical values to these concepts of success and failure.

18
00:01:18,360 --> 00:01:23,760
So if we're flipping a coin one time and we're looking at the probability that we flip heads, we would

19
00:01:23,760 --> 00:01:29,490
define flipping heads as a success and therefore as a one and everything else, in this case, flipping

20
00:01:29,520 --> 00:01:31,890
tails as a failure or zero.

21
00:01:31,890 --> 00:01:39,180
If we're rolling a six-sided die one time, we have one trial, and we could define success as rolling a two.

22
00:01:39,180 --> 00:01:44,340
So rolling a two, we would assign this value of one. Rolling anything else,

23
00:01:44,340 --> 00:01:49,720
so a one, three, four, five or six, would be assigned a failure or a zero value.
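
As a minimal sketch of this coding step, the die example could be written as follows. The function name `bernoulli_code` and the variable names are illustrative assumptions, not something from the lesson.

```python
def bernoulli_code(outcome, success_outcome):
    """Return 1 if the outcome counts as a success, otherwise 0."""
    return 1 if outcome == success_outcome else 0


# One roll of a six-sided die, with success defined as rolling a two.
die_faces = [1, 2, 3, 4, 5, 6]
coded = {face: bernoulli_code(face, success_outcome=2) for face in die_faces}

print(coded)  # {1: 0, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0}
```

Only the face we defined as success maps to one; every other face maps to zero.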

24
00:01:49,740 --> 00:01:54,720
It's really this idea of distilling down success and failure to these values,

25
00:01:54,720 --> 00:01:59,490
one and zero, that makes this Bernoulli random variable unique and useful.

26
00:01:59,490 --> 00:02:06,570
So if we stick with our example from earlier about the rideshare company where we defined success as

27
00:02:06,570 --> 00:02:14,250
a driver arriving within 10 minutes of the moment when the passenger requests the ride and failure as

28
00:02:14,250 --> 00:02:20,340
a driver taking longer than 10 minutes to arrive after the ride request, we said that the probability

29
00:02:20,340 --> 00:02:28,590
of success was 75%, which means then we know that the Bernoulli distribution looks like this.

30
00:02:28,590 --> 00:02:30,030
We know this right away.

31
00:02:30,030 --> 00:02:35,910
This is just like the binomial distribution in the sense that we have the set of discrete countable

32
00:02:35,910 --> 00:02:38,430
outcomes along the horizontal axis.

33
00:02:38,430 --> 00:02:40,980
In this case, we're using this Bernoulli random variable.

34
00:02:40,980 --> 00:02:43,440
So we assign those values zero and one.

35
00:02:43,440 --> 00:02:49,320
So with the Bernoulli distribution, we will only ever see zero and one along the horizontal axis.

36
00:02:49,320 --> 00:02:54,000
We will always see both zero and one and never any other values.

37
00:02:54,000 --> 00:02:57,150
And then along the vertical axis, we model probability.

38
00:02:57,150 --> 00:03:03,390
So using that rideshare example where we said that the probability of success was 75%, then here at

39
00:03:03,390 --> 00:03:11,040
one we have a 75% probability. And zero represents failure. Just like with the binomial

40
00:03:11,040 --> 00:03:17,730
random variable, if we define the probability of success as P, then the probability of failure has

41
00:03:17,730 --> 00:03:19,140
to be one minus P.

42
00:03:19,170 --> 00:03:24,510
Sometimes we write that as Q, but in this case success is 75%.

43
00:03:24,510 --> 00:03:28,350
So one minus 75% is 25% or 0.25.

44
00:03:28,350 --> 00:03:34,920
And so we can go ahead and graph in here 0.25 for this probability of failure or the probability of

45
00:03:34,920 --> 00:03:36,330
this idea of zero.

46
00:03:36,330 --> 00:03:42,390
So in this Bernoulli distribution, of course, just like other distributions, these two values here,

47
00:03:42,390 --> 00:03:51,420
the 0.25 value that we see here and the 0.75 value that we see here, these values are always going

48
00:03:51,420 --> 00:03:54,600
to sum to one, just like our other distributions.

49
00:03:54,600 --> 00:04:01,020
We're just always going to have only these two values represented, zero and one, as failure and success

50
00:04:01,020 --> 00:04:01,860
respectively.
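
A minimal sketch of this distribution, assuming the rideshare value of 0.75 for the probability of success. The dictionary `pmf` is an illustrative name, not something from the lesson.

```python
p = 0.75  # probability of success from the rideshare example

# Bernoulli distribution: the only outcomes are 0 (failure) and 1 (success).
pmf = {0: 1 - p, 1: p}

for value, prob in pmf.items():
    print(f"P(X = {value}) = {prob}")

# The two probabilities always sum to one.
total = sum(pmf.values())
print(total)  # 1.0
```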

51
00:04:01,860 --> 00:04:08,760
Now, once we know that we're dealing with a Bernoulli distribution for a Bernoulli random variable,

52
00:04:08,760 --> 00:04:14,490
just like all of our other distributions, that unlocks for us formulas that we already have for mean,

53
00:04:14,490 --> 00:04:17,790
variance, and standard deviation for the Bernoulli distribution.

54
00:04:17,790 --> 00:04:19,380
Here's what those look like.

55
00:04:19,410 --> 00:04:25,890
These are actually adapted directly from the corresponding formulas for a binomial random variable.

56
00:04:25,920 --> 00:04:32,370
The only difference is that they're incorporating these values of one and zero for success and failure.

57
00:04:32,370 --> 00:04:39,840
You can imagine here with this rideshare example, if we wanted to find the mean for just this one particular

58
00:04:39,840 --> 00:04:47,100
trial where our chance of success is 75% and we have these values zero and one, we would find the mean

59
00:04:47,100 --> 00:04:53,760
or the expected value by taking zero, multiplying it by the 25% chance of getting zero.

60
00:04:53,760 --> 00:04:57,720
And then we would add to that the product of one and 0.75.

61
00:04:57,720 --> 00:04:59,850
So our mean would

62
00:04:59,980 --> 00:05:02,440
actually look like 0.25,

63
00:05:02,440 --> 00:05:05,800
so 0.25 times zero,

64
00:05:06,660 --> 00:05:09,060
plus 0.75

65
00:05:10,140 --> 00:05:11,360
times one.

66
00:05:11,370 --> 00:05:13,410
This would be our expected value.

67
00:05:13,410 --> 00:05:20,190
If we have these discrete values zero and one, we have a 25% chance of getting quote unquote zero and

68
00:05:20,190 --> 00:05:22,860
a 75% chance of getting quote unquote one.

69
00:05:22,860 --> 00:05:27,150
And so this is our weighted mean, our expected value.

70
00:05:27,180 --> 00:05:31,140
Of course, we get zero from this first term and we get 0.75 from the second term.

71
00:05:31,140 --> 00:05:33,090
So we get 0.75.

72
00:05:33,090 --> 00:05:38,490
And what we see is that this value here is equivalent to the chance of success.

73
00:05:38,490 --> 00:05:42,990
We defined earlier 0.75 or the chance of getting a one here, 0.75.

74
00:05:42,990 --> 00:05:49,650
And that's always going to be the case because this zero value here is going to zero out this failure

75
00:05:49,650 --> 00:05:56,910
term, which is why we're always going to be left with a mean equal to P, the probability of success.

76
00:05:56,910 --> 00:06:03,150
So for a Bernoulli random variable, the mean will always be equal to the chance of success,

77
00:06:03,150 --> 00:06:11,280
P. Compare that to the binomial random variable, where the mean was equal to n times P, where n was the

78
00:06:11,280 --> 00:06:14,390
number of trials and P was the chance of success.

79
00:06:14,400 --> 00:06:19,170
This makes sense also when we think about adapting it this way, because now with the Bernoulli random

80
00:06:19,170 --> 00:06:25,170
variable, which we're saying is a specific kind of binomial random variable, we only have one trial

81
00:06:25,170 --> 00:06:32,010
and so n is equal to one, which means we end up with a mean of one times P, which of course is just

82
00:06:32,010 --> 00:06:32,310
P.
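
A minimal sketch of both routes to the mean, assuming the rideshare value of 0.75. The variable names are illustrative.

```python
p = 0.75  # probability of success from the rideshare example
pmf = {0: 1 - p, 1: p}

# Route 1: expected value as a probability-weighted sum over the outcomes.
# The zero outcome contributes nothing, so only 1 * p survives.
mean = sum(value * prob for value, prob in pmf.items())

# Route 2: the binomial mean n * p with a single trial (n = 1).
n = 1
binomial_mean = n * p

print(mean, binomial_mean)  # 0.75 0.75
```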

83
00:06:32,310 --> 00:06:37,620
And so of course the mean for the Bernoulli random variable is going to be equal to just P, and we

84
00:06:37,620 --> 00:06:42,750
can use similar logic to get the formulas for variance and standard deviation for the Bernoulli random

85
00:06:42,750 --> 00:06:49,440
variable as adaptations of the formulas for variance and standard deviation of the binomial random variable.

86
00:06:49,440 --> 00:06:53,760
So for example, remember that to find variance

87
00:06:53,760 --> 00:06:59,550
when we have a discrete random variable, we take all of the values that our variable can take on.

88
00:06:59,550 --> 00:07:04,050
So in this case, zero and one, and for each one we subtract the mean.

89
00:07:04,050 --> 00:07:08,010
Well, our mean is P, so we would take here our first value of zero.

90
00:07:08,010 --> 00:07:15,360
So we would say zero, we would subtract the mean, which is p, and then we would square that value.

91
00:07:15,360 --> 00:07:22,560
So this is the squared difference between this particular x sub i and the mean, and then we multiply

92
00:07:22,560 --> 00:07:26,310
this by the probability of getting a zero.

93
00:07:26,310 --> 00:07:30,330
Well, in this case that's the probability of failure, one minus P.

94
00:07:30,330 --> 00:07:37,050
So we multiply that by one minus P, and we do this for every value that X can possibly take on.

95
00:07:37,050 --> 00:07:39,390
So the only other value here is one.

96
00:07:39,390 --> 00:07:48,360
So we add to this: here we have one minus the mean, we square that difference, and then we multiply this

97
00:07:48,360 --> 00:07:54,600
by the probability that this value occurs, which in this case is 0.75, it's the probability of success,

98
00:07:54,600 --> 00:07:57,540
it's P, so we multiply by P.

99
00:07:57,720 --> 00:08:00,240
So this is the sum that we find for the variance.

100
00:08:00,240 --> 00:08:07,890
And when we simplify this equation here, we get zero minus P is a negative P, negative P quantity

101
00:08:07,890 --> 00:08:10,170
squared is positive P squared.

102
00:08:10,170 --> 00:08:19,080
So we could say P squared times one minus P, and then here we have p times one minus P quantity squared.

103
00:08:19,080 --> 00:08:26,010
Well, when we expand one minus P quantity squared, that's like saying one minus P times one minus

104
00:08:26,040 --> 00:08:28,110
P. This goes back to algebra.

105
00:08:28,110 --> 00:08:31,590
We have to multiply our first terms one and one.

106
00:08:32,010 --> 00:08:33,990
So one times one is one.

107
00:08:34,169 --> 00:08:42,750
Then we multiply our outer terms, one and negative P: one times negative P is negative P. Then we multiply

108
00:08:42,750 --> 00:08:50,280
our inner terms negative P times one is a negative P, and then our last terms negative P times negative

109
00:08:50,280 --> 00:08:52,050
P is a positive

110
00:08:52,820 --> 00:08:53,910
P squared.

111
00:08:53,930 --> 00:08:57,440
And when we simplify again, this is all just algebra.

112
00:08:57,440 --> 00:08:58,640
We get P squared times

113
00:08:58,640 --> 00:09:08,330
one is P squared, P squared times a negative P is minus P cubed, and then, we'll say, plus P times one is P.

114
00:09:08,720 --> 00:09:14,960
Here we have minus P and minus P, that's a minus two P.

115
00:09:15,170 --> 00:09:24,980
So when we multiply P by minus two P, we get minus two P squared, and then P times P squared is P cubed.

116
00:09:24,980 --> 00:09:29,660
We get our negative P cubed and our positive P cubed to cancel with each other.

117
00:09:29,660 --> 00:09:33,230
And then we're just left with, and we'll start with this P here,

118
00:09:33,230 --> 00:09:42,260
P, and then we have P squared minus two P squared, which is a minus P squared, or we can write that as the variance

119
00:09:42,290 --> 00:09:47,930
is equal to when we factor out a P here, we get P times one minus P.

120
00:09:47,930 --> 00:09:52,100
In other words, we rewrote P minus P squared as p times one minus P.
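
As a rough numeric check of this algebra, the sketch below compares the variance built from the definition with the shortcut P times one minus P. The function names are illustrative assumptions, and agreeing on a handful of test values is a sanity check, not a proof.

```python
import math


def variance_from_definition(p):
    """Sum of squared distance from the mean times probability, over outcomes 0 and 1."""
    mean = p
    return (0 - mean) ** 2 * (1 - p) + (1 - mean) ** 2 * p


def variance_shortcut(p):
    """The simplified Bernoulli variance formula, p * (1 - p)."""
    return p * (1 - p)


# Compare the two forms for several probabilities of success.
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
    long_form = variance_from_definition(p)
    short_form = variance_shortcut(p)
    assert math.isclose(long_form, short_form, abs_tol=1e-12)
    print(p, long_form, short_form)
```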

121
00:09:52,100 --> 00:09:57,980
And you see here how we build this formula for variance that we already established over here as the

122
00:09:57,980 --> 00:10:00,200
variance of a Bernoulli random variable.

123
00:10:00,200 --> 00:10:04,310
Now if you're not super comfortable with algebra, that's okay.

124
00:10:04,340 --> 00:10:11,270
All of these steps in the middle here are just a bunch of algebra to prove to you that we can get from

125
00:10:11,270 --> 00:10:15,440
this first equation here that we built

126
00:10:15,440 --> 00:10:22,040
from what we already know about variance down to this equation here, to prove to you that this variance

127
00:10:22,040 --> 00:10:26,300
formula works, that it's true, that it makes sense for a Bernoulli random variable.

128
00:10:26,300 --> 00:10:31,790
But if all we do is walk away from this lesson understanding that the variance of a Bernoulli random

129
00:10:31,790 --> 00:10:36,680
variable is given by this formula here, then that's totally fine.

130
00:10:36,680 --> 00:10:38,090
This is really the main takeaway.

131
00:10:38,090 --> 00:10:43,670
We want to understand that in order to find variance for a Bernoulli random variable, we use this formula

132
00:10:43,670 --> 00:10:44,210
here.

133
00:10:44,210 --> 00:10:55,100
In our case, that means that variance is equal to P, which is 0.75, times one minus P, or Q,

134
00:10:55,130 --> 00:11:05,840
the probability of failure, which is 0.25. That gives us a value for variance of 0.1875.

135
00:11:05,840 --> 00:11:10,130
And then standard deviation is always just the square root of variance.

136
00:11:10,130 --> 00:11:21,770
So the square root of 0.1875, which is approximately equal to 0.4330.
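
A minimal sketch of this arithmetic, assuming the rideshare value of 0.75. The variable names are illustrative.

```python
import math

p = 0.75   # probability of success
q = 1 - p  # probability of failure

variance = p * q                # Bernoulli variance, p * (1 - p)
std_dev = math.sqrt(variance)   # standard deviation is the square root of variance

print(variance)          # 0.1875
print(f"{std_dev:.4f}")  # 0.4330
```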

137
00:11:21,770 --> 00:11:27,110
And so we're able to calculate the mean, variance, and standard deviation of a Bernoulli random variable.

138
00:11:27,110 --> 00:11:33,710
So the takeaway here is that we're interested in drilling down on this idea of a Bernoulli random variable,

139
00:11:33,740 --> 00:11:37,880
because the thinking behind it is interesting. Here's what we're really saying.

140
00:11:37,880 --> 00:11:45,290
Let's take our rideshare example where we're saying that there's always a 75% chance that a driver arrives

141
00:11:45,290 --> 00:11:48,500
within 10 minutes of the ride being requested.

142
00:11:48,500 --> 00:11:55,160
We're saying that if we distill that down to just one trial, that if somebody requests a ride within

143
00:11:55,160 --> 00:12:02,030
our ridesharing app, that there's a 75% chance that a driver will arrive within 10 minutes for that

144
00:12:02,030 --> 00:12:03,500
particular ride.

145
00:12:03,500 --> 00:12:09,470
But of course, similar to some of the other distributions we've looked at, it's really nonsensical

146
00:12:09,470 --> 00:12:14,570
to think about one particular driver arriving 75% of the time.

147
00:12:14,570 --> 00:12:21,980
One particular driver is either going to arrive within 10 minutes (success) or not (failure).

148
00:12:21,980 --> 00:12:27,800
One particular driver will always take on one of those two values only and nothing in between.

149
00:12:27,800 --> 00:12:30,260
They will either succeed or fail.

150
00:12:30,260 --> 00:12:37,280
We can always code them as specifically a one or a zero, and every single individual driver we pick

151
00:12:37,280 --> 00:12:40,040
will always be a one or a zero.

152
00:12:40,190 --> 00:12:49,280
No one driver will ever fall in between zero and one as a 0.4 or a 0.637 or a 0.75.

153
00:12:49,280 --> 00:12:52,310
They will always be a zero or a one individually.

154
00:12:52,310 --> 00:12:58,520
And yet we're saying that the probability of success for every single driver is some value between zero

155
00:12:58,520 --> 00:12:59,090
and one.

156
00:12:59,090 --> 00:13:00,980
It is 0.75.

157
00:13:00,980 --> 00:13:07,760
So the whole goal here is just to understand that while every driver is only going to be success or

158
00:13:07,760 --> 00:13:12,590
failure, they're only going to be one or zero, they're only going to be on or off.

159
00:13:12,590 --> 00:13:19,190
They will only ever take on one of two values. At the same time, we associate with them this probability

160
00:13:19,190 --> 00:13:20,150
of success

161
00:13:20,150 --> 00:13:25,700
that is some value in between those two exclusive discrete values.

162
00:13:25,700 --> 00:13:29,720
And every single ride has that same probability of success,

163
00:13:29,720 --> 00:13:33,530
P, of 0.75.

164
00:13:33,530 --> 00:13:40,340
Even though each individual ride always gets coded as a failure, zero (the driver doesn't arrive

165
00:13:40,340 --> 00:13:43,640
within 10 minutes), or a success, one

166
00:13:43,640 --> 00:13:46,670
(the driver does arrive within 10 minutes).
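
To make this idea concrete, here is a minimal simulation sketch. The function `simulate_rides`, the fixed seed, and the number of rides are all illustrative assumptions. Each simulated ride is coded as exactly zero or one, while the share of ones across many rides should land close to 0.75.

```python
import random


def simulate_rides(p, n_rides, seed=0):
    """Code each simulated ride as 1 (driver on time) or 0 (driver late)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [1 if rng.random() < p else 0 for _ in range(n_rides)]


rides = simulate_rides(p=0.75, n_rides=100_000)

# Every individual ride is a 0 or a 1, never anything in between.
print(set(rides))

# Across many rides, the share of successes is close to p.
share_on_time = sum(rides) / len(rides)
print(share_on_time)
```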

