﻿1
00:00:01,630 --> 00:00:05,250
In the last lecture, we discussed a single cell called the perceptron.

2
00:00:06,280 --> 00:00:10,470
‫Now, in this lecture, we are going to extend the concepts that we learned in the last one.

3
00:00:12,360 --> 00:00:19,950
I told you that a perceptron takes in binary inputs, that is, ones and zeros, and gives out a single binary

4
00:00:19,950 --> 00:00:20,410
‫output.

5
00:00:21,900 --> 00:00:24,840
But there is no logical reason to impose this limitation.

6
00:00:25,980 --> 00:00:29,670
‫We can easily extend this to any real input values.

7
00:00:31,860 --> 00:00:39,960
‫So instead of having black and white only or zero and one only, we can have different shades of grey

8
00:00:39,960 --> 00:00:40,410
‫as well.

9
00:00:40,890 --> 00:00:48,330
That is, we accept any real value as input. The weights and threshold work in the same way.

10
00:00:52,420 --> 00:00:59,320
Next, we will take a look at this equation of the perceptron and slightly modify it to arrive at the generally

11
00:00:59,320 --> 00:01:02,530
used equation. In this equation,

12
00:01:02,980 --> 00:01:07,930
we are multiplying the inputs by their weights, adding these terms, and comparing the sum with the threshold.

13
00:01:10,510 --> 00:01:16,060
We will make a small change here: bring this threshold to the left and write

14
00:01:16,180 --> 00:01:23,950
this new term as b. Basically, it means that b is equal to minus the threshold.

15
00:01:25,510 --> 00:01:31,270
People usually call this constant the bias. It doesn't really make any difference,

16
00:01:31,420 --> 00:01:36,970
but this is the mathematical representation of the perceptron that you would find in most books.
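
To make the threshold-to-bias rewrite concrete, here is a minimal Python sketch. The function names, weights, threshold, and sample inputs are made up for illustration and are not from the lecture. It assumes the perceptron outputs one when the weighted sum reaches the threshold, and it shows that setting b to minus the threshold gives the same outputs.

```python
def perceptron_threshold(inputs, weights, threshold):
    """Original form: output 1 when the weighted sum reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


def perceptron_bias(inputs, weights, bias):
    """Rewritten form: the threshold is moved to the left side as a bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias >= 0 else 0


# Illustrative values only (assumptions, not taken from the lecture).
weights = [0.6, -0.4]
threshold = 0.5
bias = -threshold  # b is equal to minus the threshold

samples = [[1.0, 0.0], [0.2, 0.9], [1.5, 0.5]]
outputs_threshold = [perceptron_threshold(x, weights, threshold) for x in samples]
outputs_bias = [perceptron_bias(x, weights, bias) for x in samples]
print(outputs_threshold, outputs_bias)
```

Both lists come out the same for these samples, which is all the rewrite claims: only the bookkeeping changes, not the behavior.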

17
00:01:38,920 --> 00:01:39,790
Now, let's move on

18
00:01:39,880 --> 00:01:42,430
‫and look at the graphical representation of this function.

19
00:01:45,920 --> 00:01:54,410
If you look at this graph, when the calculated value of this left part, that is, the summation of weights

20
00:01:54,440 --> 00:02:02,870
multiplied by features, plus the bias, is less than zero, the

21
00:02:02,870 --> 00:02:04,370
‫output comes out to be zero.

22
00:02:05,870 --> 00:02:11,270
So you can see in the graph that up to zero, the output of the function is also zero.

23
00:02:14,210 --> 00:02:17,390
When this left part is greater than or equal to zero,

24
00:02:17,990 --> 00:02:22,010
this function suddenly activates and gives an output of one.

25
00:02:25,030 --> 00:02:28,930
‫This type of function is called a simple step function.

26
00:02:30,640 --> 00:02:36,910
This is one type of activation function. Activation functions are basically functions that take

27
00:02:36,910 --> 00:02:41,610
‫into account some type of threshold value. Here

28
00:02:42,520 --> 00:02:43,930
‫the threshold value is zero.

29
00:02:44,680 --> 00:02:52,450
‫And this function takes a sudden step at this threshold value, which is why it is called a step activation

30
00:02:52,450 --> 00:02:52,870
‫function.
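
As a rough sketch of the step activation described above, the following Python snippet evaluates it at a few points around the threshold of zero. The name `step` and the sample values are illustrative assumptions. It outputs one at exactly zero, matching the "greater than or equal to zero" rule used later in this lecture.

```python
def step(z):
    """Step activation: 0 below the threshold of zero, 1 at or above it."""
    return 1 if z >= 0 else 0


# Sample points on both sides of the threshold (illustrative values).
values = [-2.0, -0.001, 0.0, 0.001, 2.0]
outputs = [step(z) for z in values]
print(outputs)
```

The output jumps from zero to one between -0.001 and 0.0, with nothing in between. That is the sudden step the name refers to.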

31
00:02:57,180 --> 00:02:59,820
‫There are many other types of activation functions.

32
00:03:01,200 --> 00:03:03,570
One of the most popular is the sigmoid function.

33
00:03:06,120 --> 00:03:09,630
This is a pictorial representation of how the sigmoid function looks.

34
00:03:11,070 --> 00:03:13,550
It is a smooth S-shaped curve.

35
00:03:14,430 --> 00:03:21,780
It also approaches a minimum of zero at minus infinity and a maximum of one at plus infinity.

36
00:03:22,950 --> 00:03:31,110
‫But instead of having a step and rising suddenly, this function rises gradually and continuously.

37
00:03:32,490 --> 00:03:38,100
This function is also called the logistic function and is also used in logistic regression, which is a

38
00:03:38,100 --> 00:03:39,990
‫very basic classification algorithm.

39
00:03:43,420 --> 00:03:50,700
Now, this sigmoid function solves a major problem that we have with the step function. When we are training

40
00:03:50,730 --> 00:03:55,490
our perceptron using historical data to find the values of the weights and threshold,

41
00:03:56,600 --> 00:04:00,180
the step function is very sensitive to individual observations.

42
00:04:01,230 --> 00:04:09,480
For example, suppose we are classifying fashion objects in our Fashion-MNIST dataset and our algorithm

43
00:04:09,510 --> 00:04:18,000
is misclassifying a particular image of boots as trousers. To rectify this, our model will need to find

44
00:04:18,000 --> 00:04:19,800
‫new weights and bias values.

45
00:04:21,450 --> 00:04:22,770
This is where the problem comes in.

46
00:04:23,430 --> 00:04:30,720
A small change in the weight and bias values can completely flip the output for a lot of the other observations.

47
00:04:31,620 --> 00:04:37,530
This makes the step function very hard to control. With the sigmoid function,

48
00:04:37,710 --> 00:04:41,110
the change is gradual, so it is easier to control the behavior.
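
The sensitivity argument can be illustrated with a small Python sketch. All numbers here are invented for the example: a single input close to the decision boundary, and a bias that is nudged slightly. The step output flips completely, while the sigmoid output barely moves.

```python
import math


def step(z):
    """Step activation: 0 below zero, 1 at or above zero."""
    return 1 if z >= 0 else 0


def sigmoid(z):
    """Sigmoid (logistic) activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# One observation near the boundary (illustrative values, not from the lecture).
x = 0.49
w = 1.0
b_before, b_after = -0.50, -0.48  # a small change in the bias

z_before = w * x + b_before  # slightly below zero
z_after = w * x + b_after    # slightly above zero

step_change = abs(step(z_after) - step(z_before))
sigmoid_change = abs(sigmoid(z_after) - sigmoid(z_before))
print(step_change, sigmoid_change)
```

Here the step output changes by a full unit, while the sigmoid output changes by well under 0.01. That gradual response is what makes small weight adjustments easier to control.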

49
00:04:43,350 --> 00:04:50,340
‫Now, when we replace this step function with a sigmoid activation function, we call this new cell

50
00:04:50,460 --> 00:04:55,390
a sigmoid neuron or a logistic neuron instead of a perceptron.

51
00:04:57,090 --> 00:05:00,840
Mathematically, the sigmoid function formula looks like this.

52
00:05:01,650 --> 00:05:03,780
It is: sigmoid

53
00:05:03,800 --> 00:05:06,870
of z is equal to one upon

54
00:05:06,870 --> 00:05:09,840
one plus e raised to the power of minus z.

55
00:05:10,760 --> 00:05:17,340
And if you plot this function on a graph, that is, if you have z on the x-axis and you calculate

56
00:05:17,340 --> 00:05:21,420
the value of this function using this formula and plot it on the y-axis,

57
00:05:21,930 --> 00:05:23,880
this is what the function looks like.

58
00:05:25,620 --> 00:05:27,430
Now we will replace the value of z

59
00:05:27,780 --> 00:05:30,090
with the summation plus the bias value.

60
00:05:30,870 --> 00:05:37,050
So the summation of wj xj plus b was the input to our activation function.

61
00:05:37,890 --> 00:05:40,630
‫So we input this in place of z.

62
00:05:41,220 --> 00:05:44,700
‫So this is what the output of our neuron looks like.

63
00:05:45,060 --> 00:05:51,570
It is one upon one plus the exponential of minus the summation of weights times features

64
00:05:51,780 --> 00:06:01,530
minus b. If you calculate this value, it will always lie between zero and one, and it will have a shape

65
00:06:01,530 --> 00:06:02,160
‫like this.
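
The neuron output just described can be sketched in a few lines of Python. The function name `sigmoid_neuron` and the weights and inputs are assumptions chosen for illustration. The code computes z as the summation of weights times features plus the bias, then applies one upon one plus e to the power of minus z.

```python
import math


def sigmoid_neuron(inputs, weights, bias):
    """Output of a sigmoid neuron: sigmoid(sum_j w_j * x_j + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative weights and inputs (assumptions, not from the lecture).
weights = [0.5, -0.3]
bias = 0.0

out_zero = sigmoid_neuron([0.0, 0.0], weights, bias)    # z = 0
out_large = sigmoid_neuron([10.0, 0.0], weights, bias)  # z = 5
out_small = sigmoid_neuron([-10.0, 0.0], weights, bias) # z = -5
print(out_small, out_zero, out_large)
```

For these inputs the output is exactly 0.5 at z equal to zero, and it moves toward one or zero as z grows or shrinks, without reaching either.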

66
00:06:03,060 --> 00:06:06,930
So you can compare it with the step function as well. In the step function,

67
00:06:07,050 --> 00:06:14,280
we calculated the output using this formula, where we got zero

68
00:06:14,400 --> 00:06:19,320
if this summation was less than zero, and one if the summation was greater than or equal to zero.

69
00:06:20,280 --> 00:06:23,640
‫We have replaced this step with a sigmoid function.

70
00:06:23,850 --> 00:06:25,200
‫This is a continuous function.

71
00:06:25,260 --> 00:06:27,030
‫We do not need two parts to it.

72
00:06:27,750 --> 00:06:35,730
So we just input the summation of wj xj and the bias to calculate the output, which is a continuous

73
00:06:35,730 --> 00:06:36,090
‫function.

74
00:06:37,270 --> 00:06:45,270
Now, with this, our artificial neural cell is ready. It takes in any number of real-valued inputs and

75
00:06:45,270 --> 00:06:47,760
‫gives an output between zero and one.

76
00:06:49,610 --> 00:06:56,240
‫It is time to create an artificial neural network, which is basically a network of these individual

77
00:06:56,240 --> 00:06:56,660
‫cells.

78
00:06:58,520 --> 00:07:00,890
‫So just a brief recap of this class.

79
00:07:01,910 --> 00:07:07,760
Initially, I said that we took in binary inputs and gave out a single binary output.

80
00:07:08,570 --> 00:07:18,860
We changed the input from binary to any real value, and we have changed the binary output to a value

81
00:07:18,860 --> 00:07:20,240
‫between zero and one.

82
00:07:21,680 --> 00:07:28,460
So in this generalized form, we take in any inputs, which can have any real value, and we get one output that

83
00:07:28,460 --> 00:07:30,020
lies between zero and one.

