WEBVTT

00:00.120 --> 00:05.100
Now that we have gone through descriptive statistics, let's start solving a few questions.

00:13.220 --> 00:18.830
You can see the question on your screen, and you will be able to pause the video.

00:19.100 --> 00:24.740
And after pausing it, you can solve the question, and we will discuss the solution after that.

00:26.690 --> 00:34.760
You need to find out the mean of these scores of 20 students. You can see the scores are not sorted

00:34.760 --> 00:35.120
here.

00:35.150 --> 00:41.930
You can simply find out the min, the max, the range of these scores.

00:42.410 --> 00:50.360
The standard deviation, the variance, the mean, the median: all these calculations you need to do by hand.

00:51.050 --> 00:55.610
And after that, I will show you how you can solve this using Python.

00:57.170 --> 00:58.160
So let's go ahead.

00:58.580 --> 01:00.830
You can pause the video and find the solution.

01:02.570 --> 01:04.460
Now let's start solving the problem.

01:04.490 --> 01:07.550
I have taken all of these values and sorted them.

01:09.140 --> 01:15.560
After sorting the values, the minimum value comes out to be four, and the maximum value comes out to be sixty

01:15.560 --> 01:20.150
six. When we are trying to find out the median of the values,

01:20.360 --> 01:21.890
it will be the middle-most value.

01:22.430 --> 01:28.610
So the middle-most value, as the count of these values is an even number, will be

01:29.330 --> 01:32.030
the average of the 10th and 11th values.

01:32.510 --> 01:33.620
That will be the median.

01:35.540 --> 01:39.860
So the average of 15 and 17 here will be 16.

01:40.130 --> 01:41.690
So that is the median value.
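
NOTE
A minimal sketch of the even-count median rule described above, using only the
Python standard library. The list below is hypothetical (it is not the data shown
on screen); it is chosen only so that the two middle values are 15 and 17.
```python
# Hypothetical scores for illustration only; not the on-screen data.
scores = [17, 4, 66, 15, 10, 20]
ordered = sorted(scores)          # the median needs the values in sorted order
n = len(ordered)
middle = n // 2
if n % 2 == 0:
    # Even count: average the two middle-most values.
    median = (ordered[middle - 1] + ordered[middle]) / 2
else:
    # Odd count: the single middle value is the median.
    median = ordered[middle]
print(median)
```
With the real 20 scores, the same code would average the 10th and 11th sorted values.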

01:42.890 --> 01:44.960
Now, how do we calculate the mean value?

01:44.960 --> 01:45.830
The mean value

01:45.840 --> 01:53.720
can be calculated by adding all of these values and dividing by the count of these items: four zero

01:53.720 --> 01:55.550
eight divided by 20.

01:56.750 --> 01:58.370
So let's take a sum of these.

02:05.620 --> 02:08.800
And then divide it by the count of these.

02:16.040 --> 02:20.420
In Excel, you also have a built-in function, AVERAGE, so you can use that as well.
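
NOTE
The sum-then-divide step can be sketched in Python as below. The total 408 and the
count 20 are the figures read out in the video; the short list is a hypothetical
stand-in for the on-screen scores, used only to show the same steps on raw values.
```python
from statistics import mean
total = 408    # sum of the scores, as read out in the video
count = 20     # number of students
video_mean = total / count
print(video_mean)
# Same steps on a hypothetical list (not the on-screen data).
sample = [4, 10, 15, 17, 20, 66]
by_hand = sum(sample) / len(sample)
# statistics.mean plays the role of a built-in average function.
built_in = mean(sample)
print(by_hand, built_in)
```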

02:22.070 --> 02:25.760
Next, let us find out the standard deviation for these numbers.

02:26.450 --> 02:32.330
So to find out the standard deviation of these numbers, we can simply find out the difference of these

02:32.330 --> 02:35.930
numbers from the mean value.

02:35.940 --> 02:40.940
So here we have the mean value, which will be fixed.

02:43.030 --> 02:51.460
So I am fixing the value of the mean here, and then we want to subtract each number

02:51.460 --> 02:51.880
from it.

02:53.740 --> 02:55.490
And while we do this,

02:55.510 --> 02:57.670
we also want to square these values.

02:58.090 --> 03:00.040
So let's square them simultaneously.

03:13.270 --> 03:21.320
Let's raise it to the power two, so that we get the square of these numbers.

03:34.690 --> 03:40.480
Now, these are the squared deviations of these numbers from the mean value.

03:41.290 --> 03:47.620
Now we also want to take the average of these, so we can simply take the sum of these

03:47.620 --> 03:52.750
values and, again, divide it by the count of these values.

03:53.320 --> 03:56.890
So we can just simply copy the formula as well here.

03:58.270 --> 04:05.020
So you can see this is taking the sum of all these values and then dividing by the count of all these values.

04:05.320 --> 04:07.870
So it comes out to be two fifty five point four.

04:08.890 --> 04:13.900
Now, this value is the variance of these numbers,

04:14.230 --> 04:16.500
and the standard deviation will be the square root of this.

04:16.510 --> 04:22.630
So I can simply take the square root of this particular number.

04:22.960 --> 04:25.110
So it comes out to be about fifteen point nine eight.

04:25.900 --> 04:31.780
So the mean for these numbers will be twenty point four, the variance comes out to

04:31.780 --> 04:36.810
be two fifty five point four, and the standard deviation comes out to be about fifteen point nine eight.
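
NOTE
The by-hand steps (deviation from the mean, square, average, square root) can be
sketched as below. The list is hypothetical, not the on-screen data. Dividing by the
count gives the population variance, which is what the spreadsheet steps compute;
statistics.pvariance and statistics.pstdev are used only as a cross-check.
```python
import math
from statistics import pstdev, pvariance
scores = [4, 10, 15, 17, 20, 66]   # hypothetical scores for illustration
mean_value = sum(scores) / len(scores)
# Squared deviation of every score from the fixed mean value.
squared_deviations = [(x - mean_value) ** 2 for x in scores]
# Average of the squared deviations: the (population) variance.
variance = sum(squared_deviations) / len(scores)
# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)
print(variance, std_dev)
# With the video's variance of 255.4, the square root is about 15.98.
video_std = math.sqrt(255.4)
```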

04:39.190 --> 04:41.770
Now, this is how we solve it by hand.

04:44.080 --> 04:53.290
Now, let us try to solve this and let's try to find out how we can find the solution to this using

04:53.290 --> 04:53.620
Python.

04:58.060 --> 05:03.490
So here I have simply created a list of the numbers which we were given in the question, and I have

05:03.490 --> 05:08.620
created a data frame out of it. Creating a data frame is simple, and it provides a lot of

05:08.620 --> 05:10.240
functions we can use easily.

05:10.570 --> 05:12.580
So I've converted it into a data frame.

05:12.970 --> 05:17.700
And now let us view the data in a plot.

05:18.400 --> 05:20.820
So for that, you can simply say df.

05:21.430 --> 05:25.630
You can see pd.DataFrame signifies the creation of a data frame.

05:26.800 --> 05:32.450
And here I've given the column name, and df is the name of the data frame which we have created.

05:32.470 --> 05:40.690
So if I want to check the values, I can simply say df, and here I can see all the values which

05:40.690 --> 05:41.290
we have given.
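
NOTE
A minimal sketch of creating and viewing the data frame, assuming pandas is
installed. The scores and the column name "scores" are hypothetical; the video's
own list and column name are not reproduced here.
```python
import pandas as pd
# Hypothetical scores and column name for illustration only.
scores = [4, 10, 15, 17, 20, 66]
df = pd.DataFrame({"scores": scores})
# Printing the data frame shows every row with its index.
print(df)
```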

05:42.190 --> 05:52.270
Now, further, if I want to plot these values, I can simply say df.plot, and it will plot these values

05:52.270 --> 05:52.690
for us.

05:54.410 --> 05:56.570
So here you can see these are the values which we have.

05:59.660 --> 06:05.120
Now, further, if we want to create a histogram, we can simply, instead of saying plot, say

06:05.120 --> 06:06.170
hist.

06:16.310 --> 06:17.480
This is the histogram.

06:17.480 --> 06:26.960
So you can see that the values are usually between zero and 20, and then there are a few values which

06:26.960 --> 06:27.880
are above 60.

06:28.790 --> 06:34.760
Next, we will go ahead and create a density plot for a better visualization so we can do the same thing

06:34.760 --> 06:38.510
and simply say density.

06:40.820 --> 06:42.530
It will give us a density plot.

06:44.810 --> 06:45.110
Sorry.

06:50.090 --> 06:51.950
So this will give us a density plot.

06:52.880 --> 06:57.440
So here you can see the maximum values are around 20.

06:59.690 --> 07:03.490
And you can see the data is kind of normal till here.

07:03.500 --> 07:10.010
But after that, we have these values which are extra here, which makes this data a little skewed also.

07:11.360 --> 07:17.360
So let us go further and check the minimum, maximum, mean, median and mode for this. We can simply say

07:17.360 --> 07:21.830
df dot min.

07:24.040 --> 07:28.040
Meaning that if we have seen this also,

07:32.180 --> 07:33.800
I copy this entire thing.

07:34.400 --> 07:37.910
So that I can show you everything here: max,

07:39.030 --> 07:39.040
I

07:42.560 --> 07:43.250
mean,

07:46.490 --> 07:48.680
then we'll see the median.

07:53.470 --> 07:58.400
And then let's also see the standard deviation and variance.

08:08.240 --> 08:12.290
So you can see the min value is four.

08:12.470 --> 08:14.000
Max is 66.

08:14.660 --> 08:16.610
Mean is 20.

08:16.640 --> 08:18.650
Median is 16.

08:19.190 --> 08:22.100
The standard deviation is sixteen point three nine.

08:22.100 --> 08:24.490
And the variance is two sixty eight point eight.
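
NOTE
The pandas figures (16.39 and 268.8) differ from the by-hand ones (about 15.98 and
255.4) because pandas std() and var() default to ddof=1, the sample formula that
divides by n - 1, while the by-hand steps divide by n. Indeed 255.4 * 20 / 19 is
about 268.8. A minimal sketch, assuming pandas is installed; the scores and the
column name are hypothetical.
```python
import pandas as pd
df = pd.DataFrame({"scores": [4, 10, 15, 17, 20, 66]})   # hypothetical data
col = df["scores"]
summary = {
    "min": col.min(),
    "max": col.max(),
    "mean": col.mean(),
    "median": col.median(),
    "std_sample": col.std(),             # default ddof=1: divides by n - 1
    "var_sample": col.var(),
    "std_population": col.std(ddof=0),   # divides by n, as in the by-hand steps
    "var_population": col.var(ddof=0),
}
for name, value in summary.items():
    print(name, value)
# Converting the video's by-hand variance to the sample variance.
video_sample_var = 255.4 * 20 / 19
```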

08:27.100 --> 08:30.520
If you want to see the mode, we can view the mode also.

08:37.780 --> 08:42.010
So there are three modes: 10, 12 and 15.
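
NOTE
A data set can have several modes when values tie for the highest count. The sketch
below uses statistics.multimode from the standard library (Python 3.8 or later) on a
hypothetical list built so that 10, 12 and 15 each appear twice; it is not the
on-screen data. The pandas mode() method likewise returns every tied value.
```python
from collections import Counter
from statistics import multimode
# Hypothetical scores: 10, 12 and 15 each occur twice, everything else once.
scores = [4, 10, 10, 12, 12, 15, 15, 66]
modes = multimode(scores)     # every value that ties for the highest count
counts = Counter(scores)      # how often each value occurs
print(modes, counts[10])
```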

08:44.830 --> 08:50.320
Further, if you want, we can view the skewness and kurtosis of this as well.

08:51.160 --> 09:07.510
So let us find that as well. We simply say dot skew for skewness and dot kurt for kurtosis.

09:08.740 --> 09:18.220
So here you can see that the skewness is one point nine six and the kurtosis is three point seven two,

09:18.550 --> 09:22.450
which simply shows that there is skewness in the data.

09:24.110 --> 09:29.430
Now, because the kurtosis is greater than three,

09:29.870 --> 09:38.720
we can simply say that this is leptokurtic in nature, which is clearly visible, as it is

09:38.720 --> 09:52.410
having a very slim peak, and there are values at the edge, in the tails, as well.
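
NOTE
A minimal sketch of the skewness and kurtosis calls, assuming pandas is installed
and using hypothetical scores. One caveat when reading the result: pandas kurt()
uses Fisher's definition, where a normal distribution scores 0, so the usual
"greater than three" threshold corresponds to "greater than zero" on this scale.
```python
import pandas as pd
# Hypothetical scores: most values are low, two values sit far to the right.
scores = pd.Series([4, 8, 10, 10, 12, 12, 15, 15, 17, 20, 60, 66])
skewness = scores.skew()          # positive when the long tail is on the right
excess_kurtosis = scores.kurt()   # Fisher definition: normal distribution is 0
# Adding 3 moves the value onto the scale where a normal distribution is 3.
kurtosis_normal_is_3 = excess_kurtosis + 3
print(skewness, excess_kurtosis, kurtosis_normal_is_3)
```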

09:56.210 --> 10:02.630
So this is how you can simply find out the mean, median, mode, min, max and everything that you want

10:03.140 --> 10:04.520
using Python.

10:04.820 --> 10:07.610
And just by using a single function.

10:07.820 --> 10:13.940
So this is not overly complicated. Python makes it very simple for you to solve this.

10:15.240 --> 10:21.350
Now, there are a few more things which we need to solve, so we will be solving a few more questions

10:21.620 --> 10:28.310
so that you can learn how we can deal with data which is presented in different forms to us.

10:28.610 --> 10:32.600
We might not always have data, which is as simple as this.

10:33.080 --> 10:33.560
Yes.

10:33.980 --> 10:42.020
There can be a large number of rows. Instead of 20, if there are even one million or 100

10:42.020 --> 10:43.850
million, no matter what the number is,

10:44.090 --> 10:50.330
you can use the same functions and find out the mean, median, mode and all these other measures.

10:50.690 --> 10:55.550
But the only difference is if the data is given in some other way.

10:55.580 --> 11:01.250
So we need to learn how to handle the data that is given in some other way, no matter how large it is.

11:01.880 --> 11:04.460
It will still be as simple as this.

11:05.480 --> 11:11.870
So let's go ahead and have a look at another type of questions so that you can get what I'm trying to

11:11.870 --> 11:12.200
say.

11:14.750 --> 11:17.450
Now, let us solve this particular question.

11:17.750 --> 11:24.140
Now, if you see the difference between both the questions: there, we had all the scores in hand.

11:24.260 --> 11:25.760
These are straightforward.

11:26.120 --> 11:27.760
We have the scores directly.

11:27.770 --> 11:32.270
So if we want to calculate the mean, it is simple: we will simply find the sum of all the

11:32.270 --> 11:36.710
values and the count of all the values and do a division for them.

11:37.550 --> 11:44.660
But when we have to find out the mean where we have the score and the frequency,

11:45.080 --> 11:47.420
then it will be slightly different.

11:48.170 --> 11:58.580
You can see that the scores are not just 10, 15, 20, 17, but there are five students who have scored

11:58.580 --> 12:04.850
10, 24 students who have scored 15, 12 students who have scored 20.

12:05.360 --> 12:07.940
So we need to handle it in a different way.

12:08.390 --> 12:15.050
Now, I give you some time. You can pause the video and try to solve this. Think around how we

12:15.050 --> 12:16.730
can solve this kind of question.

12:16.730 --> 12:17.250
We have here.

12:17.270 --> 12:19.460
We have the score and the frequency.

12:19.790 --> 12:27.110
So the number of data points, which we have, the number of rows which we have is not actually the

12:28.430 --> 12:33.970
number of students which we have, but the number of students are given separately.

12:33.980 --> 12:40.610
So the number of students will be five plus twenty four plus 12 plus twenty one plus 12 and so on, up to

12:41.000 --> 12:42.530
six plus four.

12:43.130 --> 12:50.270
So this is the number of students and the score for each student will be, say, five students will

12:50.270 --> 12:57.410
have scored 10, 24 students will have scored 15, 12 students will have scored 20, and so on.

12:58.760 --> 13:01.370
So let's pause the video for some time.

13:01.370 --> 13:03.080
Try solving it on your own.

13:04.120 --> 13:08.570
Think around it, and then you can unpause the video for the solution.

13:11.840 --> 13:13.520
Now, let us solve this problem.

13:14.120 --> 13:22.700
Now, when we want to see how much score the students have got, the total score for these students:

13:22.700 --> 13:26.690
there will be five students who will have scored 10.

13:27.020 --> 13:29.960
So the score for these five students will be 50.

13:30.800 --> 13:36.320
Then for 24 students who have scored 15, the total score would be 360.

13:36.950 --> 13:41.240
For 12 students who have scored 20, the total score would be 240.

13:41.480 --> 13:47.780
So this is basically 10 multiplied by five, 15 multiplied by twenty four, 20 multiplied

13:47.780 --> 13:48.280
by 12.

13:49.070 --> 13:56.220
So this gives us, for each score, basically the combined score for all the students who got it.

13:57.260 --> 14:04.610
Now we can take the sum of all of these values, which are the product of score and frequency, to get the

14:04.630 --> 14:10.160
total score, to get the combined score for all the students that we have.

14:11.210 --> 14:18.050
Now, what is the total number of students that we have? The number of students will be five plus

14:18.050 --> 14:24.890
24 plus 12 plus twenty one plus twelve and so on, up to six plus four.

14:25.550 --> 14:30.290
So if we add all of these values, you can see this is the cumulative frequency.

14:30.560 --> 14:36.500
So for five it is five, five plus twenty four comes out to be twenty nine five plus twenty four plus

14:36.500 --> 14:38.450
twelve comes out to be forty one.

14:38.900 --> 14:40.890
The sum till here is sixty two.

14:40.920 --> 14:43.820
Then adding twelve gives seventy four, and so on.

14:44.060 --> 14:47.720
So the total number of students that we have is one sixty eight.

14:51.050 --> 14:55.060
And the sum of all the scores will be the sum of all

14:55.150 --> 14:59.290
these numbers which we have here: 50 plus 360 plus 240 and all of this.

15:01.390 --> 15:10.840
So if we find out the mean, the mean will be the sum of all the scores divided by the number of students.

15:11.260 --> 15:17.020
So what will the sum of all scores be? It will be the sum of the scores multiplied by the frequencies.

15:17.440 --> 15:20.380
So 50 plus 360 plus 240.

15:20.390 --> 15:26.240
The sum of all of these values will be four, two, three, nine, divided by the number of students.

15:26.320 --> 15:28.810
What is the number of students? The number of students

15:28.810 --> 15:33.550
is the sum of all the frequencies, which comes out to be one sixty eight.

15:33.820 --> 15:39.820
So the mean for all of these students is actually twenty five point two three.

15:40.720 --> 15:43.810
So this is the mean for all of these students.
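
NOTE
The frequency-table mean can be sketched as a weighted sum. In the table below, the
first three rows (10, 15, 20 with frequencies 5, 24, 12) follow the video; the scores
25 and 30 are hypothetical placeholders, since the full table is not recoverable
from the audio. The last line redoes the video's own totals, 4239 and 168.
```python
# Rows after the third use hypothetical scores; this is not the full table.
scores = [10, 15, 20, 25, 30]
frequencies = [5, 24, 12, 21, 12]
# Combined score of all students who share the same score.
products = [score * freq for score, freq in zip(scores, frequencies)]
total_score = sum(products)
total_students = sum(frequencies)
table_mean = total_score / total_students
print(products, total_score, total_students, table_mean)
# The totals read out in the video give the mean quoted there.
video_mean = 4239 / 168
print(round(video_mean, 2))
```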

15:48.690 --> 15:56.310
Now, if you want to find out the median score, then the median score will be a lot different.

15:57.030 --> 15:58.800
How do we find out the median score?

15:58.800 --> 16:07.080
The median score will be the middle-most score of these scores which we have. Now, the

16:07.080 --> 16:08.130
middle score.

16:08.160 --> 16:13.980
you cannot find out directly from here, but you will have to find out the score of the middle-most

16:14.160 --> 16:16.470
student in the sorted order.

16:16.770 --> 16:19.680
So it will be half of one sixty eight.

16:20.340 --> 16:22.230
Now, what is half of 168?

16:22.230 --> 16:27.000
Half of 168 is 84; 168

16:27.000 --> 16:28.440
divided by two is 84.

16:28.770 --> 16:34.440
Now, where do we get the cumulative frequency as 84? It is here.

16:37.020 --> 16:42.280
At score 27. So at score 27,

16:42.300 --> 16:46.140
the cumulative frequency is around 84.

16:46.320 --> 16:51.720
So the median score is 27.
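
NOTE
A minimal sketch of the cumulative-frequency lookup: find the first score whose
running total of students reaches half of the total. The table is hypothetical
beyond its first three rows, so the result here is not the video's 27. For an even
number of students, a stricter median would average the two middle students; in
this example both of them share the same score.
```python
from bisect import bisect_left
from itertools import accumulate
from statistics import median
# Scores must be in sorted order; rows after the third are hypothetical.
scores = [10, 15, 20, 25, 30]
frequencies = [5, 24, 12, 21, 12]
cumulative = list(accumulate(frequencies))   # running total of students
half = cumulative[-1] / 2                    # position of the middle-most student
# First row whose cumulative frequency reaches the halfway position.
median_score = scores[bisect_left(cumulative, half)]
print(cumulative, half, median_score)
# Cross-check by expanding the table into one score per student.
expanded = [s for s, f in zip(scores, frequencies) for _ in range(f)]
check = median(expanded)
```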

16:52.080 --> 16:55.170
While the mean is twenty five.

16:55.260 --> 17:02.460
So you can clearly see how the values are spread out, how it is very different.

17:02.460 --> 17:09.120
If you consider the mean and the median: if you consider the median, you get the middle-most value,

17:09.390 --> 17:16.260
while when you're trying to find out the mean, you get the value which is an average of all the values

17:16.500 --> 17:24.360
of the scores, which is actually impacted by the outliers.

17:27.910 --> 17:35.980
So let's go ahead now and solve another question here. Now consider this question, where we

17:35.980 --> 17:39.670
have the class intervals instead of the score itself.

17:40.120 --> 17:46.300
So there we had the exact score, which was given in the distribution.

17:46.300 --> 17:49.750
So we had 10, 15, 20, like this.

17:49.750 --> 17:50.700
We have the numbers.

17:50.710 --> 17:57.150
But now what we have is the class interval.

17:57.160 --> 18:05.020
So we don't know the exact scores of these students, but we do know that the scores are ranging between

18:05.020 --> 18:08.590
zero and eight, eight and 16, 16 and 24, and so on.

18:09.130 --> 18:10.920
So we have a class interval.

18:11.300 --> 18:16.150
Now, when we have class interval, the calculation somewhat remains the same.

18:16.600 --> 18:21.430
But you will have to make slight changes to the approach which you are following.

18:21.880 --> 18:24.130
When we are given the class interval,

18:24.580 --> 18:29.410
you will not be given the class interval mean, but you will have to calculate the class interval mean.

18:30.370 --> 18:34.930
So I have calculated the class interval mean for you in advance.

18:35.350 --> 18:38.560
But now it is completely up to you how you solve this question.

18:39.190 --> 18:46.420
So it is somewhat along the same lines as the last question, but you just need to get the understanding

18:46.420 --> 18:48.190
of what we are doing and why we are doing it.

18:48.490 --> 18:52.870
So just try this particular question and try to find out the mean value for this.

18:53.440 --> 18:58.570
You can also find out the standard deviation and variance and all of the metrics.

18:58.870 --> 18:59.230
Right.

18:59.440 --> 19:02.260
That is an additional practice which you can do.

19:02.440 --> 19:06.040
But the main question here is, if you want to find out the mean, how would you do that?

19:06.310 --> 19:11.650
So you can pause the video and find out the values, and then you can unpause to see the solution.

19:13.600 --> 19:15.970
Now, let us look at the solution.

19:16.330 --> 19:24.820
So here I have copied the class interval and the respective values for you. Now, we just convert

19:24.820 --> 19:25.570
this into.

19:31.580 --> 19:37.760
So I have converted this into the class interval form, and I have calculated the class interval

19:37.760 --> 19:38.450
mean for you.

19:38.960 --> 19:40.700
So this is the interval mean.

19:41.090 --> 19:47.480
So when we were talking about the previous problem, there we had the exact scores.

19:47.840 --> 19:50.270
We were calculating based on the exact score.

19:52.400 --> 19:54.290
So here we have the middle-most value.

19:54.560 --> 19:59.540
Now, this middle-most value is just the mean of the class interval.

20:00.020 --> 20:05.660
So this particular value can be treated the same as the score which we had earlier.

20:05.900 --> 20:10.760
So instead of taking one exact value at a time, we're just simply saying that there were

20:10.760 --> 20:16.130
five students who have a score with the mean value of four.

20:16.130 --> 20:21.040
And then we are doing the same calculation which we were doing in the previous case.

20:21.380 --> 20:23.360
So we just simply find out the product of this.

20:24.080 --> 20:27.200
So this will be this.

20:37.160 --> 20:38.840
So we do the same here.

20:39.590 --> 20:40.850
Now, this is the product.

20:41.210 --> 20:45.650
Now we need to find out the sum of the frequencies.

20:45.680 --> 20:49.010
So we need to know the total number of students that we have.

20:49.490 --> 20:50.900
So it will be equal to the sum

20:58.430 --> 20:59.300
of these.

21:02.090 --> 21:05.480
So basically, these are the total number of students that we have.

21:06.170 --> 21:09.230
And these are the totals of the score

21:10.130 --> 21:13.730
times frequency, and when we add these up, we get the

21:16.790 --> 21:19.310
total score for fifty nine students.

21:19.670 --> 21:28.820
And if you want to find out the mean, we can simply find it by dividing this sum by the sum of the

21:28.820 --> 21:29.630
frequency.

21:31.190 --> 21:39.290
So what did we do? For each class interval mean, we multiplied it with the frequency to get the score

21:39.290 --> 21:41.330
for that particular frequency.

21:41.330 --> 21:46.250
That is, five students have a score of four.

21:46.340 --> 21:51.860
So we multiplied it to get the total score for those five students.

21:52.130 --> 21:55.340
And similarly, for 10 students, the class interval mean was 12.

21:55.640 --> 22:00.380
So these 10 students scored, on average, around 12.

22:00.680 --> 22:04.970
So that is why we say 120 is the total score for these students, and so on.

22:05.270 --> 22:08.540
So this is the score times the frequency.

22:10.250 --> 22:13.750
And when we add these up, we get one, four, six, eight.

22:13.760 --> 22:16.880
That is the total score for all these students.

22:17.270 --> 22:22.730
And all these students are nothing but fifty nine in number, which we got by adding the frequencies

22:22.730 --> 22:23.240
together.

22:23.720 --> 22:30.380
And when we divide the total score by the total number of students, which is the normal formula for the mean,

22:30.810 --> 22:37.130
we get the mean score for this particular class.
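
NOTE
A minimal sketch of the class-interval mean: take each interval's midpoint as the
score, multiply by the frequency, and divide the total by the number of students.
Only the first two rows (5 students at midpoint 4, 10 students at midpoint 12) come
from the video. The remaining intervals and frequencies are made-up values, chosen
only so that the totals match the 59 students and 1468 read out in the video.
```python
# Rows three to five are hypothetical; they are not the on-screen table.
intervals = [(0, 8), (8, 16), (16, 24), (24, 32), (32, 40)]
frequencies = [5, 10, 8, 16, 20]
# Class interval mean: the midpoint stands in for every score in the interval.
midpoints = [(low + high) / 2 for low, high in intervals]
products = [mid * freq for mid, freq in zip(midpoints, frequencies)]
total_score = sum(products)
total_students = sum(frequencies)
grouped_mean = total_score / total_students
print(midpoints, total_score, total_students, round(grouped_mean, 2))
```
Because the midpoint replaces the exact scores, the result is an estimate of the
mean rather than its exact value.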

22:38.420 --> 22:41.810
So this is how we solve that particular problem.

22:42.470 --> 22:49.230
In case you face a similar kind of problem in the case of data frames, or even if you have huge data

22:49.310 --> 22:58.910
where you have this kind of problem, you can simply find out the frequency and use a similar approach

22:59.210 --> 23:00.500
on your data frames.

23:01.130 --> 23:04.250
Let's go ahead and learn inferential statistics.

23:04.550 --> 23:11.230
And once we are going through inferential statistics, we will pick up different problems.

23:11.240 --> 23:12.980
We will pick up different data sets.

23:13.310 --> 23:20.480
And you can apply these methods on that particular dataset to find out the mean, median, mode, outliers

23:20.480 --> 23:25.250
and all the values so that you get a handle on how you solve these problems.

23:25.850 --> 23:31.010
So the problems will be slightly more difficult in the case of inferential statistics.

23:31.310 --> 23:39.500
So that is why it is important to have a clear knowledge of descriptive statistics so that you can understand

23:39.500 --> 23:41.870
what is going on in inferential statistics.
