WEBVTT

00:00.480 --> 00:00.780
Hi.

00:01.170 --> 00:09.900
So now we will have a look at your regression project solution. This particular solution I will

00:09.900 --> 00:19.800
be implementing in Colab because the data set is pretty large, so we are using Google Colab.

00:19.800 --> 00:25.830
It will help us run this code a little faster, and there is less chance of crashing and all of the

00:25.830 --> 00:26.850
problems that you face.

00:27.360 --> 00:29.760
So we will be using Google Colab for this.

00:31.050 --> 00:35.760
Now, a few tips for starting with Google Colab.

00:36.060 --> 00:42.690
You will basically have to open your Google Drive, and inside the Google Drive itself,

00:42.690 --> 00:46.110
what you can do is, let us say we have this Google Drive,

00:46.380 --> 00:53.220
I can simply click More, and inside More

00:53.220 --> 00:55.370
there is an option for Google Colaboratory.

00:55.650 --> 01:00.810
So when I open Colaboratory, it will be showing me a blank sheet.

01:01.140 --> 01:05.410
And when I see this blank sheet, I can start working on it, basically.

01:06.120 --> 01:08.280
So here I have this blank sheet.

01:08.280 --> 01:12.030
If I do this, there will be more lines of code.

01:12.030 --> 01:16.890
If I do this, there will be more Markdown cells which will be added.

01:17.310 --> 01:21.250
Then here we have the option for connecting, connecting with Colab.

01:21.300 --> 01:23.760
When we click on this, there are two options.

01:24.090 --> 01:26.490
One is Connect to hosted runtime,

01:26.490 --> 01:37.290
and the other one is Connect to local runtime. You can use the local runtime when you are not really working

01:37.620 --> 01:38.700
on something big.

01:38.940 --> 01:44.700
But if you're working on a larger data set or when you want to have faster processing, you can use

01:44.790 --> 01:45.710
the hosted runtime.

01:46.110 --> 01:48.480
And here you have the option to manage sessions.

01:48.490 --> 01:53.970
So when I click on this, it shows me the different sessions which are open, and then you can decide

01:53.970 --> 01:56.490
if you want to close a session or something like that.

01:57.810 --> 02:00.510
The next thing which we have here is, again, in the Runtime menu.

02:00.900 --> 02:07.020
We have this Change runtime type option, which allows me to have these three options.

02:07.020 --> 02:10.050
One is None, another one is GPU, and the other one is TPU.

02:10.770 --> 02:14.850
None is when you are not using any hardware accelerator.

02:14.850 --> 02:17.820
That is, you don't want to make anything faster.

02:17.820 --> 02:23.040
You want to use it just like you use your normal laptop or your computers.

02:23.400 --> 02:26.610
Then, if you want a little faster processing,

02:26.610 --> 02:31.230
you can use GPUs and TPUs.

02:31.230 --> 02:34.470
GPUs and TPUs do not have much difference.

02:34.890 --> 02:42.110
But when you are working on deep learning, or when you're working with tensors, at that point of time

02:42.120 --> 02:47.010
the TPUs would perform better because they are built for working on tensors.

02:47.020 --> 02:51.840
And so that is when you can use TPUs; otherwise, work with GPUs.

02:53.820 --> 02:59.100
So I will be using None for now, and I click Save, and then it will connect for me.

02:59.460 --> 03:01.320
I just click Connect and it will connect.

03:01.530 --> 03:03.570
So now going back to our solution

03:07.110 --> 03:09.920
now here we have this code.

03:09.930 --> 03:17.840
So the first thing that you will be doing is running this particular command, which will mount your

03:17.850 --> 03:18.560
Google Drive.
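
NOTE
Editor's sketch of the mount command mentioned above, assuming the notebook uses the standard google.colab drive helper and the usual /content/drive mount point. The wrapper name mount_drive is hypothetical; it only exists so the snippet also runs harmlessly outside Colab.
```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.mount(mount_point)  # starts the authorization flow for the chosen account
    return True
mounted = mount_drive()
print("Drive mounted:", mounted)
```
Inside Colab, the mounted files then appear under the mount point in the Files panel.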

03:18.960 --> 03:22.410
So basically when I run this, what happens is, let me run this for you.

03:35.380 --> 03:39.050
It will ask me for this particular authorization.

03:39.070 --> 03:45.490
So when I click this particular URL, it will ask me to select one particular email ID or

03:45.490 --> 03:48.670
one particular drive from which I want to retrieve the data.

03:49.900 --> 03:56.470
So once I choose my email ID, it will give me this particular option where I can say Allow, and after

03:56.470 --> 04:01.270
I say Allow, it will give me this particular code, which I can simply copy by clicking on this.

04:01.690 --> 04:04.180
And I will go back, and when I go back,

04:04.180 --> 04:07.690
I paste this over here, and when I enter this,

04:07.690 --> 04:11.480
it will authorize this and connect with the Google Drive.

04:12.400 --> 04:19.390
So once this Google Drive is connected, if you click on this Files option, you can see the entire drive

04:19.390 --> 04:19.930
structure.

04:19.960 --> 04:25.060
So if I click on this, I will be able to see all the different files which are present in my drive and all

04:25.060 --> 04:25.780
those details.

04:28.120 --> 04:30.700
So this is what I will be having.

04:30.730 --> 04:39.340
So what I can do is I can simply go ahead and run the different libraries which I want to

04:39.340 --> 04:39.970
import.

04:40.420 --> 04:44.380
Then I can provide the CSV details.

04:44.560 --> 04:48.260
So I have to provide the path for the CSV file.

04:48.520 --> 04:51.660
So let me show you how I got this particular path.

04:53.860 --> 05:01.390
So I just went inside this particular path in the files, and I came here, and then I right-click and

05:01.390 --> 05:04.240
I click Copy path, and it will give me the path for this.

05:04.810 --> 05:07.260
That's all you need to do to import any file.

05:08.170 --> 05:09.160
After that.

05:09.160 --> 05:12.100
I have imported this data set.

05:12.110 --> 05:20.530
So this data set contains different IDs, the Hazard details, which is a numeric value, which

05:20.530 --> 05:30.170
is an integer value, which is a whole number, which is not a floating point number, basically.

05:30.910 --> 05:34.960
Then we have different variables which we have to take care of.

05:35.740 --> 05:40.480
Now out of these variables, if I see the shape, there are around 40000 rows of data and

05:40.480 --> 05:46.180
thirty four columns. I find the target column to be Hazard.

05:46.420 --> 05:49.750
And this ID does not really make sense to me.

05:49.750 --> 05:53.980
It doesn't give much information so I can really get rid of this column.

05:55.240 --> 05:58.690
Next are the columns T1_V1 through T2_V15.

05:58.990 --> 06:04.080
I will have to take care of these columns and find out which are actually important and which are not.

06:04.390 --> 06:07.290
So I will get the column types.

06:07.300 --> 06:13.660
So if we use select_dtypes and find out the object type, you can see that these all are

06:14.650 --> 06:20.850
categorized as object type, so I will find out the cutoff value.

06:20.860 --> 06:24.210
So here I am setting the cutoff value to be five percent.

06:24.730 --> 06:32.860
So whatever value has less than five percent frequency in any particular column,

06:33.220 --> 06:37.570
I will simply get rid of those particular categories.

06:37.570 --> 06:42.550
I convert my categorical columns into dummy variables.

06:44.020 --> 06:47.740
So this is the code which we have been using forever.

06:48.070 --> 06:55.240
So it simply gets me the categories which have frequency more than the cutoff value, which

06:55.240 --> 06:58.140
is more than five percent in any particular column.

06:58.510 --> 07:08.110
And then I get these different categories and it converts these categorical variables into dummy columns.

07:08.180 --> 07:10.980
So these are the columns which have been generated.
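
NOTE
Editor's sketch of the cutoff-and-dummies step described above, written in plain Python so it has no dependencies; the notebook itself presumably does this with pandas. The function name dummies_with_cutoff, the column name grade and the toy rows are all hypothetical. This version keeps one 0/1 column for every category above the cutoff, which is an assumption about the course code.
```python
from collections import Counter
def dummies_with_cutoff(rows, column, cutoff=0.05):
    """Add 0/1 columns for categories of `column` whose share of rows exceeds `cutoff`."""
    counts = Counter(row[column] for row in rows)
    keep = sorted(cat for cat, n in counts.items() if n / len(rows) > cutoff)
    out = []
    for row in rows:
        # Copy every other field, then replace the categorical field with dummy flags.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in keep:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        out.append(new_row)
    return out
# Toy data: "C" appears in 1 of 25 rows (4%), so it falls below the 5% cutoff.
rows = [{"grade": "A"}] * 14 + [{"grade": "B"}] * 10 + [{"grade": "C"}]
encoded = dummies_with_cutoff(rows, "grade")
print(sorted(encoded[0]))
```
Rows whose category was dropped simply get zeros in all of the generated columns.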

07:11.350 --> 07:18.460
Now the next thing which we will be doing is, I have just defined this particular report function,

07:18.460 --> 07:20.830
which will give me the report. Next,

07:20.840 --> 07:27.550
what I do is I create this y data frame, which contains my target, and this X data frame, which contains

07:27.550 --> 07:36.910
my features or attributes. Next, I am importing different libraries for my model building, like train_test_split,

07:36.910 --> 07:37.990
RandomizedSearchCV,

07:38.000 --> 07:45.640
DecisionTreeRegressor, LinearRegression and all. The next thing which I do is

07:45.640 --> 07:51.880
I am simply implementing the ridge regression here.

07:51.880 --> 07:59.920
Ridge regression is giving me all these different runs, and later, towards the end,

08:03.790 --> 08:09.550
I can see this information which has been retrieved, and the best estimator comes out to be the

08:09.550 --> 08:12.400
one with alpha equal to one hundred. Next,

08:12.400 --> 08:19.290
what I do is I can find out the mean validation score, which is minus two point seven seven five.

08:19.600 --> 08:27.700
Now, at the beginning of this particular project, we had stated that the scoring will be done

08:27.880 --> 08:36.220
based on a particular formula, which you must have noted down, which is one minus the mean validation

08:36.230 --> 08:36.850
score.

08:36.850 --> 08:43.320
That is, the error, divided by five point four.

08:43.330 --> 08:44.890
So that is what we are looking for.
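
NOTE
Editor's sketch of the scoring formula as spoken above. It assumes the formula reads as 1 - MAE / 5.4, where MAE is the magnitude of the negative mean validation score; that reading and the function name project_score are assumptions, not confirmed by the recording.
```python
def project_score(mean_validation_score, scale=5.4):
    """Return 1 - MAE / scale, where MAE is the magnitude of the (negative) validation score."""
    mae = abs(mean_validation_score)
    return 1 - mae / scale
# Mean validation score reported for the ridge regression run.
ridge_score = project_score(-2.775)
print(round(ridge_score, 3))
```
Under this reading, a smaller error gives a score closer to one.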

08:44.890 --> 08:49.590
We are looking for a value greater than five point one there.

08:50.050 --> 08:55.240
So these are the different results, which are not really satisfactory enough for us.

08:55.660 --> 09:05.230
So we will go ahead and we will plot the predictions versus T1_V1, which is one of the variables

09:05.230 --> 09:06.170
which are present here.

09:06.190 --> 09:14.040
So if you see, the values are scattered like this, and the predictions which we have are present here.

09:14.710 --> 09:22.970
So you can see immediately it is not able to find these values which are scattered about.

09:24.400 --> 09:32.510
Similarly, we will go further and implement the DecisionTreeRegressor here, and inside this decision

09:32.570 --> 09:39.790
tree, again, we are finding out the negative mean absolute error, and we have implemented

09:39.790 --> 09:41.080
this entire thing.

09:41.080 --> 09:46.690
And again, this is the mean validation score which we obtained out of it.

09:47.530 --> 09:49.770
And these are the predictions.

09:49.790 --> 09:53.000
So now you can see this is already better than the previous one.

09:54.100 --> 09:58.290
The next thing which we do is we are implementing the RandomForestRegressor here.

09:58.870 --> 10:06.690
So as we go further, it does find something relevant here.

10:24.740 --> 10:32.750
So here we are using the RandomForestRegressor and we are using negative mean absolute error, and again,

10:33.170 --> 10:40.790
it has this particular result, which it gives us out of it, and I show the result too.

10:43.640 --> 10:45.130
This is the regressor.

10:45.160 --> 10:53.660
These are the other details, and now here we are implementing the XGBRegressor with the objective

10:53.870 --> 10:55.100
as count:poisson.

10:55.410 --> 11:02.630
Now, this is what will actually help us achieve our target, because earlier what we were doing was

11:02.630 --> 11:06.320
while we were targeting, we were finding out continuous values.

11:06.320 --> 11:14.290
But when we use count:poisson, it will give us the values which are not in a floating point format.

11:14.600 --> 11:15.910
So we'll go further.

11:15.920 --> 11:19.730
And here you can see that the result is minus two point seven two six.

11:20.300 --> 11:27.020
And as we go further, you can see the results have further improved.

11:27.020 --> 11:29.780
Now,

11:29.840 --> 11:34.330
next, we will go on to doing the parameter tuning here.

11:34.340 --> 11:42.340
If you would have noticed one thing, we are doing the tuning sequentially, that is,

11:42.350 --> 11:49.510
in the first run, we find n_estimators with the objective fixed; all of the other parameters are commented out here,

11:50.120 --> 11:52.060
and when they are not there,

11:52.070 --> 11:56.930
it will consider these values to be the default values for the model, which are already present.

11:57.590 --> 12:02.790
So when we run this, we are trying to find out the n_estimators value, and when I run this,

12:02.790 --> 12:06.340
here I find out that the n_estimators value 500 works well for me.

12:06.890 --> 12:08.000
So I go ahead.

12:09.320 --> 12:19.730
And in the next one I set n_estimators to 500, and further, I try to find out different learning

12:19.760 --> 12:22.820
rates for my model. Now, for the learning rate,

12:22.830 --> 12:28.220
I have provided all these learning rates and I'm training this model again.

12:28.940 --> 12:40.340
When I fit this model and I see the results, it allows me to find out better values.

12:40.400 --> 12:48.760
And I'm able to find out that the learning rate which is doing well for me is zero point zero zero five.

12:49.610 --> 12:51.370
So I will fix this again.

12:51.890 --> 12:57.470
Now, I fixed my learning rate as zero point zero zero five, and the next best one, that's zero point zero

12:57.470 --> 12:59.610
zero two, is there.

12:59.620 --> 13:01.250
Also, I have all fixed.

13:01.700 --> 13:06.990
Now, what I will do is I am training this to find out the best max_depth.

13:07.760 --> 13:13.000
So now again, I will fit the model here.

13:13.550 --> 13:19.370
We get to know that this is the result for us.

13:21.170 --> 13:28.610
So we go further, and you can see that the learning rate is performing well, and maximum depth

13:28.760 --> 13:29.830
seven performs well.

13:30.620 --> 13:31.550
So we go ahead.

13:32.990 --> 13:38.590
We put the maximum depth as seven, learning rate as zero point zero zero two and n_estimators as 500.

13:38.600 --> 13:43.210
These are the values which we have fixed, and subsample is the value which we are trying to find out

13:43.220 --> 13:43.410
now.

13:43.640 --> 13:50.480
Again, we will do the same thing and I predict we will improve our model and fix the parameters.
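
NOTE
Editor's sketch of the sequential tuning idea described above: tune one parameter, fix its best value, then move to the next. It uses a made-up scoring function instead of a real cross-validated model, so sequential_search, toy_score, the candidate grid and the optimum it finds are all illustrative assumptions, not the notebook's results.
```python
def sequential_search(score_fn, grid, defaults):
    """Tune one parameter at a time, fixing each best value before moving on."""
    fixed = dict(defaults)
    for name, candidates in grid.items():
        # Try every candidate for this parameter with all earlier choices held fixed.
        best = max(candidates, key=lambda v: score_fn({**fixed, name: v}))
        fixed[name] = best
    return fixed
# Toy stand-in for a cross-validated score (higher is better); not a real model.
def toy_score(p):
    return -abs(p["n_estimators"] - 500) / 1000 - abs(p["learning_rate"] - 0.05) - abs(p["max_depth"] - 7) / 10
grid = {"n_estimators": [100, 250, 500], "learning_rate": [0.01, 0.05, 0.1], "max_depth": [3, 5, 7]}
best = sequential_search(toy_score, grid, {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3})
print(best)
```
This is cheaper than searching all combinations at once, at the cost of possibly missing interactions between parameters.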

13:50.910 --> 13:57.240
Now, doing this is useful only if your model performance also improves.

13:57.980 --> 14:03.520
So if my model performance is not improving... so this value is minus two point six six.

14:04.490 --> 14:15.080
It has a standard deviation of zero point three four nine five. If I go above, here we have

14:15.560 --> 14:18.000
minus two point six two seven.

14:18.590 --> 14:21.080
So here you can see there is a slight improvement.

14:21.560 --> 14:27.950
But I will keep on going only if there is a good amount of improvement.

14:28.190 --> 14:30.760
It is improving by zero point one percent.

14:31.580 --> 14:33.550
So that is not much improvement.

14:33.560 --> 14:33.830
Right.

14:34.070 --> 14:37.010
So we are not looking for that small amount of improvement.

14:37.010 --> 14:38.480
We are looking for a large improvement.

14:38.510 --> 14:41.630
So any of these values would be fine for us.

14:42.860 --> 14:47.300
Now, what you can do is you can create different models.

14:47.300 --> 14:51.050
So this is just one of the examples I used.

14:51.300 --> 14:58.190
You can try different models, and you can use count:poisson and see how you can improve;

14:58.190 --> 15:05.420
you can create a combination of different models and try implementing those.
