WEBVTT

00:00.450 --> 00:02.550
Instructor: Hello and welcome to this tutorial.

00:02.550 --> 00:03.383
All right,

00:03.383 --> 00:05.670
so we are gonna start with the first file of our model

00:05.670 --> 00:09.150
and the most important file, that's model.py.

00:09.150 --> 00:11.490
And it is in this file that we will implement

00:11.490 --> 00:13.290
the brain of the whole model.

00:13.290 --> 00:15.960
You know, the brain at the heart of the A3C model.

00:15.960 --> 00:18.960
So it's in this file that we will make the neural network

00:18.960 --> 00:20.640
which will of course contain some

00:20.640 --> 00:21.960
convolutional neural networks

00:21.960 --> 00:24.060
because of course we're still doing some

00:24.060 --> 00:25.650
deep reinforcement learning.

00:25.650 --> 00:27.840
So our AI will still have eyes.

00:27.840 --> 00:29.130
And inside this neural network,

00:29.130 --> 00:30.630
we will integrate everything

00:30.630 --> 00:33.360
that is related to the actor-critic model.

00:33.360 --> 00:34.590
And there is a bonus,

00:34.590 --> 00:35.423
as I told you,

00:35.423 --> 00:38.190
we are implementing one of the most powerful A3C models.

00:38.190 --> 00:39.540
And what makes it that powerful

00:39.540 --> 00:42.180
is that it will contain a recurrent neural network

00:42.180 --> 00:45.510
and more precisely an LSTM, long short-term memory,

00:45.510 --> 00:46.650
so that we can learn the

00:46.650 --> 00:49.290
temporal properties of what's going on in the game.

00:49.290 --> 00:50.280
That is actually the

00:50.280 --> 00:52.140
temporal properties of the input

00:52.140 --> 00:54.537
so that the predictions can be even better.

00:54.537 --> 00:56.070
So there we go.

00:56.070 --> 00:58.620
We are implementing a very powerful model

00:58.620 --> 01:00.780
that combines basically all the neural networks

01:00.780 --> 01:02.640
that we saw in the deep learning course

01:02.640 --> 01:04.410
that is, an artificial neural network,

01:04.410 --> 01:06.180
a convolutional neural network,

01:06.180 --> 01:08.220
and a recurrent neural network.

01:08.220 --> 01:09.960
And at the heart of all these networks,

01:09.960 --> 01:11.820
there is of course the A3C model

01:11.820 --> 01:14.370
that will make the AI very powerful.

01:14.370 --> 01:15.210
So let's do this.

01:15.210 --> 01:18.420
Let's attack this model and implement it.

01:18.420 --> 01:21.810
So we're gonna start by making two functions,

01:21.810 --> 01:23.910
that are just some functions that will take care

01:23.910 --> 01:25.920
of how we can initialize the weights

01:25.920 --> 01:27.660
because you know we're gonna have some neural networks

01:27.660 --> 01:29.310
and therefore we're gonna have weights.

01:29.310 --> 01:31.740
And we just want to make these two functions first

01:31.740 --> 01:33.870
so that we already have a tool

01:33.870 --> 01:37.110
to integrate very easily inside the whole model,

01:37.110 --> 01:38.280
the neural network.

01:38.280 --> 01:40.500
So these two functions are gonna be

01:40.500 --> 01:42.660
normalized_columns_initializer,

01:42.660 --> 01:45.150
that is basically a function that can,

01:45.150 --> 01:47.010
not only initialize some weights,

01:47.010 --> 01:50.370
but set a specific variance of a tensor of weights.

01:50.370 --> 01:52.980
So that's exactly what we're about to implement right now.

01:52.980 --> 01:55.140
And then we will implement a second function

01:55.140 --> 01:57.210
which will be the weights_init function

01:57.210 --> 01:59.520
and that will basically initialize the weights

01:59.520 --> 02:01.386
in an optimal way for the learning.

02:01.386 --> 02:02.340
All right?

02:02.340 --> 02:04.950
And then, once we're done with these two functions

02:04.950 --> 02:08.310
we will start implementing the neural network.

02:08.310 --> 02:09.143
So let's do it.

02:09.143 --> 02:12.090
Let's quickly make these two functions.

02:12.090 --> 02:14.400
So I'm starting with a def here.

02:14.400 --> 02:16.080
Then I'm gonna give the name of this function

02:16.080 --> 02:19.447
which is normalized_columns_initializer.

02:24.690 --> 02:25.620
There we go.

02:25.620 --> 02:28.950
And this function is gonna take just two inputs.

02:28.950 --> 02:33.000
First, it's going to be the weights we want to initialize

02:33.000 --> 02:35.130
and the standard deviation,

02:35.130 --> 02:36.630
because as I just said

02:36.630 --> 02:38.670
we want to set a specific variance

02:38.670 --> 02:40.200
for our tensor of weights.

02:40.200 --> 02:42.355
And if you want to understand why we have to do this,

02:42.355 --> 02:43.410
it's because you know,

02:43.410 --> 02:45.066
when we make the neural network

02:45.066 --> 02:47.460
there will be the actor and the critic

02:47.460 --> 02:49.320
according to the A3C model,

02:49.320 --> 02:51.660
and we will make two separate fully connected layers,

02:51.660 --> 02:53.880
one for the actor and one for the critic.

02:53.880 --> 02:56.880
And these two fully connected layers will have weights,

02:56.880 --> 02:59.460
and we will set a standard deviation

02:59.460 --> 03:01.830
for each of these two groups of weights.

03:01.830 --> 03:03.000
And so what we'll do is,

03:03.000 --> 03:05.670
we will set a small standard deviation for the actor.

03:05.670 --> 03:07.680
It'll be around 0.01

03:07.680 --> 03:10.530
and a big standard deviation for the critic,

03:10.530 --> 03:12.720
which will be around 1, I think.

03:12.720 --> 03:15.000
So that's why we're making this function

03:15.000 --> 03:18.420
so that we can very easily set the standard deviation

03:18.420 --> 03:20.580
for the weights we will initialize later

03:20.580 --> 03:21.900
for the actor and the critic.

03:21.900 --> 03:23.520
That's why we're doing this.

03:23.520 --> 03:26.280
So now we are just going to set a default value,

03:26.280 --> 03:27.960
but this will change afterwards

03:27.960 --> 03:29.490
when we initialize the weights.

03:29.490 --> 03:32.250
So let's choose so far, 1.0.

03:32.250 --> 03:33.083
All right.

03:33.083 --> 03:37.380
And now we're ready to define what's inside this function.

03:37.380 --> 03:40.236
So what we'll first prepare is the output,

03:40.236 --> 03:41.970
that we're gonna call out.

03:41.970 --> 03:43.620
So this out variable

03:43.620 --> 03:46.320
is what will be returned by this function.

03:46.320 --> 03:50.310
And so out, first what we're gonna do is initialize it.

03:50.310 --> 03:51.420
So as you understood,

03:51.420 --> 03:54.360
this output will be a tensor of weights

03:54.360 --> 03:56.820
that will have a specific standard deviation.

03:56.820 --> 03:59.850
But before we take care of setting the standard deviation

03:59.850 --> 04:01.410
we just want to initialize it,

04:01.410 --> 04:03.630
and then we will set the standard deviation here.

04:03.630 --> 04:06.840
Which is the argument, which is the input of this function.

04:06.840 --> 04:09.600
So out, and to initialize a tensor of weights,

04:09.600 --> 04:12.150
you might know how to do it, we already did it.

04:12.150 --> 04:15.210
We're gonna use our Torch Library.

04:15.210 --> 04:16.527
And from this Torch Library

04:16.527 --> 04:20.498
we will take the randn function

04:20.498 --> 04:23.700
which will initialize a Torch tensor

04:23.700 --> 04:27.510
with random weights that follow a normal distribution.

04:27.510 --> 04:31.200
So that's why it is called randn, n is for normal.

04:31.200 --> 04:33.030
And so now what we simply need to input

04:33.030 --> 04:36.480
is the number of elements that this tensor will contain.

04:36.480 --> 04:38.220
And this number of elements is of course

04:38.220 --> 04:39.240
the number of weights

04:39.240 --> 04:41.580
because we're actually initializing

04:41.580 --> 04:43.350
a tensor for these weights here.

04:43.350 --> 04:45.630
And so to get this number of elements,

04:45.630 --> 04:47.470
we can simply take our weights

04:48.660 --> 04:53.400
and add a dot to get size, with parentheses.

04:53.400 --> 04:56.940
And this will give the number of elements in weights

04:56.940 --> 04:59.430
so that it'll create a Torch tensor

04:59.430 --> 05:02.400
with the same number of elements as our weights.

05:02.400 --> 05:04.470
And it'll be initialized with random weights

05:04.470 --> 05:06.870
following a normal distribution.
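
NOTE
A minimal sketch of the initialization step just described, assuming PyTorch is installed. The 4 x 3 tensor is a hypothetical stand-in for a layer's weights and is not from the tutorial.
```python
import torch
# Hypothetical weight tensor standing in for a layer's weights.
weights = torch.zeros(4, 3)
# weights.size() returns the shape (a torch.Size), so randn builds a tensor
# with the same dimensions, filled with samples from a standard normal.
out = torch.randn(weights.size())
print(out.size())  # torch.Size([4, 3])
```
Because size() gives the full shape, the new tensor matches the weights dimension by dimension, which also means it has the same number of elements.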

05:06.870 --> 05:07.703
All right?

05:07.703 --> 05:09.060
And now it is time to

05:09.060 --> 05:11.490
set the standard deviation we want to have.

05:11.490 --> 05:13.500
That is this standard deviation here.

05:13.500 --> 05:16.920
So what we're gonna do now is a simple normalization.

05:16.920 --> 05:19.650
We have a Torch tensor of weights,

05:19.650 --> 05:21.630
and now we want to normalize it.

05:21.630 --> 05:22.463
And so to normalize it,

05:22.463 --> 05:25.830
we will simply write the explicit computation.

05:25.830 --> 05:27.990
And so what we simply need to do here

05:27.990 --> 05:30.540
is take our output

05:30.540 --> 05:33.810
then update it by multiplying it

05:33.810 --> 05:36.534
by the standard deviation we want to have,

05:36.534 --> 05:41.130
divided by this sum I've just mentioned.

05:41.130 --> 05:42.150
And so to get the sum,

05:42.150 --> 05:45.000
we're gonna use the square root function by Torch.

05:45.000 --> 05:48.810
And so that's why I'm taking here torch.sqrt,

05:48.810 --> 05:50.820
that's the square root function.

05:50.820 --> 05:53.220
And inside we are going to input

05:53.220 --> 05:56.160
the sum of squares of the weights of our vector.

05:56.160 --> 05:58.200
And so we take our output,

05:58.200 --> 06:01.140
then we use the pow function

06:01.140 --> 06:02.760
to which we input 2

06:02.760 --> 06:05.880
because we want to take the square of each weight,

06:05.880 --> 06:08.546
and then we take therefore the sum.

06:08.546 --> 06:13.530
And inside we are going to specify the index

06:13.530 --> 06:16.890
of the column that contains the weight we want to sum

06:16.890 --> 06:20.400
and then to get these weights separately,

06:20.400 --> 06:22.500
because we want to sum them,

06:22.500 --> 06:25.140
well, we use the expand_as

06:25.140 --> 06:29.520
function with our output, out.

06:29.520 --> 06:32.760
All right, so this will get the weights of out

06:32.760 --> 06:36.660
which so far was initialized as a Torch tensor of weights.

06:36.660 --> 06:38.310
So that gets all these weights.

06:38.310 --> 06:39.870
We take the sum of squares

06:39.870 --> 06:43.980
and then we take the square root to apply the normalization.
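
NOTE
The chain of operations above can be traced on a tiny tensor. This is only a sketch, assuming PyTorch is installed; the values are hypothetical and chosen so the arithmetic is easy to follow. The keepdim=True argument is an assumption added here so that expand_as works in recent PyTorch versions.
```python
import torch
# Hypothetical 2 x 2 tensor with easy row norms (3-4-5 triangles).
out = torch.tensor([[3.0, 4.0], [6.0, 8.0]])
squares = out.pow(2)                     # [[9, 16], [36, 64]]
row_sums = squares.sum(1, keepdim=True)  # [[25], [100]], summed along dim 1
norms = torch.sqrt(row_sums)             # [[5], [10]]
expanded = norms.expand_as(out)          # [[5, 5], [10, 10]]
print(out / expanded)                    # both rows become [0.6, 0.8]
```
Dividing by the expanded tensor rescales every row to unit length, and multiplying by the standard deviation afterwards sets that length to the value we want.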

06:43.980 --> 06:46.650
And the fact that we have this standard deviation

06:46.650 --> 06:51.256
in the numerator will make sure that, and I can write it here,

06:51.256 --> 06:55.110
the variance of out will be equal

06:55.110 --> 06:58.890
to the square of the standard deviation.

06:58.890 --> 07:01.710
This formula here will make sure

07:01.710 --> 07:04.465
that this Tensor of weights that we initialized

07:04.465 --> 07:06.900
will have a variance that will be equal to

07:06.900 --> 07:08.790
the square of the standard deviation

07:08.790 --> 07:11.160
that we input as an argument.

07:11.160 --> 07:15.270
And that is how we can set a specific standard deviation

07:15.270 --> 07:18.810
for the future actor and critic that we will make soon.

07:18.810 --> 07:20.280
And we will choose a small

07:20.280 --> 07:22.680
standard deviation for the actor

07:22.680 --> 07:24.330
and a large one for the critic.

07:24.330 --> 07:27.750
And we will do this very easily thanks to this function.

07:27.750 --> 07:28.583
All right.

07:28.583 --> 07:31.170
And so now we have only one thing left to do.

07:31.170 --> 07:34.200
It's of course to return the output

07:34.200 --> 07:37.080
that is now normalized with this specific

07:37.080 --> 07:38.618
standard deviation.
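
NOTE
Putting the steps together, the function might look like the sketch below, assuming PyTorch is installed. The layer shapes (4 x 256 and 1 x 256) are hypothetical, and keepdim=True is an assumption for recent PyTorch versions. The std values 0.01 and 1.0 are the ones mentioned for the actor and the critic.
```python
import torch
def normalized_columns_initializer(weights, std=1.0):
    # Random tensor with the same shape as weights, drawn from N(0, 1).
    out = torch.randn(weights.size())
    # Rescale each row so that the square root of its sum of squares equals std.
    out *= std / torch.sqrt(out.pow(2).sum(1, keepdim=True).expand_as(out))
    return out
# Hypothetical shapes: 4 actions and 1 value output, 256 input features each.
actor_w = normalized_columns_initializer(torch.zeros(4, 256), std=0.01)
critic_w = normalized_columns_initializer(torch.zeros(1, 256), std=1.0)
print(actor_w.pow(2).sum(1).sqrt())   # each entry close to 0.01
print(critic_w.pow(2).sum(1).sqrt())  # close to 1.0
```
In this sketch the quantity that ends up equal to the square of std is the sum of squares of each row, which is what the printed checks confirm.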

07:38.618 --> 07:40.350
All right, so perfect.

07:40.350 --> 07:42.840
That's the first function we had to make.

07:42.840 --> 07:44.160
That's the first tool

07:44.160 --> 07:47.340
we will be very happy to use to make the A3C Brain.

07:47.340 --> 07:49.350
We have one more function to make now.

07:49.350 --> 07:51.330
It's gonna be the weights_init function

07:51.330 --> 07:54.150
and that's just a function that will, I remind you,

07:54.150 --> 07:57.540
initialize the weights to make the learning optimal.

07:57.540 --> 07:59.520
So let's do this in the next tutorial.

07:59.520 --> 08:01.383
And until then, enjoy AI.
