WEBVTT

00:00.630 --> 00:01.950
Speaker: Hello and welcome back to the course

00:01.950 --> 00:03.990
on artificial intelligence.

00:03.990 --> 00:06.780
In today's tutorial we're starting off this section

00:06.780 --> 00:09.390
on deep convolutional Q-learning.

00:09.390 --> 00:11.010
So let's have a look at what it's all about.

00:11.010 --> 00:14.040
Previously, we talked about deep Q-learning.

00:14.040 --> 00:17.070
So we had an environment with an agent

00:17.070 --> 00:20.640
and we had a vector describing that environment

00:20.640 --> 00:23.100
which was fed into a neural network.

00:23.100 --> 00:26.280
And at the end we got the Q-values as our outputs.

00:26.280 --> 00:27.690
And then of course we found

00:27.690 --> 00:29.310
out how the network is trained,

00:29.310 --> 00:30.143
the learning part.

00:30.143 --> 00:31.830
We found out how actions are decided

00:31.830 --> 00:33.000
based on those Q-values,

00:33.000 --> 00:34.650
that's the action part.

00:34.650 --> 00:37.950
And we talked about the action selection policies and

00:37.950 --> 00:40.235
different things about

00:40.235 --> 00:42.420
how deep Q-learning works.
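
NOTE
Editor's sketch of the recap above: a state vector goes in, one Q-value per action
comes out, and an action is chosen from those Q-values. This is a minimal
illustration under stated assumptions, not the course's implementation: the single
linear layer, the weights, and the epsilon-greedy policy are all hypothetical
stand-ins (the lecture only says that action selection policies were covered).
```python
import random
# Hypothetical sketch: function names, weights and policy are illustrative only.
def q_values(state, weights, biases):
    """One linear layer: returns one Q-value per action for the state vector."""
    return [sum(w * s for w, s in zip(row, state)) + b
            for row, b in zip(weights, biases)]
def epsilon_greedy(qs, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(qs))
    return max(range(len(qs)), key=lambda a: qs[a])
state = [1.0, 2.0]  # the coordinates x1 = 1, x2 = 2 used in the lecture example
weights = [[0.5, -0.25], [0.0, 1.0], [-1.0, 0.5]]  # 3 actions x 2 inputs (made up)
biases = [0.0, 0.5, 0.0]
qs = q_values(state, weights, biases)
action = epsilon_greedy(qs, epsilon=0.0, rng=random.Random(0))
```
With epsilon set to 0.0 the choice is purely greedy; a value such as 0.1 would
make the agent explore a random action about one step in ten.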

00:42.420 --> 00:45.719
But here, the key concept for all of this is

00:45.719 --> 00:50.719
how do we get from the actual environment

00:51.060 --> 00:54.480
and the states to the neural network?

00:54.480 --> 00:57.870
Well, the transition is over here, the input vector.

00:57.870 --> 01:00.720
So the input layer of our neural network,

01:00.720 --> 01:02.250
and it is a vector.

01:02.250 --> 01:04.620
So what we are looking at is, okay,

01:04.620 --> 01:06.660
so, actually, "looking at" is not

01:06.660 --> 01:07.493
the correct term.

01:07.493 --> 01:08.910
We're not looking at anything.

01:08.910 --> 01:12.330
The agent basically has this information.

01:12.330 --> 01:14.040
So the environment is passing it

01:14.040 --> 01:16.590
this information saying, okay, you, the agent,

01:16.590 --> 01:18.330
you're currently in this state,

01:18.330 --> 01:20.490
your state is described by this vector.

01:20.490 --> 01:23.185
In this simplified example, it's described

01:23.185 --> 01:26.730
by this vector: x1 equals one, x2 equals two.

01:26.730 --> 01:28.604
So your coordinates are one, two,

01:28.604 --> 01:30.090
and that is your whole state.

01:30.090 --> 01:31.770
In a more complex environment, the state

01:31.770 --> 01:36.090
might involve other things that the agent can be observing.

01:36.090 --> 01:39.240
But the point here is that it is passed as a vector.

01:39.240 --> 01:42.390
And the thing is that that doesn't happen in real life,

01:42.390 --> 01:45.390
except for GPS systems

01:45.390 --> 01:46.530
and other things like that.

01:46.530 --> 01:48.990
But in real life, what do we use most of the time?

01:48.990 --> 01:50.880
We use our senses. We use our eyes.

01:50.880 --> 01:53.670
Even in GPS, it's not built into our brain.

01:53.670 --> 01:56.250
It's not telling us the coordinates through our brain.

01:56.250 --> 01:59.556
And so we are still using our eyes to look

01:59.556 --> 02:02.910
at the GPS and understand what's going on there.

02:02.910 --> 02:07.620
And so this is kind of cheating for AI to be able to get

02:07.620 --> 02:09.630
information about the environment as a vector.

02:09.630 --> 02:10.620
It's too simple.

02:10.620 --> 02:12.030
It's not how it works in real life.

02:12.030 --> 02:13.856
That's not how we as humans operate

02:13.856 --> 02:16.221
and ultimately we wanna create artificial intelligence

02:16.221 --> 02:19.530
which can operate in a similar fashion to humans,

02:19.530 --> 02:22.110
which can take on

02:22.110 --> 02:23.310
the same challenges as humans.

02:23.310 --> 02:25.800
And so in the human world, we don't have that.

02:25.800 --> 02:28.230
We don't have these coordinates

02:28.230 --> 02:30.270
or other types of vectors that are passed

02:30.270 --> 02:33.870
to us that explain the state we're in, in that environment.

02:33.870 --> 02:35.790
So we're gonna have to remove that

02:35.790 --> 02:37.410
to make it more realistic.

02:37.410 --> 02:38.910
And then what can we replace it with?

02:38.910 --> 02:41.520
What do we see, or what do we do as a human

02:41.520 --> 02:42.353
to get information?

02:42.353 --> 02:43.680
Well, most of the time we see,

02:43.680 --> 02:44.850
of course we use all of our senses,

02:44.850 --> 02:46.860
but most of the information that we are getting

02:46.860 --> 02:48.900
about the world around us

02:48.900 --> 02:51.480
comes through our sight.

02:51.480 --> 02:55.170
And that is why we are going to change that little arrow

02:55.170 --> 03:00.060
which we had into a whole convolutional neural network.

03:00.060 --> 03:03.990
So this is from our annex number two.

03:03.990 --> 03:05.370
We've got the convolutional layer

03:05.370 --> 03:08.220
and that's why it's important to be quite comfortable

03:08.220 --> 03:10.590
with convolutional neural networks

03:10.590 --> 03:11.423
and how they work.

03:11.423 --> 03:13.080
So if you've done our deep learning course

03:13.080 --> 03:14.940
then you should be comfortable with that,

03:14.940 --> 03:17.730
or you can just have a look at the annex number two.

03:17.730 --> 03:20.640
We've got some very good intuition tutorials there.

03:20.640 --> 03:24.360
So here we've got the convolution operation, which happens.

03:24.360 --> 03:25.980
So we're actually going to be looking

03:25.980 --> 03:27.330
at this as an image.

03:27.330 --> 03:31.350
So this is an image of that environment.

03:31.350 --> 03:34.020
And so the agent is actually looking at the environment.

03:34.020 --> 03:37.380
So in this case, it's not that he's looking

03:37.380 --> 03:39.960
from within there; it's more like,

03:39.960 --> 03:42.270
let's say he's playing this on a computer

03:42.270 --> 03:43.470
and he can see this environment

03:43.470 --> 03:46.200
and therefore he can see where this figure

03:46.200 --> 03:48.420
representing the agent actually is.

03:48.420 --> 03:49.673
So he can see his whole environment

03:49.673 --> 03:51.780
or whatever a human would see.

03:51.780 --> 03:52.800
If it's an actual maze

03:52.800 --> 03:54.270
then the human would see the maze from inside.

03:54.270 --> 03:55.830
And so the agent should be able to see exactly

03:55.830 --> 03:56.700
the same thing.

03:56.700 --> 03:59.040
So whatever he sees

03:59.040 --> 04:00.570
goes through a convolutional layer,

04:00.570 --> 04:02.130
it goes through a pooling layer,

04:02.130 --> 04:03.360
it goes through flattening. Again,

04:03.360 --> 04:05.100
you can find out more

04:05.100 --> 04:06.810
about these different parts

04:06.810 --> 04:09.425
of a convolutional neural network

04:09.425 --> 04:10.860
in the annex.

04:10.860 --> 04:12.690
And then after it's flattened,

04:12.690 --> 04:16.800
then we have inputs which go into the neural network.
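
NOTE
Editor's sketch of the pipeline just described: image, convolution, pooling,
flattening, then inputs for the fully connected network. It is a toy, pure-Python
illustration under assumptions, not the course code: the 5x5 frame and the 2x2
kernel are made up, a trained network would learn many kernels rather than use
one fixed kernel, and an activation function would normally follow the convolution.
```python
# Hypothetical sketch: frame, kernel and helper names are illustrative only.
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)] for i in range(out_h)]
def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows and columns are dropped."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]
def flatten(fmap):
    """Unroll a 2D feature map row by row into a flat input vector."""
    return [v for row in fmap for v in row]
frame = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 1]]  # toy grayscale frame standing in for the screen
kernel = [[1, 0],
          [0, 1]]  # made-up diagonal detector
feature_map = conv2d(frame, kernel)  # 4x4
pooled = max_pool(feature_map)       # 2x2
features = flatten(pooled)           # 4 values fed to the dense layers
```
The flattened list plays the role that the hand-made coordinate vector played
before: it is what the fully connected layers receive as input.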

04:16.800 --> 04:18.570
And this is way more realistic

04:18.570 --> 04:22.410
because the agent has to use their sight,

04:22.410 --> 04:26.460
or has to process images which the environment

04:26.460 --> 04:29.400
is supplying to the agent

04:29.400 --> 04:31.560
just as a human would be processing images.

04:31.560 --> 04:34.110
And the beauty of this is not just

04:34.110 --> 04:36.870
that it's more realistic and the agent is actually more

04:36.870 --> 04:40.080
like a human would be.

04:40.080 --> 04:43.380
But it allows us to process much more complex environments.

04:43.380 --> 04:47.340
For instance, this is how we can play Doom or other games

04:47.340 --> 04:49.620
like that because instead of just getting a vector

04:49.620 --> 04:54.210
of information which somebody would've created

04:54.210 --> 04:56.370
for us in this environment, we can just hook

04:56.370 --> 04:59.370
up our artificial intelligence to any environment which

04:59.370 --> 05:01.900
we as humans would have vision of.

05:01.900 --> 05:04.620
So as a human, when you're playing this game,

05:04.620 --> 05:06.480
you can see exactly this picture.

05:06.480 --> 05:10.170
And that's exactly what the artificial neural network

05:10.170 --> 05:11.876
or the agent would see now.

05:11.876 --> 05:14.670
So in this part of the course, when you're gonna

05:14.670 --> 05:16.470
be programming the practical tutorials,

05:16.470 --> 05:18.720
the agent will actually see this exact picture,

05:18.720 --> 05:19.800
it'll see the pixels.

05:19.800 --> 05:22.121
It will get this exact picture for all of the pixels

05:22.121 --> 05:24.990
with this person, with this gun,

05:24.990 --> 05:26.940
with this face, with this percentage,

05:26.940 --> 05:28.650
with everything exactly what we see here,

05:28.650 --> 05:30.840
that's exactly what the agent will see.

05:30.840 --> 05:33.214
Then it'll have to dissect that through

05:33.214 --> 05:35.910
a convolutional layer, a pooling layer, flattening,

05:35.910 --> 05:37.620
and then it'll go into a neural network.

05:37.620 --> 05:39.480
And needless to say that the neural network

05:39.480 --> 05:41.040
will actually be much more complex than that.

05:41.040 --> 05:42.780
So let's replace it with something like this.

05:42.780 --> 05:44.490
This is not much more complex.

05:44.490 --> 05:46.500
This looks a little bit more complex,

05:46.500 --> 05:48.240
but in reality the neural networks

05:48.240 --> 05:49.380
you're going to be working with

05:49.380 --> 05:52.110
and creating with Alan are going to be quite

05:52.110 --> 05:54.150
interesting and gonna be much more complex than this.

05:54.150 --> 05:56.310
But as you can see already here, even if we just have

05:56.310 --> 05:58.230
five inputs instead of two,

05:58.230 --> 06:00.840
things become much more complex.

06:00.840 --> 06:03.300
And here you can see we have many more actions

06:03.300 --> 06:04.380
that the agent can take.

06:04.380 --> 06:06.323
So in the game of Doom, turn left

06:06.323 --> 06:10.890
turn right, look down, look up, run, shoot, reload.

06:10.890 --> 06:14.160
Or you know, all those different actions that are possible

06:14.160 --> 06:16.290
in a first-person shooter like Doom.
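
NOTE
Editor's sketch of the output side: the network has one output per possible
action, and a selection policy turns those outputs into a choice. The action
names are the ones listed in the lecture; everything else is an assumption for
illustration: the Q-values are invented, and softmax is just one possible
selection policy, not necessarily the one used in the practical tutorials.
```python
import math
# Action names come from the lecture; the Q-values below are made up.
ACTIONS = ["turn left", "turn right", "look down", "look up",
           "run", "shoot", "reload"]
def softmax(qs, temperature=1.0):
    """Turn Q-values into selection probabilities (numerically stable)."""
    peak = max(qs)
    exps = [math.exp((q - peak) / temperature) for q in qs]
    total = sum(exps)
    return [e / total for e in exps]
q_out = [0.1, 0.3, -0.2, 0.0, 0.5, 1.2, -0.4]  # one output per action
probs = softmax(q_out)
best = ACTIONS[max(range(len(ACTIONS)), key=lambda a: probs[a])]
```
To sample an action instead of always taking the most likely one, something
like random.choices(ACTIONS, weights=probs) from the standard library could be
used; a higher temperature flattens the probabilities and encourages exploration.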

06:16.290 --> 06:19.440
And moreover, it doesn't have to be Doom:

06:19.440 --> 06:22.742
you can attach this agent to another type of game.

06:22.742 --> 06:26.670
That's the beauty of it, that it then realizes

06:26.670 --> 06:29.130
that it can now operate in any kind

06:29.130 --> 06:30.540
of environment that you attach it to.

06:30.540 --> 06:32.910
Because as long as there's a visual representation

06:32.910 --> 06:34.620
of that environment,

06:34.620 --> 06:37.020
it's already got the whole infrastructure

06:37.020 --> 06:39.960
the whole structure is ready to process that.

06:39.960 --> 06:44.010
So that's what deep convolutional Q-learning is all about.

06:44.010 --> 06:46.230
So we're taking it even to the next step.

06:46.230 --> 06:50.790
We're adding convolutions, or rather convolutional layers,

06:50.790 --> 06:53.149
into our agent's brain now

06:53.149 --> 06:55.680
and we're making it even more complex

06:55.680 --> 06:57.720
and therefore we're rewarded

06:57.720 --> 07:01.380
with being able to solve even more complex challenges.

07:01.380 --> 07:02.973
So I hope you're very excited about this.

07:02.973 --> 07:05.610
This is gonna be an epic section

07:05.610 --> 07:07.980
and we're going to create some amazing things

07:07.980 --> 07:10.440
and I can't wait to see you in the next tutorial.

07:10.440 --> 07:12.303
And until then, enjoy AI.