WEBVTT

00:00.630 --> 00:01.740
Hello and welcome back to the course

00:01.740 --> 00:03.510
on artificial intelligence.

00:03.510 --> 00:06.150
In today's tutorial, we're taking our first step

00:06.150 --> 00:07.980
into the world of A3C.

00:07.980 --> 00:09.510
And as a first step, we're going to find out

00:09.510 --> 00:11.400
what this abbreviation stands for.

00:11.400 --> 00:12.570
So A3C stands

00:12.570 --> 00:16.230
for Asynchronous Advantage Actor-Critic algorithm.

00:16.230 --> 00:17.880
Now, this is an algorithm which was developed

00:17.880 --> 00:22.880
at Google DeepMind in 2016 by a group of researchers

00:22.980 --> 00:25.890
and it's the cutting edge algorithm

00:25.890 --> 00:28.650
for artificial intelligence to date.

00:28.650 --> 00:30.090
Now it has multiple modifications

00:30.090 --> 00:32.310
and we'll discuss that more in the course

00:32.310 --> 00:35.190
especially in the practical tutorials.

00:35.190 --> 00:38.220
But nevertheless, this algorithm blows everything else

00:38.220 --> 00:41.881
including Deep Convolutional Q-Learning networks

00:41.881 --> 00:44.250
out of the water, completely out of the water.

00:44.250 --> 00:48.720
And it is faster, it takes less time for training,

00:48.720 --> 00:50.310
and gets better results.

00:50.310 --> 00:52.710
So throughout this part of the course,

00:52.710 --> 00:55.170
we'll be referencing, and we have already referenced,

00:55.170 --> 00:56.910
but we'll be referencing even more,

00:56.910 --> 01:00.000
a paper or the paper that was published

01:00.000 --> 01:01.860
that first introduced A3C.

01:01.860 --> 01:03.120
It's called "Asynchronous Methods

01:03.120 --> 01:06.480
for Deep Reinforcement Learning" by Volodymyr Mnih

01:06.480 --> 01:09.390
and others from Google DeepMind.

01:09.390 --> 01:11.910
So I'm going to show you this paper now

01:11.910 --> 01:14.823
so that you have an introduction to it.

01:16.094 --> 01:17.850
So here is this paper.

01:17.850 --> 01:22.850
I wanted to show it to you so that you can get a feel for it

01:23.160 --> 01:25.350
and already get introduced to it a little bit.

01:25.350 --> 01:27.930
And of course, it is highly recommended

01:27.930 --> 01:31.590
to read through the paper and understand

01:31.590 --> 01:34.170
what exactly they're talking about.

01:34.170 --> 01:38.790
And you'll see that throughout the practical tutorials

01:38.790 --> 01:43.020
Alan will be taking you through certain parts of the paper,

01:43.020 --> 01:46.830
through certain paragraphs or sections

01:46.830 --> 01:50.280
which will be relevant to what we'll be programming

01:50.280 --> 01:51.930
at that point in time.

01:51.930 --> 01:55.170
And what I wanted to point out here is, as you can see,

01:55.170 --> 01:57.270
a lot of research went into this,

01:57.270 --> 01:59.310
and there are a lot of references as well,

01:59.310 --> 02:03.450
but a part that I really like is that at the end,

02:03.450 --> 02:07.290
the very end, they compare the different algorithms,

02:07.290 --> 02:08.123
compare the results.

02:08.123 --> 02:10.140
And this is what I wanted to point out here.

02:10.140 --> 02:11.610
So let's zoom in a little bit.

02:11.610 --> 02:14.940
So here, as you can see, even in Google DeepMind,

02:14.940 --> 02:18.120
they are training or they're evaluating the algorithms

02:18.120 --> 02:20.520
on games just as we are doing in this course.

02:20.520 --> 02:23.820
So exactly the same principle because games

02:23.820 --> 02:27.300
are a simulated environment, or a small environment,

02:27.300 --> 02:28.860
a confined environment with certain rules,

02:28.860 --> 02:30.840
and they want to understand

02:30.840 --> 02:32.820
how well this artificial intelligence is doing

02:32.820 --> 02:33.653
in those games.

02:33.653 --> 02:36.150
And here we've got exactly all those games

02:36.150 --> 02:37.113
which you can find.

02:38.400 --> 02:42.120
A lot of them, you can find on OpenAI Gym.

02:42.120 --> 02:45.000
And the games that we've been working with, so for instance,

02:45.000 --> 02:46.860
in this section we are working with Breakout,

02:46.860 --> 02:48.573
so it's also here.

02:48.573 --> 02:50.580
And so you can see that for Breakout,

02:50.580 --> 02:51.780
they've got it in bold,

02:51.780 --> 02:53.910
they've got the best algorithm highlighted.

02:53.910 --> 02:56.276
So DQN, that's the algorithm we've been working with.

02:56.276 --> 03:00.390
Then some other algorithms, and then here you've got A3C.

03:00.390 --> 03:03.570
A3C with LSTM, long short-term memory.

03:03.570 --> 03:05.310
So that's the one we'll be implementing

03:05.310 --> 03:06.600
in this part of the course.

03:06.600 --> 03:10.200
We'll have A3C with LSTM, which makes it even stronger.

03:10.200 --> 03:14.430
So as you can see, Breakout, the best result is achieved

03:14.430 --> 03:15.630
by A3C with LSTM.

03:15.630 --> 03:19.860
So that's the score, 766.8, compared to the others.

03:19.860 --> 03:24.060
And you can also see that for most of them.

03:24.060 --> 03:28.080
So if we now take a bigger-picture view,

03:28.080 --> 03:30.210
you can see that most of the bold ones

03:30.210 --> 03:31.980
are actually in this last column.

03:31.980 --> 03:35.130
So yes, indeed, there are some games where other algorithms

03:35.130 --> 03:36.360
are performing better.

03:36.360 --> 03:40.410
But as you can see, DQN is actually not performing better

03:40.410 --> 03:42.690
in any of the games.

03:42.690 --> 03:45.300
But you can see that there are other algorithms.

03:45.300 --> 03:47.460
Other algorithms perform better sometimes,

03:47.460 --> 03:51.840
but A3C-LSTM performs the best in most cases.

03:51.840 --> 03:54.960
So you can see that this is bold, this is bold, this one,

03:54.960 --> 03:57.600
these ones, this one, and so on.

03:57.600 --> 04:01.890
So you can see that A3C-LSTM is a really powerful algorithm.

04:01.890 --> 04:06.660
It is indeed at the forefront of artificial intelligence

04:06.660 --> 04:08.580
and that's exactly what we'll be implementing.

04:08.580 --> 04:10.140
So very exciting section ahead.

04:10.140 --> 04:13.800
I highly encourage you to go through this paper

04:13.800 --> 04:18.450
and get a feel for what we're going to be talking about.

04:18.450 --> 04:20.340
And then throughout this section

04:20.340 --> 04:24.360
and especially throughout the practical side of things,

04:24.360 --> 04:25.290
the practical side of the tutorials,

04:25.290 --> 04:27.240
we're going to be going through this in detail.

04:27.240 --> 04:28.320
We're actually going to be working

04:28.320 --> 04:32.580
with their pseudocode here, which is available,

04:32.580 --> 04:33.540
and we're going to be,

04:33.540 --> 04:35.550
this outline will show you how to implement that

04:35.550 --> 04:37.350
and how we're going to be working with that.

04:37.350 --> 04:40.740
And on that note, I hope you're going to enjoy this paper

04:40.740 --> 04:42.660
and I look forward to seeing you next time.

04:42.660 --> 04:44.433
And until then, enjoy AI.