WEBVTT

00:00.660 --> 00:02.970
-: Hello and welcome back to the course on deep learning.

00:02.970 --> 00:04.350
Today we're talking about ReLU,

00:04.350 --> 00:06.210
which is rectified linear units.

00:06.210 --> 00:08.790
And this is an additional step

00:08.790 --> 00:12.180
on top of our convolution step.

00:12.180 --> 00:14.580
So it's not a separate big step, it's a small step.

00:14.580 --> 00:16.230
It's step one B basically.

00:16.230 --> 00:18.240
And what is going on here?

00:18.240 --> 00:20.400
Well, we have our input image

00:20.400 --> 00:22.950
we have our convolution layer, which we've discussed,

00:22.950 --> 00:26.370
and then on top of that, we're going to apply,

00:26.370 --> 00:31.110
wait for it, our favorite, rectifier function.

00:31.110 --> 00:33.420
And you're really familiar with the rectifier function

00:33.420 --> 00:38.420
from the previous section on artificial neural networks.

00:38.790 --> 00:42.240
And in our, so sometimes

00:42.240 --> 00:46.590
authors or instructors separate

00:46.590 --> 00:49.050
the convolution and the rectifier as two separate steps.

00:49.050 --> 00:50.160
In our examples

00:50.160 --> 00:55.160
we are just going to consider them just one big step.

00:55.290 --> 00:57.210
First the convolution, then the rectifier.

00:57.210 --> 01:00.300
And the reason why we're applying the rectifier

01:00.300 --> 01:04.350
is because we want to increase non-linearity in our image

01:04.350 --> 01:09.013
or in our network, in our convolutional neural network.

01:09.013 --> 01:12.630
And a rectifier acts as that filter,

01:12.630 --> 01:15.810
or acts as that function, which breaks up linearity.

01:15.810 --> 01:18.750
And the reason why we want to increase non-linearity

01:18.750 --> 01:20.430
in our network is

01:20.430 --> 01:24.630
because images themselves are highly non-linear,

01:24.630 --> 01:27.130
especially if you're recognizing different objects

01:28.050 --> 01:30.750
next to each other or just on backgrounds

01:30.750 --> 01:31.583
and stuff like that.

01:31.583 --> 01:34.590
Like the image is going to have lots of non-linear elements.

01:34.590 --> 01:36.060
And the transition between pixels,

01:36.060 --> 01:38.040
adjacent pixels is often gonna be non-linear.

01:38.040 --> 01:39.660
That's, you know, because there's borders,

01:39.660 --> 01:41.574
there's different colors,

01:41.574 --> 01:43.740
there's different elements in your images.

01:43.740 --> 01:44.970
And, but at the same time

01:44.970 --> 01:46.470
when we're applying mathematical operations

01:46.470 --> 01:50.580
such as convolution and running this feature

01:50.580 --> 01:52.680
detection to create our feature maps

01:52.680 --> 01:57.270
we risk that we might create something linear

01:57.270 --> 02:00.000
and therefore we need to break up the linearity.
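
NOTE
Editor's sketch, not from the lecture: a rough illustration of why convolution alone stays linear and why the rectifier breaks that up.
The 1-D signals, the kernel, and the helper names (convolve_1d, relu, add) are hypothetical and chosen only for illustration.
```python
# Hypothetical 1-D example; the kernel and signals are made up for illustration.
def convolve_1d(signal, kernel):
    """Valid-mode sliding dot product (cross-correlation, as used in CNNs)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]
#
def relu(values):
    """Replace every negative value with zero."""
    return [max(0, v) for v in values]
#
def add(a, b):
    """Element-wise sum of two equally long lists."""
    return [x + y for x, y in zip(a, b)]
#
kernel = [1, -1]
a = [3, 1, 4, 1]
b = [1, 5, 2, 6]
# Convolution is linear: convolving a sum equals summing the convolutions.
conv_of_sum = convolve_1d(add(a, b), kernel)
sum_of_convs = add(convolve_1d(a, kernel), convolve_1d(b, kernel))
linear_holds = conv_of_sum == sum_of_convs
# With the rectifier on top, the same equality no longer holds for these inputs.
relu_of_sum = relu(conv_of_sum)
sum_of_relus = add(relu(convolve_1d(a, kernel)), relu(convolve_1d(b, kernel)))
relu_holds = relu_of_sum == sum_of_relus
print(linear_holds, relu_holds)
```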

02:00.000 --> 02:02.580
So let's have a look at an example.

02:02.580 --> 02:05.970
Here is an image, an original image.

02:05.970 --> 02:10.970
Now when we apply a feature detector to this image

02:11.340 --> 02:13.260
we get something like this.

02:13.260 --> 02:14.093
So you can see here

02:14.093 --> 02:15.990
that black is negative, white is positive values.

02:15.990 --> 02:20.160
Well, when you apply a feature detector to a

02:20.160 --> 02:22.740
proper image, which has not just zeros and ones

02:22.740 --> 02:25.260
but has lots of different values, and you apply

02:25.260 --> 02:26.400
as we saw previously

02:26.400 --> 02:29.040
feature detectors can have negative values in themselves.

02:29.040 --> 02:32.490
Sometimes you'll get negative values, and here or there

02:32.490 --> 02:34.770
black ones are negative, white ones are positive.

02:34.770 --> 02:39.770
And what a rectified linear unit function does

02:41.460 --> 02:44.670
is it removes all the black, right?

02:44.670 --> 02:46.530
Anything below zero turns into zero.
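
NOTE
Editor's sketch, not from the lecture: a minimal example of the rectifier applied to a feature map, assuming the map is a plain list of rows.
The 3x3 values below are hypothetical; negative entries play the role of the black pixels, positive ones the white pixels.
```python
# Hypothetical 3x3 feature map; the values are made up for illustration.
feature_map = [
    [-2.0, 0.5, 1.0],
    [0.0, -1.5, 3.0],
    [4.0, -0.1, -3.0],
]
#
def relu(x):
    """Rectifier: keep positive values, turn anything below zero into zero."""
    return max(0.0, x)
#
# Apply the rectifier to every element; the shape of the map is unchanged.
rectified = [[relu(v) for v in row] for row in feature_map]
for row in rectified:
    print(row)
```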

02:46.530 --> 02:49.260
And so from this, it turns into this, right?

02:49.260 --> 02:50.093
And so it's

02:50.093 --> 02:55.020
it's pretty hard to see what exactly is the benefit

02:55.020 --> 02:59.370
in terms of breaking up linearity.

02:59.370 --> 03:03.060
I'll try to explain, I'll try to show an example

03:03.060 --> 03:06.420
on this image, but at the end of the day,

03:06.420 --> 03:09.120
this is a very mathematical concept and we'd have to go

03:09.120 --> 03:12.480
into a lot of math to really explain what is going on.

03:12.480 --> 03:13.830
But let's try, let's have a look.

03:13.830 --> 03:15.780
So for instance

03:15.780 --> 03:18.090
let's look at this building here, right?

03:18.090 --> 03:19.803
So this is a building on its own.

03:20.730 --> 03:23.490
Then you can see this shadow, this black part

03:23.490 --> 03:24.600
this shadow over here.

03:24.600 --> 03:27.270
Well, you can see that it's white,

03:27.270 --> 03:28.830
the reflection of the light,

03:28.830 --> 03:31.290
and then it's a gray, and then it gets darker

03:31.290 --> 03:33.300
and then it gets darker again, right?

03:33.300 --> 03:35.880
So, and when we take it out, we take out that black part.

03:35.880 --> 03:38.250
So think of it in terms of linearity, right?

03:38.250 --> 03:42.120
So it looks like when you go from white to gray

03:42.120 --> 03:43.950
the next step would be black, right?

03:43.950 --> 03:44.970
The next step would be black.

03:44.970 --> 03:49.680
It's a linear progression from bright to dark.

03:49.680 --> 03:53.490
And therefore this is kind of like a linear situation.

03:53.490 --> 03:56.670
When you take out the black, you break up the linearity.

03:56.670 --> 03:58.020
Let's try another one.

03:58.020 --> 03:59.160
Let's have a look here.

03:59.160 --> 04:00.510
And, and at the same time

04:00.510 --> 04:02.002
it's still that same building, right?

04:02.002 --> 04:06.780
It's not like you are

04:06.780 --> 04:07.740
blending two buildings

04:07.740 --> 04:09.810
into each other, but that is secondary.

04:09.810 --> 04:12.180
The main point is breaking up the linearity.

04:12.180 --> 04:13.110
So let's have a look here.

04:13.110 --> 04:14.472
Same thing.

04:14.472 --> 04:19.472
So you see white, gray, black, gray, white.

04:19.530 --> 04:21.030
And when you break it up

04:21.030 --> 04:22.530
you don't have that anymore, right?

04:22.530 --> 04:24.420
You don't have that progression,

04:24.420 --> 04:26.370
a gradual progression. Then you just have

04:26.370 --> 04:29.700
like an abrupt change.

04:29.700 --> 04:33.510
And that helps introduce non-linearity into your image.

04:33.510 --> 04:37.651
So it is a very rough explanation, very kind of like

04:37.651 --> 04:42.651
on the fingers explanation rather than technical.

04:42.690 --> 04:45.360
But hopefully it kind of helps you understand

04:45.360 --> 04:47.370
a bit better what we are talking about here.

04:47.370 --> 04:50.520
So here again, you can see white, gray is a better example.

04:50.520 --> 04:54.450
Even see bright, darker, darker, darker,

04:54.450 --> 04:55.650
darker, darker, darker.

04:55.650 --> 04:58.260
So this part looks like it's linear.

04:58.260 --> 04:59.810
Then you break it up like that.

05:00.750 --> 05:04.470
Again, so this is a very rough explanation.

05:04.470 --> 05:06.600
It's not absolutely perfect, but at least it gives you

05:06.600 --> 05:08.820
some idea of what's going on.

05:08.820 --> 05:10.530
But if you'd like to learn more,

05:10.530 --> 05:12.930
there's a good paper as always.

05:12.930 --> 05:14.190
There's always a paper.

05:14.190 --> 05:17.970
This one is by C.-C. Jay Kuo from the University of Southern California

05:17.970 --> 05:20.700
and it's called Understanding Convolutional Neural

05:20.700 --> 05:23.160
Networks with a Mathematical Model.

05:23.160 --> 05:26.430
And basically there he answers two questions

05:26.430 --> 05:28.830
and you need to just look at the first one.

05:28.830 --> 05:31.920
And the question is, why a non-linear activation function

05:31.920 --> 05:33.630
is essential at the filter output

05:33.630 --> 05:36.180
of all intermediate layers.

05:36.180 --> 05:39.990
So that kind of explains it in a bit more detail,

05:39.990 --> 05:41.910
both in terms of intuition

05:41.910 --> 05:44.310
and mostly in terms of mathematics.

05:44.310 --> 05:46.130
So that's an interesting paper where you can get some more

05:46.130 --> 05:48.090
additional information on this topic.

05:48.090 --> 05:49.830
And if you really want to dig in

05:49.830 --> 05:53.340
and explore some cool stuff here

05:53.340 --> 05:54.480
then there's another paper

05:54.480 --> 05:55.680
that you might be interested in.

05:55.680 --> 05:58.530
It's called Delving Deep into Rectifiers:

05:58.530 --> 06:00.900
Surpassing Human-Level Performance

06:00.900 --> 06:02.910
on ImageNet Classification.

06:02.910 --> 06:07.740
And here the authors, Kaiming He and others

06:07.740 --> 06:12.740
from Microsoft Research, they propose a different type

06:13.740 --> 06:17.300
of rectified linear unit function.

06:17.300 --> 06:20.580
They propose the parametric rectified linear unit function

06:20.580 --> 06:22.020
which you see here on the right.

06:22.020 --> 06:24.270
And they argue that it delivers better results

06:24.270 --> 06:26.730
without sacrificing performance.
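
NOTE
Editor's sketch, not from the lecture: how the parametric rectifier differs from the plain one, written for single numbers.
The slope value and the inputs are illustrative assumptions; in the parametric version the slope for negative inputs is a learned parameter rather than a fixed constant.
```python
def relu(x):
    """Plain rectifier: negative inputs become zero."""
    return max(0.0, x)
#
def prelu(x, slope):
    """Parametric rectifier: x for x > 0, slope * x otherwise."""
    return x if x > 0 else slope * x
#
slope = 0.25  # illustrative value only; assumed fixed here, learned during training in practice
inputs = [-4.0, -1.0, 0.0, 2.0]
relu_out = [relu(x) for x in inputs]
prelu_out = [prelu(x, slope) for x in inputs]
print(relu_out)
print(prelu_out)
```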

06:26.730 --> 06:27.690
So interesting read

06:27.690 --> 06:30.450
if you'd like to get a bit more into this topic.

06:30.450 --> 06:31.950
And that's all for today.

06:31.950 --> 06:34.530
The ReLU layer is pretty simple,

06:34.530 --> 06:37.830
pretty straightforward, just applying the rectifier function.

06:37.830 --> 06:39.210
And I look forward to seeing you next time.

06:39.210 --> 06:40.953
Until then, enjoy deep learning.