1
00:00:11,690 --> 00:00:17,060
In this lecture we are going to take a closer look at how the GAN code will work and hopefully iron

2
00:00:17,060 --> 00:00:20,780
out a few details before we move on to the actual notebook.

3
00:00:20,780 --> 00:00:25,490
This is another interesting model in this course because, unlike all the other algorithms we looked at,

4
00:00:26,060 --> 00:00:30,540
now we're dealing with two neural networks simultaneously rather than just one.

5
00:00:30,560 --> 00:00:32,870
So it's kind of an odd concept.

6
00:00:33,020 --> 00:00:37,450
What I want to focus on in this lecture are a few overarching themes.

7
00:00:37,490 --> 00:00:40,260
First we'll look at the architecture of each model.

8
00:00:40,610 --> 00:00:45,080
Just for fun we'll use a few more advanced layers and activation functions that we didn't use earlier

9
00:00:45,080 --> 00:00:46,640
in the course.

10
00:00:46,640 --> 00:00:49,160
Second we'll look at how to train them.

11
00:00:49,220 --> 00:00:54,440
This is the centrepiece of this section because, unlike all the previous examples you saw in this course,

12
00:00:54,860 --> 00:00:57,680
this training loop will be very different from what you saw before.

13
00:01:02,800 --> 00:01:08,440
You may have assumed that because the generator is a kind of mirror image of the discriminator, they would have opposite

14
00:01:08,440 --> 00:01:09,990
architectures.

15
00:01:10,210 --> 00:01:15,760
What I mean by that is, if the generator had hidden sizes of 100, 300, and 784,

16
00:01:16,210 --> 00:01:20,940
then the discriminator would have hidden sizes of 784, 300, and 100.

17
00:01:21,190 --> 00:01:27,580
In fact, this need not be the case. In our example, we'll make the generator a more powerful model than

18
00:01:27,580 --> 00:01:28,820
the discriminator.

19
00:01:29,080 --> 00:01:36,190
The intuition is that binary classification is an easy task, whereas generating images is a hard task. For

20
00:01:36,190 --> 00:01:37,420
the discriminator,

21
00:01:37,420 --> 00:01:40,570
all you have to do is map the input to 0 or 1.

22
00:01:40,630 --> 00:01:43,990
There aren't that many degrees of freedom. For the generator,

23
00:01:43,990 --> 00:01:49,140
you have to generate 784 different pixel values, and they all have to be coherent.

24
00:01:49,240 --> 00:01:51,280
So there are many degrees of freedom.

25
00:01:51,490 --> 00:02:00,420
So that's the idea: generating is hard, but discriminating is easy.
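
One way to make "more powerful" concrete is to count parameters. The sketch below uses a hypothetical helper, mlp_params, that counts only the weights and biases of fully connected layers (batch norm parameters are ignored). It compares the mirror-image sizes from the example above with one assumed asymmetric pair where the generator is deliberately the larger network. These asymmetric sizes are illustrative, not taken from the lecture's notebook.

```python
# Hypothetical helper: count weights + biases in a stack of fully connected layers.
def mlp_params(sizes):
    """Return the number of trainable Linear-layer parameters for the given layer sizes."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

# The "mirror image" pair: roughly equal capacity.
mirror_g = mlp_params([100, 300, 784])
mirror_d = mlp_params([784, 300, 100])
print(mirror_g, mirror_d)  # 266284 265600

# An assumed asymmetric pair: a bigger generator, a smaller binary classifier.
big_g = mlp_params([100, 256, 512, 1024, 784])
small_d = mlp_params([784, 512, 256, 1])
print(big_g, small_d)  # 1486352 533505
```

Mirroring the layer sizes gives two networks of nearly the same size, while the asymmetric choice gives the harder task (generation) roughly three times as many parameters.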

26
00:02:00,450 --> 00:02:04,560
So here's what we'll be using for the discriminator although you should feel free to play around and

27
00:02:04,560 --> 00:02:06,070
design your own.

28
00:02:06,090 --> 00:02:10,180
It's just a series of linear layers followed by leaky ReLUs.
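
A discriminator of this shape might look like the sketch below. The layer sizes (784 -> 512 -> 256 -> 1) and the leaky ReLU slope of 0.2 are assumptions, not the notebook's exact values. The final layer returns one raw score (a logit) per image. This assumes the loss will be nn.BCEWithLogitsLoss, which applies the sigmoid internally; a Sigmoid layer plus nn.BCELoss would be the equivalent alternative.

```python
import torch
import torch.nn as nn

# Assumed layer sizes; the lecture leaves the exact architecture up to you.
D = nn.Sequential(
    nn.Linear(784, 512),
    nn.LeakyReLU(0.2),
    nn.Linear(512, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # one logit per image; the sigmoid lives inside the loss
)

x = torch.randn(64, 784)  # a stand-in batch of 64 flattened 28x28 images
logits = D(x)
print(logits.shape)  # torch.Size([64, 1])
```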

29
00:02:10,380 --> 00:02:16,170
Recall that a leaky ReLU is just like a normal ReLU, except it has a small positive slope for values

30
00:02:16,170 --> 00:02:17,690
less than zero.

31
00:02:17,700 --> 00:02:22,980
This is an attempt to avoid the problem of dead neurons, where the slope becomes zero, and therefore the

32
00:02:22,980 --> 00:02:26,400
gradient becomes zero, and therefore no error can backpropagate.
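
The dead-neuron argument can be checked directly with autograd: for negative inputs, ReLU passes back a gradient of exactly zero, while leaky ReLU passes back its small slope. The slope of 0.2 here is an assumed value.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 1.0], requires_grad=True)

# ReLU: the gradient is zero wherever the input is negative.
F.relu(x).sum().backward()
relu_grad = x.grad.clone()

x.grad.zero_()  # clear the accumulated gradient before the second backward pass

# Leaky ReLU: negative inputs still receive a small gradient (the slope, 0.2 here).
F.leaky_relu(x, negative_slope=0.2).sum().backward()
leaky_grad = x.grad.clone()

print(relu_grad)   # tensor([0., 0., 1.])
print(leaky_grad)  # tensor([0.2000, 0.2000, 1.0000])
```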

33
00:02:31,470 --> 00:02:33,310
here's what the generator looks like.

34
00:02:33,330 --> 00:02:34,810
Again feel free to play around.

35
00:02:35,520 --> 00:02:41,220
So in addition to leaky ReLUs and linear layers, we'll also add batch norm to the mix.

36
00:02:41,220 --> 00:02:46,530
We've also added a tanh activation at the end, which implies that we'll be working with images in the

37
00:02:46,530 --> 00:02:51,480
range -1 to +1 rather than 0 to 1. Experimentally,

38
00:02:51,480 --> 00:03:00,270
researchers have found that these changes tend to yield better results.
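
Here is a sketch of a generator in that spirit. Assumed details: a 100-dimensional noise vector, hidden sizes of 256, 512, and 1024, and the ordering Linear -> LeakyReLU -> BatchNorm1d (one plausible arrangement; the lecture doesn't pin it down). The two hypothetical helper functions show the pixel rescaling the tanh output implies: real images in [0, 1] are mapped to [-1, 1] for training, and generated images are mapped back for display.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # assumed size of the noise vector

G = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.BatchNorm1d(256),
    nn.Linear(256, 512),
    nn.LeakyReLU(0.2),
    nn.BatchNorm1d(512),
    nn.Linear(512, 1024),
    nn.LeakyReLU(0.2),
    nn.BatchNorm1d(1024),
    nn.Linear(1024, 784),
    nn.Tanh(),  # squashes every pixel into [-1, 1]
)

def to_tanh_range(images):
    """Hypothetical helper: map pixels from [0, 1] to [-1, 1] to match G's output."""
    return images * 2 - 1

def to_display_range(images):
    """Hypothetical helper: map generator output from [-1, 1] back to [0, 1]."""
    return (images + 1) / 2

z = torch.randn(64, LATENT_DIM)  # a batch of latent noise
fake = G(z)
print(fake.shape)  # torch.Size([64, 784])
```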

39
00:03:00,290 --> 00:03:03,100
OK so here's the real meat of this code.

40
00:03:03,100 --> 00:03:08,890
After we define the discriminator D and the generator G, we're going to define the loss and optimizers.

41
00:03:09,720 --> 00:03:12,250
The loss is just the usual binary cross entropy.

42
00:03:12,490 --> 00:03:16,680
But we're going to have two separate optimizers, one for each network.

43
00:03:16,690 --> 00:03:21,730
You'll notice that when we instantiate the optimizers, we only pass in the parameters that we want each

44
00:03:21,730 --> 00:03:23,600
optimizer to update.

45
00:03:23,620 --> 00:03:29,440
So when we call d_optimizer.step(), it's only going to update the parameters of the discriminator.

46
00:03:30,100 --> 00:03:32,060
When we call g_optimizer.step(),

47
00:03:32,320 --> 00:03:34,960
it's only going to update the parameters of the generator.
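
The sketch below demonstrates the point about separate optimizers. It uses tiny stand-in networks of assumed size. A single loss that flows through both networks gives both of them gradients, yet d_optimizer.step() moves only the discriminator's weights. The Adam settings (lr=0.0002, betas=(0.5, 0.999)) are common GAN choices, assumed here rather than taken from the lecture. BCEWithLogitsLoss is used as the binary cross entropy because the stand-in D outputs logits.

```python
import torch
import torch.nn as nn

# Tiny stand-in networks (assumed sizes), just to show which parameters each optimizer owns.
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1))

criterion = nn.BCEWithLogitsLoss()  # binary cross entropy on raw logits

# One optimizer per network, each given only that network's parameters.
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))

g_before = [p.detach().clone() for p in G.parameters()]
d_before = [p.detach().clone() for p in D.parameters()]

# This loss flows through BOTH networks, so both receive gradients...
z = torch.randn(8, 100)
loss = criterion(D(G(z)), torch.ones(8, 1))
loss.backward()

# ...but stepping the discriminator's optimizer only moves D's weights.
d_optimizer.step()

g_changed = any(not torch.equal(a, b) for a, b in zip(g_before, G.parameters()))
d_changed = any(not torch.equal(a, b) for a, b in zip(d_before, D.parameters()))
print(g_changed, d_changed)  # False True
```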

48
00:03:40,030 --> 00:03:41,240
Inside the training section,

49
00:03:41,260 --> 00:03:46,250
we're going to start by instantiating constant arrays to store zeros and ones.

50
00:03:46,360 --> 00:03:48,100
These will act as our targets.

51
00:03:48,100 --> 00:03:53,740
This way, we don't have to keep creating new arrays on each iteration of the training loop. Inside the training

52
00:03:53,740 --> 00:03:54,040
loop,

53
00:03:54,040 --> 00:03:56,500
we're going to loop over the data loader.

54
00:03:56,500 --> 00:04:02,860
Remember that we don't need the image labels, so we can use an underscore to ignore them. What we're going

55
00:04:02,860 --> 00:04:08,620
to do now is write two training blocks: one for the discriminator and one for the generator.
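
A skeleton of that setup might look like this. The random TensorDataset is a stand-in for MNIST, and batch_size = 32 is an assumed value. One detail worth handling: the last batch from a data loader can be smaller than batch_size, which would no longer match the fixed-size targets. Passing drop_last=True is one way around this; slicing the targets to the current batch size (for example ones[:n]) is another.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 32

# Stand-in dataset: 100 random "images" with dummy labels (a real run would load MNIST).
dataset = TensorDataset(torch.rand(100, 784), torch.zeros(100, dtype=torch.long))
# drop_last=True keeps every batch the same size, so the constant targets always fit.
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)

# Created once, outside the loop, and reused on every iteration.
ones = torch.ones(batch_size, 1)
zeros = torch.zeros(batch_size, 1)

num_batches = 0
for inputs, _ in data_loader:  # labels aren't needed, so "_" discards them
    n = inputs.size(0)
    inputs = inputs.view(n, -1)  # flatten, in case images arrive as (n, 1, 28, 28)
    # ... discriminator training block goes here ...
    # ... generator training block goes here ...
    num_batches += 1

print(num_batches)  # 3
```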

56
00:04:08,620 --> 00:04:11,920
This slide will focus on the discriminator.

57
00:04:11,920 --> 00:04:17,290
Now, one question you probably have is: does it matter which order you train them in, the discriminator

58
00:04:17,290 --> 00:04:19,160
first or the generator first?

59
00:04:19,450 --> 00:04:21,520
I would recommend you try it out as an experiment.

60
00:04:24,380 --> 00:04:27,020
So how do we train the discriminator?

61
00:04:27,020 --> 00:04:31,010
First we start by passing real images into the discriminator.

62
00:04:31,010 --> 00:04:33,830
That's just the output from the data loader.

63
00:04:33,830 --> 00:04:36,860
From this, we can calculate the loss using the ones target.

64
00:04:40,730 --> 00:04:41,800
For the fake images,

65
00:04:41,810 --> 00:04:46,930
we start by generating a batch of random noise and then passing it through the generator.

66
00:04:46,940 --> 00:04:51,310
From that, we get fake images, which we then pass through the discriminator.

67
00:04:51,380 --> 00:04:53,810
And from that, we get fake outputs.

68
00:04:53,930 --> 00:04:57,650
We pass the fake outputs through the loss function using the zeros target.
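
The two discriminator losses can be sketched as follows, using small stand-in networks of assumed size and a random batch in place of the data loader output. The .detach() on the fake images is an optional refinement that the lecture does not mention. Because only D is being trained in this block, cutting the graph there stops PyTorch from computing generator gradients that would be thrown away anyway.

```python
import torch
import torch.nn as nn

# Minimal stand-ins (assumed sizes) for the real generator and discriminator.
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1))
criterion = nn.BCEWithLogitsLoss()

batch_size = 16
ones = torch.ones(batch_size, 1)
zeros = torch.zeros(batch_size, 1)
real_images = torch.rand(batch_size, 784) * 2 - 1  # stand-in for a data loader batch in [-1, 1]

# Real images should be classified as real: ones target.
real_outputs = D(real_images)
d_loss_real = criterion(real_outputs, ones)

# Fake images: noise -> generator -> discriminator, scored against the zeros target.
noise = torch.randn(batch_size, 100)
fake_images = G(noise).detach()  # optional: block gradients from flowing into G
fake_outputs = D(fake_images)
d_loss_fake = criterion(fake_outputs, zeros)

d_loss_fake.backward()
print(all(p.grad is None for p in G.parameters()))  # True: no gradient reached G
```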

69
00:05:00,800 --> 00:05:01,150
Next,

70
00:05:01,160 --> 00:05:04,180
we take the average of the two losses to get a single value

71
00:05:04,190 --> 00:05:05,990
we can optimize.

72
00:05:05,990 --> 00:05:10,510
After this, we would like to call d_loss.backward() and d_optimizer.step().

73
00:05:10,760 --> 00:05:13,580
But before that, we have to remember to zero the gradients.
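
Putting the discriminator block together gives the sketch below. discriminator_step is a hypothetical helper name, and the stand-in networks and Adam settings are assumptions. Zeroing matters here because PyTorch accumulates gradients in each parameter's .grad. The generator's update also backpropagates through D, so without zero_grad() those stale gradients would be added to the discriminator's own.

```python
import torch
import torch.nn as nn

# Stand-in networks and settings (assumed, not the notebook's exact values).
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1))
criterion = nn.BCEWithLogitsLoss()
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002, betas=(0.5, 0.999))

def discriminator_step(real_images, ones, zeros, latent_dim=100):
    """Hypothetical helper: one discriminator update on a batch of real images."""
    n = real_images.size(0)

    # Real images vs. the ones target.
    d_loss_real = criterion(D(real_images), ones)

    # Fake images vs. the zeros target.
    fake_images = G(torch.randn(n, latent_dim))
    d_loss_fake = criterion(D(fake_images), zeros)

    # Average the two losses into a single value to optimize.
    d_loss = 0.5 * (d_loss_real + d_loss_fake)

    d_optimizer.zero_grad()  # clear stale gradients (e.g. left over from the generator step)
    d_loss.backward()
    d_optimizer.step()       # updates D only
    return d_loss.item()

batch = torch.rand(16, 784) * 2 - 1  # stand-in real batch in [-1, 1]
loss_value = discriminator_step(batch, torch.ones(16, 1), torch.zeros(16, 1))
print(loss_value)
```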

74
00:05:18,700 --> 00:05:19,300
Next,

75
00:05:19,300 --> 00:05:21,960
let's look at the generator training block.

76
00:05:22,030 --> 00:05:27,340
One interesting technique discovered by researchers was that doing multiple steps of training for the

77
00:05:27,340 --> 00:05:32,230
generator for each single step of the discriminator seemed to work well.

78
00:05:32,260 --> 00:05:37,210
This goes along with the theme that the generator is harder to optimize than the discriminator.

79
00:05:37,210 --> 00:05:43,630
Therefore, it requires more training. To accomplish that, we can just do a loop inside the loop.

80
00:05:43,630 --> 00:05:50,530
We generate some fake images once again using a batch of latent noise and passing this through the generator.

81
00:05:50,530 --> 00:05:54,940
Since this is for training the generator there is no need to use any real images.

82
00:05:54,940 --> 00:06:00,610
We just pass the fake images through the discriminator and then pass that output through the loss.

83
00:06:00,610 --> 00:06:05,890
Importantly, this time we use the ones target to tell the loss function that we want the weights to

84
00:06:05,890 --> 00:06:09,810
go towards these images being classified as real.

85
00:06:09,820 --> 00:06:13,330
Next we zero the gradients and do a gradient descent step as usual.
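
The generator block, including the "loop inside the loop," can be sketched like this. generator_steps is a hypothetical helper name, and g_steps=2, the stand-in networks, and the Adam settings are all assumptions. Only fake images are used, the ones target pushes G toward fakes that D classifies as real, and because only g_optimizer steps, D's weights are left untouched. D does still receive gradients, which is why the discriminator block zeroes its gradients before its own backward pass.

```python
import torch
import torch.nn as nn

# Stand-in networks and settings (assumed, not the notebook's exact values).
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1))
criterion = nn.BCEWithLogitsLoss()
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))

def generator_steps(batch_size, ones, g_steps=2, latent_dim=100):
    """Hypothetical helper: several generator updates per discriminator update."""
    losses = []
    for _ in range(g_steps):  # the "loop inside the loop"
        noise = torch.randn(batch_size, latent_dim)
        fake_images = G(noise)          # no real images are needed here
        fake_outputs = D(fake_images)
        # Ones target: push G's weights toward fakes being classified as real.
        g_loss = criterion(fake_outputs, ones)

        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()              # updates G only; D's weights stay fixed
        losses.append(g_loss.item())
    return losses

d_before = [p.detach().clone() for p in D.parameters()]
losses = generator_steps(16, torch.ones(16, 1), g_steps=2)
print(len(losses))  # 2
```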