WEBVTT

00:00.860 --> 00:03.210
Okay, this is Stable Diffusion.

00:03.210 --> 00:05.520
Now this is definitely the most technical

00:05.520 --> 00:08.850
of the image models because it is open source.

00:08.850 --> 00:13.590
So you can actually run Stable Diffusion on Stability AI.

00:13.590 --> 00:15.660
They have their own interface there,

00:15.660 --> 00:18.450
but the main reason you'd want to use Stable Diffusion

00:18.450 --> 00:19.830
is so you can customize it.

00:19.830 --> 00:22.770
So it really applies more to developers

00:22.770 --> 00:25.350
than to like the average person.

00:25.350 --> 00:28.620
I would say if you don't know how to code,

00:28.620 --> 00:30.150
you don't want to code,

00:30.150 --> 00:31.830
then you're probably better off using

00:31.830 --> 00:33.930
DALL-E or Midjourney.

00:33.930 --> 00:37.230
However, it's incredibly powerful if you do need

00:37.230 --> 00:38.671
to do more advanced stuff.

00:38.671 --> 00:41.370
It's really the most flexible model.

00:41.370 --> 00:44.520
That said, there are actually multiple models now,

00:44.520 --> 00:47.010
and some of them are less flexible than others.

00:47.010 --> 00:50.280
So I'm just gonna go, this is the Google Colab.

00:50.280 --> 00:53.190
I'm just gonna make a copy of this.

00:53.190 --> 00:55.500
The reason why, you know,

00:55.500 --> 00:58.050
actually, let me just do it this way,

00:58.050 --> 01:02.490
the reason why this is interesting using Google Colab

01:02.490 --> 01:06.300
is that you typically need like a graphics card

01:06.300 --> 01:08.910
to actually run Stable Diffusion.

01:08.910 --> 01:10.740
It's a really big model

01:10.740 --> 01:14.790
and it's also incredibly computationally expensive.

01:14.790 --> 01:17.370
So unless you have, you know, you know how

01:17.370 --> 01:19.380
to set this up in the cloud

01:19.380 --> 01:22.080
or unless you have, you know, an M1 Mac

01:22.080 --> 01:24.870
or you know, PC with a high-end graphics card,

01:24.870 --> 01:26.190
you're not really gonna be able to run it.

01:26.190 --> 01:29.310
However, at least at the time of recording,

01:29.310 --> 01:32.130
Google very generously gives you a GPU

01:32.130 --> 01:33.600
that you can use here.

01:33.600 --> 01:35.310
So I'm just gonna connect,

01:35.310 --> 01:37.560
I'm just gonna allocate some resources

01:37.560 --> 01:41.550
and then we're gonna run the pipeline.

01:41.550 --> 01:44.520
So once this is initialized, we're connected,

01:44.520 --> 01:47.527
this is an Nvidia command,

01:47.527 --> 01:50.400
and then we're just gonna hit like Shift and Enter here,

01:50.400 --> 01:51.630
or you can hit Play.

01:51.630 --> 01:54.390
And what this does is it finds out

01:54.390 --> 01:55.860
if we have a graphics card.

01:55.860 --> 01:58.770
So here we have like a Tesla T4 graphics card,

01:58.770 --> 02:01.320
which is really good for us to run.
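
NOTE
A minimal sketch of the GPU check described in the cues above, assuming the
NVIDIA command in the notebook is nvidia-smi (the cues do not name it).
The helper name gpu_summary is hypothetical.
```python
import shutil
import subprocess
def gpu_summary():
    """Return nvidia-smi's report as text, or None when no NVIDIA GPU is usable."""
    # Assumption: the "Nvidia command" shown in the notebook is nvidia-smi.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # A non-zero exit code means the tool exists but could not reach a GPU.
    return result.stdout if result.returncode == 0 else None
```
On a Colab GPU runtime like the one in the video, the returned text would
presumably name the card (a Tesla T4 here). On a machine without an NVIDIA
driver the function returns None.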

02:01.320 --> 02:04.470
Then we install the Stable Diffusion library.

02:04.470 --> 02:07.743
So just running this when you are, you know,

02:08.925 --> 02:09.758
when you're running this,

02:09.758 --> 02:12.090
it's actually kind of running on Google's cloud.

02:12.090 --> 02:14.340
So it is not actually running on your local computer,

02:14.340 --> 02:17.490
which you know, is why we can kind of get away

02:17.490 --> 02:19.050
with running it even if, you know,

02:19.050 --> 02:21.360
I don't have a GPU on this.

02:21.360 --> 02:23.070
So that works.

02:23.070 --> 02:25.140
Now, I talked a little bit before about how

02:25.140 --> 02:26.190
there are multiple models.

02:26.190 --> 02:28.410
This is where it kind of gets confusing.

02:28.410 --> 02:33.180
So Stability AI gets the credit for Stable Diffusion

02:33.180 --> 02:35.280
because I think they're the most popular

02:35.280 --> 02:36.750
or the most well-funded company,

02:36.750 --> 02:40.571
but it's actually in collaboration with Runway ML,

02:40.571 --> 02:42.990
which also does a ton of, like, video stuff.

02:42.990 --> 02:43.830
Really interesting.

02:43.830 --> 02:45.750
And there was a lot of confusion.

02:45.750 --> 02:47.490
Still a lot of confusion.

02:47.490 --> 02:50.130
They also have, you know, they've got the 1.5 model,

02:50.130 --> 02:53.640
which is the old style, which is what a lot of people used,

02:53.640 --> 02:56.490
but then they released v2.0.

02:56.490 --> 02:58.230
And there's a lot of controversy there

02:58.230 --> 03:00.570
because frankly a lot of people

03:00.570 --> 03:03.180
are using v1.5, you know,

03:03.180 --> 03:05.580
for making pictures of celebrities and you know,

03:05.580 --> 03:07.230
for copying different artist styles.

03:07.230 --> 03:11.250
And then v2.0 has largely made it impossible

03:11.250 --> 03:12.900
for you to use it for that.

03:12.900 --> 03:15.600
But there's a huge increase in quality.

03:15.600 --> 03:18.030
So, you know, have a, you know, if you,

03:18.030 --> 03:20.466
you want flexibility, you know,

03:20.466 --> 03:22.380
the audience for Stable Diffusion

03:22.380 --> 03:23.580
wants a lot of flexibility,

03:23.580 --> 03:25.620
maybe not obviously for the best reasons

03:25.620 --> 03:27.630
and therefore there was a lot of controversy

03:27.630 --> 03:29.460
with the move to 2.0.

03:29.460 --> 03:31.800
But you know, just to kind of, you know, figure out

03:31.800 --> 03:34.080
what model you actually wanna run.

03:34.080 --> 03:39.080
Here we have 1.4, which is I think the original,

03:39.570 --> 03:41.010
the original Stable Diffusion.

03:41.010 --> 03:42.720
And that's what's running here.

03:42.720 --> 03:45.989
The actual use of it, it's not that different depending on,

03:45.989 --> 03:49.230
you know, what you're doing with it.

03:49.230 --> 03:50.790
And there's a lot of advanced functionality

03:50.790 --> 03:52.380
that was kind of built on top of it,

03:52.380 --> 03:56.160
but v1.4 still works pretty well, you know,

03:56.160 --> 03:57.963
relative to other models.

03:58.860 --> 04:01.800
We've got, this is like pulling in all the different files,

04:01.800 --> 04:04.050
and it's downloading a bunch of stuff.

04:04.050 --> 04:05.850
It's pretty resource intensive,

04:05.850 --> 04:08.340
like I said, even with a, you know, GPU,

04:08.340 --> 04:12.423
it's gonna take a little while to run.

04:13.320 --> 04:15.240
So I'm just gonna pause this,

04:15.240 --> 04:16.980
I'm gonna wait till it is downloaded

04:16.980 --> 04:19.020
and then I'll start again.

04:19.020 --> 04:22.440
Okay, this is downloaded. It only took a couple more minutes

04:22.440 --> 04:24.630
and now we're ready to start inference.

04:24.630 --> 04:27.030
Inference is when you pass it a prompt

04:27.030 --> 04:29.730
and then it infers what image to create.

04:29.730 --> 04:32.182
So we're gonna move this pipe to CUDA.

04:32.182 --> 04:36.360
CUDA is basically a framework for running on the GPU,

04:36.360 --> 04:38.970
so that's how we get it running faster,

04:38.970 --> 04:40.530
otherwise it would take all day.

04:40.530 --> 04:44.250
And then this is how you actually call Stable Diffusion.

04:44.250 --> 04:47.347
So passing in a prompt, this is the text,

04:47.347 --> 04:50.700
"a photograph of an astronaut riding a horse,"

04:50.700 --> 04:54.480
and then we're getting back the image, you know,

04:54.480 --> 04:57.150
it brings back the image in PIL format.

04:57.150 --> 05:00.750
And then you can just use image.save to save the PNG,

05:00.750 --> 05:03.270
or if you're in Colab you can actually just, you know,

05:03.270 --> 05:05.790
type in image and then it will display

05:05.790 --> 05:06.990
for you, which is pretty cool.
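
NOTE
A rough sketch of the call described in the cues above, assuming the notebook
uses the Hugging Face diffusers library and the CompVis/stable-diffusion-v1-4
checkpoint (neither is named in the cues). The function name generate_image
and the output file name are hypothetical. The heavy imports sit inside the
function so the file still loads on a machine without a GPU.
```python
MODEL_ID = "CompVis/stable-diffusion-v1-4"  # assumption: the v1.4 checkpoint on Hugging Face
PROMPT = "a photograph of an astronaut riding a horse"
def generate_image(prompt=PROMPT, out_path="astronaut_rides_horse.png"):
    """Load the pipeline on the GPU, run one prompt, and save the PIL image."""
    # Imported lazily: both packages are large and need a CUDA GPU to be useful.
    import torch
    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")  # move the model weights onto the GPU
    image = pipe(prompt).images[0]  # the pipeline output holds a list of PIL images
    image.save(out_path)  # write the result as a PNG
    return image
```
In a Colab cell, leaving the returned image as the last expression displays
it inline, which matches what the speaker describes.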

05:06.990 --> 05:10.620
So just gonna run this and you can see how long it takes.

05:10.620 --> 05:11.820
So it's passing through.

05:11.820 --> 05:15.090
It's got a counter here. There we go.

05:15.090 --> 05:17.160
So it's relatively quick.

05:17.160 --> 05:20.143
It's not too bad when you have a GPU,

05:20.143 --> 05:22.643
I just think it doesn't run at all when you don't.

05:23.760 --> 05:25.350
Okay. And here we go.

05:25.350 --> 05:27.270
We have a different image from what we had before,

05:27.270 --> 05:28.706
obviously, because, you know,

05:28.706 --> 05:31.710
it is a probabilistic outcome, right?

05:31.710 --> 05:34.140
Like we get different images each time. Cool.

05:34.140 --> 05:37.080
So a few things you can do to play around with it.

05:37.080 --> 05:40.290
You can add a manual seed, which makes it not probabilistic.

05:40.290 --> 05:42.450
So we've added this manual seed

05:42.450 --> 05:44.160
of just like a random number,

05:44.160 --> 05:48.420
but as long as we use this number 1024 in the seed,

05:48.420 --> 05:49.860
if we run it again,

05:49.860 --> 05:52.080
we're gonna get the same thing again and again.

05:52.080 --> 05:54.270
So I just run that again

05:54.270 --> 05:56.340
and it's using the same seed

05:56.340 --> 05:58.320
and it's gonna come back with the same image.

05:58.320 --> 05:59.250
And the reason it does that

05:59.250 --> 06:01.260
is because of the way Stable Diffusion works, right?

06:01.260 --> 06:03.570
Like it's, it's a diffusion model.

06:03.570 --> 06:04.980
It's starting with random noise

06:04.980 --> 06:08.250
and then it's generating an image. Well, this seed basically

06:08.250 --> 06:10.680
starts it with the same random noise each time.

06:10.680 --> 06:15.060
So it will generate the same image given the same prompt.
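
NOTE
A toy illustration of why a fixed seed reproduces the image, using Python's
standard random module as a stand-in for the pipeline's noise generator.
starting_noise is a hypothetical helper, not part of Stable Diffusion; in the
notebook the seed is presumably passed in through a seeded torch.Generator.
```python
import random
def starting_noise(seed, size=8):
    """Draw `size` Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]
first = starting_noise(1024)
second = starting_noise(1024)
other = starting_noise(1025)
print(first == second)  # same seed, so the noise matches
print(first == other)   # different seed, so the noise differs
```
The same noise plus the same prompt means the same denoising path, which is
why the image comes back identical.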

06:15.060 --> 06:19.050
Cool. You can also change the number of inference steps.

06:19.050 --> 06:22.560
So this makes a lower-quality image. You can also tell it

06:22.560 --> 06:24.090
how many images to bring back.

06:24.090 --> 06:27.390
So this is how you assemble a grid

06:27.390 --> 06:30.420
and that creates a certain number of rows.

06:30.420 --> 06:32.730
So here we're generating three images.

06:32.730 --> 06:35.250
If we wanted, we could, you know, put like 10 images

06:35.250 --> 06:37.173
or nine images, whatever we want.

06:38.040 --> 06:40.950
And then, you know, we can generate

06:40.950 --> 06:43.590
like bigger grids like it's saying here.
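
NOTE
A small sketch of the arithmetic behind assembling a grid of generated images:
where each image is pasted and how large the canvas must be. Both helper
names are hypothetical, and the 512x512 image size is an assumption (the cues
do not state it).
```python
def grid_positions(rows, cols, width, height):
    """Return the top-left (x, y) paste offset for each cell, row by row."""
    return [(col * width, row * height) for row in range(rows) for col in range(cols)]
def grid_size(rows, cols, width, height):
    """Return the (width, height) of the canvas that holds the whole grid."""
    return (cols * width, rows * height)
# Three 512x512 images in a single row, as in the three-image example.
print(grid_size(1, 3, 512, 512))
print(grid_positions(1, 3, 512, 512))
```
A larger grid only changes rows and cols; the number of images requested
from the pipeline has to equal rows times cols.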

06:43.590 --> 06:45.390
So this is, you know, you're starting to see some

06:45.390 --> 06:47.040
of the flexibility, you know,

06:47.040 --> 06:47.910
when you are using Midjourney,

06:47.910 --> 06:49.590
you just get a grid of four images, right?

06:49.590 --> 06:51.390
And this gives you a bigger grid.

06:51.390 --> 06:55.170
You could also run this with like,

06:55.170 --> 06:57.900
a lower quality level, right?

06:57.900 --> 06:59.430
Like, so you could run with a smaller number

06:59.430 --> 07:02.043
of steps in order to find an image that you like.

07:03.060 --> 07:05.460
This is to generate a non-square image

07:05.460 --> 07:07.860
if you wanna change the different styles.

07:07.860 --> 07:11.223
And you can really do a lot more than this actually,

07:12.235 --> 07:15.660
you know, you can train the model yourself

07:15.660 --> 07:17.490
even with DreamBooth.

07:17.490 --> 07:21.450
So flexibility is really the key with Stable Diffusion,

07:21.450 --> 07:23.853
you know, the programmability, the

07:24.780 --> 07:27.120
programmatic ability, I guess, is

07:27.120 --> 07:30.420
what people are attracted to.

07:30.420 --> 07:32.610
Not necessarily that it's the best image model,

07:32.610 --> 07:34.050
but just really it's the one

07:34.050 --> 07:36.060
that you can build your whole business around, right?

07:36.060 --> 07:40.590
Like, you know, you can't rely necessarily on DALL-E

07:40.590 --> 07:44.040
or, you know, rely on Midjourney, you know,

07:44.040 --> 07:47.520
if they change something about the way it works,

07:47.520 --> 07:50.130
then you don't really have any flexibility.

07:50.130 --> 07:53.730
You can't really go back to the older model, right?

07:53.730 --> 07:56.310
You know, if they ban certain types of images,

07:56.310 --> 07:58.020
then you're stuck.

07:58.020 --> 08:01.380
So that's really what people like about this model

08:01.380 --> 08:02.493
and what makes it attractive.