WEBVTT

00:00.270 --> 00:02.760
Okay, what is Meta's LLaMA 2?

00:02.760 --> 00:05.880
So it's a new open source model by Meta,

00:05.880 --> 00:07.470
formerly known as Facebook,

00:07.470 --> 00:09.660
and it's the first really viable alternative

00:09.660 --> 00:12.060
to OpenAI that's open source.

00:12.060 --> 00:16.050
You know, this is kind of the Stable Diffusion moment,

00:16.050 --> 00:18.000
but for LLMs.

00:18.000 --> 00:20.070
So when Stable Diffusion came out

00:20.070 --> 00:21.390
in the image generation space,

00:21.390 --> 00:23.580
it changed a lot about what was possible,

00:23.580 --> 00:25.500
and how many, you know, businesses you could build

00:25.500 --> 00:27.600
on top of these AI tools,

00:27.600 --> 00:29.310
so it's really exciting to see this.

00:29.310 --> 00:32.370
It's a transformer model, just like GPT.

00:32.370 --> 00:34.890
A lot of the work in this space has been built

00:34.890 --> 00:39.150
on top of Google's paper "Attention Is All You Need,"

00:39.150 --> 00:40.950
which introduced the transformer model,

00:40.950 --> 00:44.910
as well as OpenAI's work with GPT-2, which was open source,

00:44.910 --> 00:47.940
and the results are pretty good, right?

00:47.940 --> 00:52.440
The original LLaMA was created by Meta and open-sourced,

00:52.440 --> 00:54.270
but they didn't openly release the weights;

00:54.270 --> 00:57.210
only researchers got those, and then somebody leaked them.

00:57.210 --> 01:00.270
With LLaMA 2, Meta actually made the decision

01:00.270 --> 01:01.560
to fully open source it,

01:01.560 --> 01:04.650
so a commercial license is available, really,

01:04.650 --> 01:07.560
for the first time, and it's really good.

01:07.560 --> 01:09.990
You can actually fine-tune it, and that matters:

01:09.990 --> 01:14.010
you can't currently fine-tune anything in the OpenAI API,

01:14.010 --> 01:17.135
that's not available right now,

01:17.135 --> 01:18.960
'cause they deprecated GPT-3,

01:18.960 --> 01:22.950
but they say that GPT-4 and GPT-3.5 fine-tuning

01:22.950 --> 01:25.470
is coming soon; either way, you're at their mercy,

01:25.470 --> 01:27.750
whereas with LLaMA 2, people are fine-tuning it every day,

01:27.750 --> 01:29.850
like university students are doing it,

01:29.850 --> 01:31.770
people are just doing it in Jupyter Notebooks,

01:31.770 --> 01:32.820
it's pretty nuts.

01:32.820 --> 01:34.980
One way you can try LLaMA 2

01:34.980 --> 01:39.980
without knowing how to code is to go to chat.nbox.ai,

01:41.010 --> 01:43.740
which very generously hosts it for free,

01:43.740 --> 01:46.110
and you can try it out.

01:46.110 --> 01:48.510
This is how I've been playing around with it.

01:48.510 --> 01:51.210
The really interesting thing here is that

01:51.210 --> 01:54.180
it needs a lot more prompt engineering, in my experience.

01:54.180 --> 01:56.040
This is a pretty robust prompt

01:56.040 --> 01:59.100
that works well even on GPT-3.5,

01:59.100 --> 02:02.970
but with LLaMA 2, I tend to get a lot of hallucination,

02:02.970 --> 02:05.850
and it keeps talking; it doesn't stop

02:05.850 --> 02:07.830
at one product name. So yeah,

02:07.830 --> 02:10.050
you can play around with it and see how it goes.

02:10.050 --> 02:12.840
It's also possible to download it and use it locally

02:12.840 --> 02:14.880
if you have a GPU on your computer,

02:14.880 --> 02:18.243
like an M2 Mac or a gaming PC.

02:19.260 --> 02:21.630
In terms of performance, I would say it's

02:21.630 --> 02:24.870
roughly on par with GPT-3.5, though not quite there,

02:24.870 --> 02:28.140
specifically the fine-tuned versions,

02:28.140 --> 02:30.690
and Vicuna is pretty decent.

02:30.690 --> 02:33.960
So it has two of the top slots in the model rankings

02:33.960 --> 02:36.750
by LMSYS on Hugging Face.

02:36.750 --> 02:38.370
It's not quite there yet,

02:38.370 --> 02:41.847
but it's definitely one of the best open source models.

02:41.847 --> 02:46.170
The MPT-30B-Chat model is also really good,

02:46.170 --> 02:48.570
though it depends on the ratings and the use case.

02:48.570 --> 02:51.180
The really cool thing here is that often,

02:51.180 --> 02:54.030
a version of LLaMA 2

02:54.030 --> 02:57.270
will beat GPT-4 at a specific task

02:57.270 --> 02:58.860
if it's been fine-tuned for that task,

02:58.860 --> 03:02.970
so this is really great for specific use cases

03:02.970 --> 03:06.030
rather than as a general-purpose AI.

03:06.030 --> 03:08.190
Because it's open source, the main benefit

03:08.190 --> 03:09.150
is that it's free, right?

03:09.150 --> 03:10.830
Apart from the cost of compute,

03:10.830 --> 03:12.150
you don't have to pay for any credits.

03:12.150 --> 03:13.800
You can run it on your local computer,

03:13.800 --> 03:16.320
you can run it on your own servers,

03:16.320 --> 03:19.650
and you can build your own UX around it;

03:19.650 --> 03:22.140
that's completely up to you.

03:22.140 --> 03:23.940
You can also inspect a lot more

03:23.940 --> 03:26.130
of what's going on inside the model.

03:26.130 --> 03:29.700
So what are the main use cases? Why would you use LLaMA 2

03:29.700 --> 03:32.370
when it's objectively worse than GPT-4?

03:32.370 --> 03:34.530
I would say, one is privacy or data protection.

03:34.530 --> 03:37.080
Many people in enterprises can't use OpenAI;

03:37.080 --> 03:39.210
their companies have banned them from doing it.

03:39.210 --> 03:41.670
They don't want sensitive data going to the API.

03:41.670 --> 03:44.520
To be honest, OpenAI has said

03:44.520 --> 03:47.100
that they keep API data private,

03:47.100 --> 03:48.900
but they're a startup, so it's hard to know.

03:48.900 --> 03:52.170
I think Microsoft is actually pushing in this space,

03:52.170 --> 03:55.830
and offering GPT through Azure,

03:55.830 --> 03:57.420
which maybe is a little bit more private,

03:57.420 --> 04:00.120
and they might have some self-hosted options available,

04:00.120 --> 04:03.750
but even in that case, you might not wanna fully trust that.

04:03.750 --> 04:05.460
So if you're building your own enterprise version

04:05.460 --> 04:06.900
of ChatGPT,

04:06.900 --> 04:09.120
you might want to build on LLaMA 2.

04:09.120 --> 04:12.420
Also, fine-tuning: if you have fewer than 200 examples,

04:12.420 --> 04:14.640
prompt engineering typically just beats fine-tuning

04:14.640 --> 04:16.050
for any given task,

04:16.050 --> 04:18.390
but once you have more than 200 examples,

04:18.390 --> 04:20.940
that is, once you start to get more data from your users

04:20.940 --> 04:23.400
on what's good and what's bad for a specific task,

04:23.400 --> 04:27.600
then fine-tuning starts to become worth exploring,

04:27.600 --> 04:29.190
and that's only really possible

04:29.190 --> 04:31.260
with LLaMA 2 at the moment.

04:31.260 --> 04:34.290
You can also build a business on LLaMA 2.

04:34.290 --> 04:36.810
Because it's open source, you can run it on your own server,

04:36.810 --> 04:38.370
you can have all the code yourself,

04:38.370 --> 04:40.410
you're not subject to any limits, right?

04:40.410 --> 04:42.930
You can't be shut down by OpenAI

04:42.930 --> 04:45.840
because your users put some spam requests in there,

04:45.840 --> 04:49.380
and you can't run up a huge bill in API credits,

04:49.380 --> 04:52.140
apart from just the cost of running your own servers.

04:52.140 --> 04:53.340
It's worth checking out.

04:53.340 --> 04:55.380
I would say I'm cautiously optimistic.

04:55.380 --> 04:56.993
I really want open source to work,

04:56.993 --> 04:58.860
because in the image generation space,

04:58.860 --> 05:01.080
I'm a heavy user of Stable Diffusion,

05:01.080 --> 05:04.080
and I want there to be the same choice

05:04.080 --> 05:06.900
in the large language model space as well,

05:06.900 --> 05:08.463
so I'm really excited about this.