WEBVTT

00:00.040 --> 00:03.600
I'll quickly go through the documentation from the meta of llama part three.

00:04.000 --> 00:07.760
So llama three is built on the capability of llama two.

00:08.080 --> 00:13.440
Added three new categories information election code interpreter abuse.

00:13.560 --> 00:16.960
It's a multilingual introduces new prompt format.

00:17.400 --> 00:19.080
So here's the prompt format.

00:19.120 --> 00:22.760
You begin the text specifies the start of the prompt.

00:23.120 --> 00:29.440
These tokens enclose the role of particular message, the possible roles of user and assistant, and

00:29.440 --> 00:37.520
header ID and end of the turn is represents when the LLM determines it finishes interacting with the

00:37.520 --> 00:40.320
user message that initiated its response.

00:40.720 --> 00:48.040
A properly constructed prompt contains a number of sections delimited by tags such as begin, answer

00:48.040 --> 00:51.760
categories, begin, and begin conversation.

00:52.280 --> 00:57.600
They are not special tokens, but are normal text in the prompt which enable the model to parse the

00:57.600 --> 00:58.800
prompt properly.

00:59.640 --> 01:06.450
So in this case here, if you see this is a prompt, begin of the text start header ID and header ID

01:06.490 --> 01:07.330
for the user.

01:07.570 --> 01:10.370
And then we say here this is the task.

01:10.690 --> 01:18.330
Check if there is unsafe content and then you specify begin unsafe content categories and then you end

01:18.370 --> 01:19.810
the unsafe categories.

01:19.970 --> 01:21.570
You begin the conversation.

01:21.810 --> 01:26.010
That is the conversation message their user and agent has.

01:26.290 --> 01:32.530
You end the conversation and at the end you provide your safety assessments for only the last role in

01:32.530 --> 01:34.010
the above conversation.

01:34.330 --> 01:37.690
First line must read a safe or unsafe.

01:37.930 --> 01:44.290
If unsafe, then include the comma separated list of the valid related categories, and then you can

01:44.290 --> 01:47.090
specify all the valid categories here.

01:47.490 --> 01:53.290
If there are 14 of them, and you can also give more details in the prompt like what does violent crime

01:53.290 --> 01:57.570
looks like and what does nonviolent crimes look like?

01:57.770 --> 02:05.060
And all of that for all 14 different categories and a complete example would be something like this.

02:05.300 --> 02:06.620
Here is the prompt.

02:06.780 --> 02:08.020
There is a task.

02:08.180 --> 02:14.180
There is begin unsafe content categories which is here high level categories.

02:14.180 --> 02:16.980
And then here is the conversation that started.

02:17.620 --> 02:19.740
Here is the conversation that ended.

02:19.740 --> 02:23.180
And then provide your safety assessment for only the last.

02:23.180 --> 02:25.180
And then you give the last tag here.

02:25.180 --> 02:29.260
And the end of turns start header assistant and end header.

02:29.420 --> 02:31.380
And then you'll get the response back.

02:31.980 --> 02:35.460
This is how the entire prompt of llama guard is constructed.

02:35.660 --> 02:37.900
You can go through this documentation.

02:38.780 --> 02:40.220
It's fairly straightforward.

02:40.220 --> 02:46.420
Just one thing that I want to highlight here is that guardrails be applied to both input and output

02:46.420 --> 02:47.300
of the model.

02:47.700 --> 02:51.500
So one of the user inputs and other for the agent output.

02:51.500 --> 02:54.220
So it's not just for input content moderation.

02:54.340 --> 03:00.620
It's also for output content moderation where the role place value can have both user and agents as

03:00.620 --> 03:03.220
part of the evaluation that you want to perform.

03:03.340 --> 03:04.060
Thank you.
