WEBVTT

00:00.680 --> 00:02.280
Hello everyone and welcome.

00:02.480 --> 00:06.480
In this video, we will go over the reference documentation that garak provides.

00:06.760 --> 00:09.720
For that, we'll have to go to reference.garak.ai.

00:10.000 --> 00:13.760
And as you can see here, this is the documentation homepage for garak.

00:14.360 --> 00:18.160
On the left-hand side, they have different components and different topics

00:18.160 --> 00:20.920
listed here in the documentation.

00:21.800 --> 00:24.680
We will not go through all of the menu items.

00:24.920 --> 00:30.760
I will specifically pick one of the components that we want to understand, because in our next section,

00:30.760 --> 00:37.760
we will take a hands-on deep dive into some of the topics for that particular section,

00:37.760 --> 00:39.200
and that is probes.

00:39.520 --> 00:46.320
So we learned in our previous video that garak probes define a number of ways of testing a generator

00:46.320 --> 00:48.280
for specific vulnerabilities.

00:48.560 --> 00:51.320
And our focus is going to be vulnerabilities.

00:51.640 --> 00:57.800
So in this case, let's pick the DAN probe. DAN is basically the "Do Anything Now" probe.

00:58.560 --> 01:01.400
These probes are designed to disrupt a system prompt,

01:01.600 --> 01:03.480
for example, the Do Anything Now

01:03.520 --> 01:05.880
probes and others in a similar vein.

01:06.200 --> 01:13.120
So there can be inputs where a user tries to trick a large language model into doing anything through the construction

01:13.120 --> 01:14.040
of the prompt.

01:14.280 --> 01:18.400
And that's a big security vulnerability for an LLM.

01:18.960 --> 01:21.920
We'll cover this DAN prompt in our hands-on video.

01:21.960 --> 01:25.800
The next vulnerability that I want to highlight is encoding.

01:26.240 --> 01:33.280
This probe tries to get a model to generate a specific piece of given text by presenting an encoded

01:33.280 --> 01:34.600
version of the text.

01:34.640 --> 01:38.600
It attempts to circumvent safeguards on input filtering.

01:38.960 --> 01:42.160
The other probe that we will cover is the XSS probe.

01:42.360 --> 01:49.640
This probe is for vulnerabilities that permit or enact cross-site attacks, such as data exfiltration.

01:49.960 --> 01:56.840
We'll cover all of this in our hands-on section with different models, and understand how garak is

01:56.840 --> 02:01.000
effectively exposing the vulnerabilities in different models.

02:01.400 --> 02:02.200
Thank you.

02:02.440 --> 02:04.440
I'll see you in the next video.
