WEBVTT

00:00.920 --> 00:02.600
Hello everyone and welcome.

00:02.800 --> 00:08.520
In this video we will install garak and go over the CLI that garak offers.

00:08.680 --> 00:14.280
We will learn how to use garak, and also go over a couple of models that we will use in this section, where

00:14.280 --> 00:18.040
we will run garak and find the vulnerabilities in those models.

00:18.480 --> 00:23.240
So the very first thing that we want to do is install garak from the command line.

00:23.640 --> 00:30.680
So for that I have opened PyCharm, and I have a terminal window open with Windows PowerShell.

00:31.120 --> 00:33.800
I have created a directory called garak.

00:33.800 --> 00:37.440
And in here I'll go ahead and run pip install garak.

00:38.080 --> 00:43.480
So if you notice it went through all the installation very quickly because I already ran this before.

00:43.920 --> 00:50.000
However, if you run it for the first time, it may take 5 to 10 minutes to install on

00:50.000 --> 00:50.920
your machine.

00:51.160 --> 00:54.840
The very first command that I will run is garak list probes.

00:55.160 --> 00:58.320
So in this scenario I'll have to put two hyphens.

00:58.520 --> 01:00.560
Looks like I did something wrong.

01:01.080 --> 01:05.720
So the command that I have to type is garak --list_probes.

01:06.280 --> 01:10.680
This will list all the different probes that garak offers for vulnerability testing.

01:11.920 --> 01:13.560
So let's scroll up a little bit.

01:13.560 --> 01:15.200
And here okay.

01:15.400 --> 01:22.760
So here you have all the different probes, like dan, av_spam_scanning, encoding, donotanswer, fileformats,

01:22.800 --> 01:25.850
latentinjection, leakreplay, malwaregen.

01:26.050 --> 01:32.810
So many of them: promptinject, realtoxicityprompts, snowball, suffix, topic, xss.

01:33.090 --> 01:36.050
These are all built in, and the list keeps on growing.

01:36.410 --> 01:41.250
So now we will use a couple of probes here and see how we can use them for our use case.

01:41.410 --> 01:47.730
Now before that, let me go ahead and open the model card for the model that we want to test these probes

01:47.730 --> 01:48.130
on.

01:48.330 --> 01:55.490
So one of the models that we want to test here is OpenAI's GPT-2, which was the predecessor of

01:55.530 --> 01:56.010
GPT-3.

01:56.410 --> 02:01.010
And it's available on Hugging Face, and it's freely available.

02:01.770 --> 02:03.290
You don't have to pay for it.

02:03.330 --> 02:07.810
If you notice here, it has all the details about the model.

02:07.810 --> 02:14.010
And what I want to highlight here is the fact that this model is trained on public data, and it has

02:14.010 --> 02:17.330
all the different vulnerabilities that you may not know about.

02:17.890 --> 02:19.690
The training data for this model has not been released.

02:19.930 --> 02:25.770
Its training data is known to contain a lot of unfiltered content from the internet, which is far from neutral.

02:26.650 --> 02:32.290
So we will run our probes against this model and learn how we can expose the vulnerabilities here.

02:32.730 --> 02:43.570
And we'll also go ahead and use the ChatGPT 3.5 and 4.0 models, so that we can test for vulnerabilities

02:43.570 --> 02:44.930
on those models.

02:45.290 --> 02:46.250
Thank you.

02:46.250 --> 02:48.690
And I will see you in the next video.