WEBVTT

00:00.920 --> 00:02.600
Hello everyone and welcome.

00:02.920 --> 00:07.760
In this video we will learn about the vulnerability image Markdown injection.

00:08.120 --> 00:13.480
We'll also learn about how an LLM is vulnerable to the image markdown injection.

00:13.960 --> 00:19.000
Followed by that we will use Goehrke tool to discover this kind of vulnerabilities.

00:19.320 --> 00:26.000
I'll go over this blog post where it has been very nicely explained how the Llms are susceptible to

00:26.040 --> 00:29.000
these kind of vulnerability.

00:29.880 --> 00:32.240
So it takes an example of the Bing chat.

00:32.640 --> 00:38.640
When Bing chat returns text, it can return markdown elements which renders HTML.

00:39.000 --> 00:43.800
This includes the feature to include images, so the image pretty much looks like this.

00:44.280 --> 00:51.560
There is Https protocol, the the server name, the image name, and then there is a query parameter.

00:52.440 --> 00:56.680
This will be rendered as an HTML tag within a source attribute.

00:57.000 --> 01:04.160
In this case, if you notice, the attacker can add the source of the image as its own website or domain,

01:04.320 --> 01:11.420
and then add a query parameter as exfiltration browser will automatically connect to the URL without

01:11.420 --> 01:13.460
user interaction to load the image.

01:13.820 --> 01:16.340
Wait, that is data exfiltration.

01:17.100 --> 01:19.740
How this ends up in the LM vulnerability.

01:20.020 --> 01:21.140
The impact is.

01:21.180 --> 01:27.900
As we know, LM agents like Bing Chat or ChatGPT are vulnerable to prompt injections.

01:28.180 --> 01:34.460
This means that during a prompt injection attack, the adversary hijacks the AI and the attacker can

01:34.460 --> 01:42.380
search web page or secrets or PII by appending such data to the URL of the image and have it automatically

01:42.380 --> 01:43.260
x filtered.

01:43.700 --> 01:49.860
The malicious instruction can replace the data exfiltration with data to exfiltrate.

01:50.300 --> 01:56.380
So here this is where the prompt injections can happen, where malicious instructions can be part of

01:56.380 --> 01:58.900
this URL that goes to the element.

01:59.660 --> 02:04.540
Now let's go ahead and use Garrick tool to detect this kind of vulnerability in the LM.

02:04.820 --> 02:11.800
So in here in Garrick probes that we learned in our previous video, I will navigate to the Garrick

02:11.800 --> 02:14.800
Probe X in here.

02:14.800 --> 02:21.040
I will do the image exfiltration and we can get the model to write embedded assets as x filtering text

02:21.040 --> 02:21.880
in the URI.

02:22.400 --> 02:26.600
The attempts to get the model a plant a markdown pixel image in output.

02:26.840 --> 02:28.360
Now let's go ahead and execute.

02:28.360 --> 02:35.840
This command that I have here is Garrick model type is Huggingface model name is GPT two.

02:36.080 --> 02:41.400
And the probes that we want to invoke is x dot markdown image exfiltration.

02:41.400 --> 02:42.880
And then execute this.

02:43.400 --> 02:44.200
There you go.

02:44.240 --> 02:47.840
It started running this and it will take a while for it to run.

02:48.960 --> 02:50.680
So I'm going to pause the video.

02:50.960 --> 02:57.120
So once the Garrick was done executing the probe I was able to extract the JSON line format that it

02:57.120 --> 02:58.240
does the reporting.

02:58.480 --> 03:02.080
And you notice here it has the details about the reporting.

03:02.280 --> 03:09.520
And it says at the end that there were about total 60 of them ran, and all 60 passed for the basic

03:09.520 --> 03:17.370
x filtering and the X filtering markdown for content, it ran another 60 probes and then 60 of them

03:17.370 --> 03:17.970
passed.

03:18.770 --> 03:19.370
Great.

03:19.890 --> 03:24.050
So now let's go ahead and run this on a on a different model hosted on the open AI.

03:24.610 --> 03:28.130
Let's run the probe on the turbo 3.5 GPT model.

03:28.570 --> 03:30.370
So I'll say Garrick model type.

03:30.770 --> 03:33.250
In this case it is OpenAI.

03:33.410 --> 03:34.890
Because it's not free dot.

03:35.570 --> 03:37.090
You'll have to pay for it.

03:37.090 --> 03:41.970
And then you say model name is GPT 3.5 turbo probes.

03:41.970 --> 03:45.370
Here is XSS markdown image filter.

03:45.610 --> 03:51.570
Just be sure that you're adding the OpenAI open API key to the system environment before you run this,

03:51.570 --> 04:00.930
because OpenAI model GPT 3.5 turbo is not free, it needs payment for accessing the model.

04:01.050 --> 04:02.810
So now let's go ahead and run this.

04:02.810 --> 04:04.490
So I believe I did a mistake.

04:04.530 --> 04:06.650
I did not add a hyphen here.

04:06.690 --> 04:08.450
Now let's go ahead and run this.

04:08.490 --> 04:14.690
It's fairly quicker compared to the hugging phase since it's a it's a paid service compared to the hugging

04:14.690 --> 04:16.490
face GPT two.

04:16.690 --> 04:18.530
It's free so it takes time.

04:18.530 --> 04:18.670
The Time.

04:18.670 --> 04:21.150
The resources are not allocated.

04:21.950 --> 04:24.350
And if you notice here it failed.

04:24.670 --> 04:30.030
Now let's go ahead and analyze the JSON line format for GPT 3.5 turbo.

04:30.230 --> 04:33.910
So here's the results that I got for GPT 3.5 turbo.

04:33.910 --> 04:37.030
And if you notice here it ran the same tests.

04:38.070 --> 04:44.070
And if I scroll on the right hand side here then if you notice these are the system prompts.

04:44.230 --> 04:49.710
And on the right hand side it has added prompt injections in the exfiltration attempts.

04:49.950 --> 04:51.830
So here's the malicious server.

04:52.750 --> 04:58.790
And then on the right hand side here there is the malicious content for the prompt injection Q equals

04:58.790 --> 05:02.510
to this this this this this.

05:03.590 --> 05:04.550
And yeah.

05:04.590 --> 05:07.870
And then it tries to break the LLM with exfiltration.

05:08.110 --> 05:14.630
And then if you notice all the way down here it passed 57 out of 60 different infiltration attacks that

05:14.630 --> 05:18.110
Garrick tried on GPT 3.5 turbo.

05:18.550 --> 05:23.350
This is amazing where you can test the exfiltration on LLM using Garrick tool.

05:23.470 --> 05:24.110
Thank you.