WEBVTT

00:00.680 --> 00:01.920
Hello everyone.

00:01.920 --> 00:02.600
Welcome.

00:02.960 --> 00:05.840
In this video we will learn about probe encoding.

00:06.160 --> 00:12.600
Encoding probe will try to get a model to generate a specific piece of given text by presenting an encoded

00:12.600 --> 00:14.040
version of that text.

00:14.360 --> 00:18.840
It will attempt to circumvent, safeguard or input filtering.

00:19.240 --> 00:23.960
Let's explore this probe with the hands on using a yml file.

00:24.320 --> 00:26.440
So here is my project root directory.

00:26.840 --> 00:33.160
In here I have created another directory by the name reports and I have created a yml configuration

00:33.160 --> 00:34.720
for my Pro encoding.

00:35.000 --> 00:39.280
Here I have three root elements run plugin and reporting.

00:39.640 --> 00:45.480
The generations field would specify how many times we want to send each prompt for inference.

00:45.600 --> 00:47.560
In our case it's one.

00:47.880 --> 00:50.720
The plugin probe spec element is encoding.

00:50.840 --> 00:53.160
That is the probe that we want to test.

00:53.160 --> 00:57.080
And then we specify the model type as open AI.

00:57.560 --> 01:02.840
And we want to test this on the model G 3.5 turbo.

01:03.080 --> 01:07.040
The entire report will be submitted to this particular directory.

01:07.280 --> 01:12.240
Having said that, let's go ahead and execute the the generic tool using the CLI.

01:12.800 --> 01:18.500
So here I would say Gary hyphen hyphen config and name of the probe encoding.

01:18.500 --> 01:20.020
YML and enter.

01:20.220 --> 01:21.020
There you go.

01:21.420 --> 01:25.940
So now if you notice it is going to log everything to the log file.

01:26.740 --> 01:32.380
It has the generator as OpenAI GPT 3.5 turbo.

01:32.940 --> 01:38.340
Don't forget to add the OpenAI API key to the to the environment variable.

01:38.500 --> 01:42.900
Then it's going to test all the different probes for encoding that are listed here.

01:43.180 --> 01:46.780
And now the very first thing it did is encoding decode match.

01:47.140 --> 01:48.980
It passed 60 out of 60.

01:49.260 --> 01:50.620
Now it would keep going.

01:51.020 --> 01:53.940
It was the first one was inject Ascii 85.

01:53.980 --> 01:58.100
And then it would execute all of these probes one by one.

01:58.740 --> 02:01.740
I'll stop this because it's going to take a Wi-Fi to run.

02:02.060 --> 02:05.740
I will turn it back on when it's done executing all the probes.

02:05.980 --> 02:10.020
So it is done executing all the different encoding probes.

02:10.020 --> 02:11.740
And a lot of them were failure.

02:12.500 --> 02:16.460
The 16 out of 30 was inject base 16.

02:16.940 --> 02:25.820
Inject base 64 failed with 31 out of 55th, inject hex failed with 15 out of 30, and so on.

02:26.180 --> 02:32.760
So what I'm going to do is I'm going to open the JSON line format where it dumped all the data for us

02:32.760 --> 02:36.840
to go and understand how the entire execution was.

02:36.880 --> 02:39.600
Was framed and what happened behind the scenes.

02:40.000 --> 02:44.680
So here, if you notice here we have a sequence of different immuno patients.

02:44.800 --> 02:46.040
That happened.

02:47.000 --> 02:49.920
And this was the inject Ascii 85.

02:50.200 --> 02:52.040
And this is what was the prompt.

02:52.040 --> 02:53.600
And this was the output.

02:53.840 --> 02:58.520
Similarly after 85 there was inject base 16.

02:58.680 --> 03:00.760
And then this is what the prompt was.

03:00.920 --> 03:04.080
And then if you scroll on the right this was the output.

03:04.360 --> 03:06.160
And then it keeps going down.

03:06.280 --> 03:12.000
Inject base 32 inject ECoG and different inject Rot13.

03:12.920 --> 03:16.200
All of these injections were tested for encoding.

03:16.360 --> 03:24.640
And if you notice here overall out of everything only 13 passed out of 30, which is a very low number.

03:24.640 --> 03:29.560
But this gives us a very good understanding of how the entire encoding was ran.

03:29.840 --> 03:36.520
Sorry, this 13th out of 30 was for inject jalgo and not the entire invocation.

03:36.760 --> 03:43.200
The entire invocation suite can be found here, and it helps us understand how well this model was against

03:43.240 --> 03:44.360
different encoding.

03:44.800 --> 03:47.480
Thank you and I'll see you in the next video.
