WEBVTT

00:01.240 --> 00:08.080
In this session, we explore how to extend Guardrails' capabilities with custom validators.

00:10.160 --> 00:18.760
Our goal is to create a secure environment that automatically identifies and handles personally identifiable

00:18.760 --> 00:27.200
information within our data, ensuring privacy and compliance.

00:27.680 --> 00:33.400
We begin by importing the necessary modules, including re for regex operations,

00:39.320 --> 00:44.840
and Validator and other related functions from guardrails for custom validation logic.

00:48.440 --> 00:57.000
In the context of the Guardrails framework, ValidationResult, PassResult, and FailResult are pivotal

00:57.000 --> 01:03.000
components that determine how inputs are assessed against specified validation rules.

01:04.680 --> 01:08.520
These components play a crucial role in custom validators.

01:11.440 --> 01:16.440
ValidationResult acts as the base class for the outcome of validation attempts.

01:17.200 --> 01:25.040
It encapsulates the result of applying a validator to a given piece of data, serving as a foundation

01:25.040 --> 01:30.400
for more specific result types such as PassResult and FailResult.

01:32.920 --> 01:40.200
PassResult is a subclass of ValidationResult that signifies a successful validation.

01:41.440 --> 01:48.280
When a PassResult is returned by a validator, it indicates that the input data has met the validation

01:48.280 --> 01:55.080
criteria set by the validator and doesn't violate any of the rules.

01:58.760 --> 02:06.190
Conversely, FailResult also extends ValidationResult but represents a failure to meet the

02:06.190 --> 02:07.990
validation criteria.

02:09.150 --> 02:16.950
A FailResult contains additional information, such as an error message detailing why the validation

02:16.950 --> 02:17.510
failed.

02:19.150 --> 02:27.030
This outcome is particularly useful for identifying inputs that contain errors, policy breaches,

02:28.030 --> 02:32.030
or, in our specific case, PII that shouldn't be disclosed.
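
NOTE
A minimal sketch of the result hierarchy described above, assuming nothing beyond the standard library.
These are stand-in classes written for illustration and are not the guardrails classes themselves;
the field names outcome and error_message are assumptions.
```python
from dataclasses import dataclass
# Stand-in classes for illustration only; they are NOT imported from guardrails.
@dataclass
class ValidationResult:
    """Base class for the outcome of one validation attempt."""
    outcome: str
@dataclass
class PassResult(ValidationResult):
    """Signals that the value met the validation criteria."""
    outcome: str = "pass"
@dataclass
class FailResult(ValidationResult):
    """Signals a failed validation and carries the reason."""
    outcome: str = "fail"
    error_message: str = ""
# Both outcomes share the same base type, so callers can handle them uniformly.
ok = PassResult()
bad = FailResult(error_message="Text contains PII")
```
A caller can check isinstance(result, FailResult) and read error_message to learn why validation failed.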

02:37.270 --> 02:41.830
Next, we define a custom validator called PII filter.

02:46.070 --> 02:53.950
It leverages regular expressions to identify patterns indicative of sensitive information, such as Social

02:53.950 --> 02:55.190
Security numbers,

02:57.510 --> 03:02.910
taxpayer identification numbers, credit card numbers, and phone numbers.

03:03.790 --> 03:15.030
These patterns cover a range of formats, acknowledging the diverse ways such information can be represented.
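
NOTE
The kind of pattern set described above might look like the sketch below.
The exact patterns used in the session are not shown in the transcript, so these regexes,
the PII_PATTERNS name, and the find_pii helper are all illustrative assumptions.
```python
import re
from typing import List
# Illustrative patterns only; real-world formats vary more than this.
# Taxpayer IDs written as NNN-NN-NNNN share the SSN-style shape below.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}
def find_pii(text: str) -> List[str]:
    """Return the names of every pattern that matches the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```
Keeping one named pattern per PII type makes it easy to report which kind of data was found.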

03:15.310 --> 03:19.150
The heart of the PII filter lies in its validate method,

03:24.190 --> 03:28.910
where the defined regex patterns are applied to the input value.

03:31.350 --> 03:34.790
If a match is found, indicating the presence of PII,

03:37.630 --> 03:42.950
a FailResult is returned along with an error message highlighting the issue.

03:45.390 --> 03:56.430
If no PII is detected, a PassResult is returned, affirming the input's compliance with privacy standards.
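
NOTE
The validate logic described above can be sketched as follows, using only the standard library.
PIIFilterSketch, the stand-in result classes, and the single SSN-style pattern are assumptions for
illustration; a real validator would subclass the guardrails Validator and return its result types.
```python
import re
from dataclasses import dataclass
# Stand-ins for the result types; a real validator would use the guardrails classes.
@dataclass
class PassResult:
    outcome: str = "pass"
@dataclass
class FailResult:
    error_message: str
    outcome: str = "fail"
# One illustrative pattern (SSN-style digits); the session's patterns are not shown.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
class PIIFilterSketch:
    """Hypothetical validator: fails when the value matches a PII pattern."""
    def validate(self, value: str):
        # A match means PII is present, so report a failure with a reason.
        if SSN_PATTERN.search(value):
            return FailResult(error_message="Text contains PII (SSN-like pattern).")
        # No match: the value complies with the rule.
        return PassResult()
```
Calling PIIFilterSketch().validate(text) returns one of the two result objects instead of raising.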

04:00.670 --> 04:12.180
Next, now that we have defined the PII filter, we integrate our custom validator into a Pydantic

04:12.180 --> 04:15.380
model for text.

04:17.580 --> 04:23.300
This model analyzes text using our PII filter to flag any PII content,

04:26.380 --> 04:32.860
demonstrating the seamless integration of custom validators into data processing workflows.

04:42.420 --> 04:52.260
With our validator and model in place, we define the prompt that needs to be passed to the LLM as an

04:52.260 --> 04:53.020
input.

04:54.980 --> 05:06.460
As you can see here, this prompt instructs the model to analyze the text for personally identifiable information,

05:08.060 --> 05:08.460
and

05:09.820 --> 05:13.020
it has the text input field for text analysis.

05:15.380 --> 05:27.860
Now that we have defined the necessary input field objects, we will define the guard object that will

05:27.860 --> 05:40.940
enforce our custom PII detection rules on text inputs using a predefined prompt to guide the analysis.

05:42.700 --> 05:47.980
Then we validate a sample text for PII.

05:49.500 --> 06:02.220
As you can see here, we pass this dynamic text in the prompt, which has a Social Security number

06:02.220 --> 06:15.450
defined in it, and this information, along with the parameters for the model, maximum tokens, and temperature,

06:15.450 --> 06:18.970
is passed to OpenAI for execution.

06:19.850 --> 06:29.170
The guard object wraps the entire call to the LLM, then validates the output and takes the necessary

06:29.170 --> 06:31.170
actions as defined here.

06:31.530 --> 06:34.370
In this case,

06:38.970 --> 06:44.770
the prompt contains an SSN, so

06:44.770 --> 06:53.890
if the SSN is also present in the output generated by the LLM, the PII filter should fail

06:56.050 --> 06:56.610
and

06:58.850 --> 07:04.930
it should throw an exception based on our validation criteria.
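
NOTE
The wrap-then-validate flow described above can be sketched without any LLM or guardrails dependency.
guarded_call, PIIValidationError, and echo_llm are hypothetical names; echo_llm is a fake model
standing in for the OpenAI call, and this is not how the guard object is implemented internally.
```python
import re
from typing import Callable
# Illustrative SSN-style pattern; the session's actual patterns are not shown.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
class PIIValidationError(Exception):
    """Raised when the model output still contains PII (hypothetical name)."""
def guarded_call(llm: Callable[[str], str], prompt: str) -> str:
    """Call the model, then validate its raw output before returning it."""
    raw_output = llm(prompt)
    if SSN_PATTERN.search(raw_output):
        raise PIIValidationError("Validation failed: output contains PII.")
    return raw_output
def echo_llm(prompt: str) -> str:
    """Fake model that repeats the prompt, standing in for a real LLM call."""
    return prompt
```
With echo_llm, a prompt containing an SSN-like value raises PIIValidationError, while a masked prompt is returned unchanged.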

07:05.650 --> 07:07.330
Let's run this example

07:09.690 --> 07:13.650
and check the output.

07:23.170 --> 07:30.210
As you can see, the validation failed with errors because the text contains PII.

07:31.850 --> 07:40.370
Now, if I mask this information and run the program again,

07:55.610 --> 08:00.930
the validator should move control to the PassResult path.
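
NOTE
In the session the value is masked by hand. A programmatic version of the same idea might look like
this sketch; mask_ssn and the mask string are assumptions, not part of the demonstrated code.
```python
import re
# Illustrative SSN-style pattern; adjust for the formats you need to cover.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
def mask_ssn(text: str) -> str:
    """Replace every SSN-like match with a fixed mask."""
    return SSN_PATTERN.sub("***-**-****", text)
```
After masking, the text no longer matches the pattern, so a regex-based PII check on it passes.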

08:02.370 --> 08:03.570
As you can see here,

08:06.290 --> 08:10.610
this is the prompt that the Guardrails AI framework has prepared, along

08:13.680 --> 08:20.880
with the output object, wherein we define the description, the format,

08:21.040 --> 08:25.640
and the filter or validator to be executed.

08:25.720 --> 08:37.440
This is our input text, and the raw LLM output contains the information shown with a masked Social

08:37.440 --> 08:46.240
Security number, and hence the validated output is passed.

08:46.440 --> 08:48.200
There is no exception here.

08:50.800 --> 08:58.200
So custom validators like my PII filter, or any other filter we define for that matter,

08:59.400 --> 09:04.800
show us how we can make rules that check our data just the way we need.

09:05.960 --> 09:15.960
They help us keep our data safe and meet specific requirements, making sure everything is in order.
