WEBVTT

00:01.200 --> 00:02.160
Hello everyone.

00:02.960 --> 00:14.160
In this particular section we will be learning about Microsoft Presidio for secure and responsible AI.

00:15.560 --> 00:24.360
You will learn why protecting personally identifiable information, also known as PII, is critical

00:24.360 --> 00:34.280
when deploying large language models and also understand how Presidio detects and anonymizes sensitive

00:34.280 --> 00:45.160
data and gain practical skills to secure your AI pipelines against data leakage and compliance violations.

00:45.880 --> 00:52.400
Having said that, let's first understand what Microsoft Presidio is.

00:53.920 --> 01:04.980
Microsoft Presidio is an open source framework designed to identify, assess, and anonymize personally

01:04.980 --> 01:09.460
identifiable information across various data types.

01:13.580 --> 01:24.780
Presidio acts as a privacy guardrail, and it sits between users and AI systems, ensuring that sensitive

01:24.780 --> 01:27.660
data never reaches LLM APIs,

01:27.860 --> 01:30.700
training pipelines, or system logs.

01:32.100 --> 01:41.020
Some of its key features are that it has 50 plus PII recognizers for different entity types.

01:42.420 --> 01:53.460
It has NLP-powered detection with customizable rules, flexible anonymization strategies, language-agnostic

01:53.460 --> 01:58.620
architecture, and it's easy to integrate with the existing pipelines.
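
NOTE
A minimal sketch of what "flexible anonymization strategies" could look like, using only the Python standard library. The lecture does not name the strategies here, so replace, mask, and hash are assumed examples, and the function names are hypothetical helpers rather than Presidio's own API.
```python
import hashlib
# Strategy 1 (assumed): swap the whole value for a fixed placeholder.
def replace(value, label="<REDACTED>"):
    return label
# Strategy 2 (assumed): hide everything except the last few characters.
def mask(value, keep_last=4, char="*"):
    hidden = max(len(value) - keep_last, 0)
    return char * hidden + value[hidden:]
# Strategy 3 (assumed): a one-way SHA-256 digest, so equal inputs map to equal outputs.
def hash_value(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
card = "4111111111111111"
print(replace(card))
print(mask(card))
print(hash_value(card)[:12])
```
Which strategy fits depends on what downstream code needs: a placeholder hides everything, a mask keeps a recognizable suffix, and a hash still lets records be matched without showing the value.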

02:00.670 --> 02:03.990
So why is Presidio needed?

02:05.230 --> 02:12.270
Presidio helps in detecting and protecting PII.

02:21.790 --> 02:26.990
One of the main challenges with PII is unstructured data.

02:27.710 --> 02:37.750
LLMs process vast amounts of free-form text, making PII identification and protection exceptionally

02:37.750 --> 02:39.870
difficult without specialized tools.

02:42.110 --> 02:56.310
There are regulatory requirements like GDPR and CCPA, and there are severe penalties for non-compliance,

02:56.310 --> 03:00.850
which can include hefty fines and legal actions.

03:02.250 --> 03:13.610
And then privacy by design is part of the architecture that should be maintained for modern AI applications,

03:14.250 --> 03:19.610
where privacy controls should be at the heart of the application.

03:20.410 --> 03:28.010
Presidio enables proactive data protection rather than reactive breach responses.

03:30.610 --> 03:32.410
So what does Presidio solve?

03:33.130 --> 03:43.530
Presidio directly addresses many of the pressing data challenges that emerge when integrating AI systems

03:44.050 --> 03:45.570
into enterprise workflows.

03:45.850 --> 03:54.370
The very first one is PII detection in free-form text. Presidio excels at identifying PII in

03:54.370 --> 03:56.730
unstructured data from various sources.

03:58.130 --> 04:00.670
It also prevents data exposure.

04:01.550 --> 04:12.310
It ensures sensitive data is anonymized before it reaches downstream systems like third-party APIs,

04:12.430 --> 04:16.030
logs, or less secure environments.

04:17.990 --> 04:27.030
It standardizes protection, providing a consistent framework for protecting PII across diverse data

04:27.070 --> 04:39.150
sets, and it secures AI workflows, enabling the safe use of LLMs for sensitive tasks by

04:39.550 --> 04:46.110
ensuring that PII is never processed, stored, or revealed by the models.
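
NOTE
A rough sketch of the guardrail idea described above: redact PII first, then pass only the redacted text downstream. It uses plain regular expressions as a stand-in for real recognizers, so it is not Presidio's API. The patterns, the redact and guarded_call helpers, and the echo "downstream" function are all hypothetical, and the placeholder labels are only chosen to look like entity names.
```python
import re
# Hypothetical patterns standing in for real PII recognizers (illustration only).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}
def redact(text):
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
def guarded_call(prompt, downstream):
    """Redact the prompt, then hand only the redacted text to downstream."""
    return downstream(redact(prompt))
# Stand-in for an LLM API call: it simply echoes what it received.
received = guarded_call("Email jane.doe@example.com or call 555-123-4567.", lambda p: p)
print(received)
```
The point of the design is that the raw prompt never leaves guarded_call, so a third-party API or a log writer placed downstream only ever sees placeholders.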

04:49.310 --> 04:59.120
Presidio is directly connected to OWASP. For those who are deeply concerned with security, Presidio

04:59.120 --> 05:07.760
offers reassurance by directly addressing critical vulnerabilities identified by leading industry standards,

05:08.200 --> 05:10.840
particularly the OWASP Top 10.

05:12.960 --> 05:20.680
It aligns with security standards, preventing sensitive data disclosure and implementing proper

05:20.680 --> 05:21.960
security controls.

05:24.680 --> 05:30.120
The OWASP Top 10 covers sensitive information disclosure,

05:30.720 --> 05:38.640
one of the most critical risks in AI applications. By sanitizing inputs and outputs, Presidio mitigates

05:38.640 --> 05:47.200
the risk of models revealing PII through completions or training data leakage.
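
NOTE
To make "sanitizing inputs and outputs" concrete, here is a small sketch that cleans text on both sides of a model call. It uses one regular expression as a stand-in recognizer and is not Presidio's API. The SSN pattern, the sanitize and safe_completion helpers, and the leaky_model stub are all hypothetical.
```python
import re
# Hypothetical US SSN pattern; a stand-in for a real recognizer (illustration only).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
def sanitize(text):
    """Mask anything that looks like an SSN."""
    return SSN.sub("<US_SSN>", text)
def safe_completion(prompt, model):
    """Sanitize the prompt before the model sees it and the completion before the user does."""
    completion = model(sanitize(prompt))
    return sanitize(completion)
# Stub model that echoes its prompt and "leaks" a memorized value in its completion.
def leaky_model(prompt):
    return f"You said: {prompt} Also on file: 078-05-1120."
result = safe_completion("My SSN is 123-45-6789.", leaky_model)
print(result)
```
Sanitizing the input protects the model and its logs, while sanitizing the output catches values the model might reveal on its own, such as memorized training data.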

05:48.920 --> 05:57.720
Now that we have learned about Presidio and what it means for LLMs and AI systems, let's go ahead and understand

05:57.720 --> 05:58.640
its architecture.