WEBVTT

00:00.760 --> 00:01.960
Hello everyone!

00:01.960 --> 00:02.680
Welcome.

00:02.880 --> 00:06.560
In this video we will learn about the Presidio

00:06.560 --> 00:07.400
architecture.

00:08.200 --> 00:16.760
Presidio is engineered with a modular and extensible architecture, allowing it to be highly adaptable

00:16.760 --> 00:21.320
to various environments and specific data protection needs.

00:23.160 --> 00:26.800
It introduces two primary engines.

00:27.400 --> 00:30.200
The very first one is the analyzer engine.

00:30.960 --> 00:40.800
It scans text to identify PII entities using pattern matching, NLP models, and custom recognizers.
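
A minimal sketch of the pattern-matching part of the analyzer, written in plain Python rather than with Presidio's own API. In Presidio itself, `AnalyzerEngine().analyze(text=..., language="en")` returns `RecognizerResult` objects. The `Result` class, the simplified phone-number regex, the fixed 0.75 score, and the sample number below are assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical result type, loosely modeled on Presidio's RecognizerResult."""
    entity_type: str
    start: int
    end: int
    score: float


# Simplified US-style pattern (assumption): 3-3-4 digits separated by '-', '.', or a space.
PHONE_PATTERN = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def recognize_phone_numbers(text: str) -> list[Result]:
    # Every regex match gets the same illustrative confidence score.
    return [Result("PHONE_NUMBER", m.start(), m.end(), 0.75) for m in PHONE_PATTERN.finditer(text)]


text = "Hi, my name is David and my number is 212-555-0123"  # hypothetical number
print(recognize_phone_numbers(text))
```

A real recognizer would typically add context words and validation logic to raise or lower the score; a single regex only shows the basic idea.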

00:42.680 --> 00:54.160
The second engine is the anonymizer engine, which transforms detected PII using configurable

00:54.160 --> 01:03.590
operators such as masking, redaction, replacement, encryption, or hashing, depending on your requirements.
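
The operators can be pictured as small string transformations. This is a simplified stand-in, not Presidio's operator implementations; Presidio selects its operators by name (for example `replace`, `redact`, `mask`, `hash`, `encrypt`). The function names and default parameters here are hypothetical, and encryption is left out because it needs a key and a cryptography library.

```python
import hashlib


def mask(value: str, char: str = "*", keep_last: int = 4) -> str:
    # Hide every character except the last `keep_last` ones.
    n = max(len(value) - keep_last, 0)
    return char * n + value[n:]


def redact(value: str) -> str:
    # Remove the value entirely.
    return ""


def replace(value: str, entity_type: str) -> str:
    # Substitute a placeholder that names the entity type.
    return f"<{entity_type}>"


def hash_value(value: str) -> str:
    # One-way SHA-256 hex digest; no salt in this sketch.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


phone = "212-555-0123"  # hypothetical value
print(mask(phone))                     # ********0123
print(replace(phone, "PHONE_NUMBER"))  # <PHONE_NUMBER>
print(hash_value(phone)[:16])
```

Masking and replacement keep the text readable, while hashing keeps values comparable (the same input always yields the same digest) without revealing them.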

01:05.710 --> 01:14.070
It also provides a recognizer library, which is an extensible collection of built-in detectors for

01:14.070 --> 01:15.630
common PII types.
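
An extensible recognizer library can be modeled as a registry that holds built-in detectors and accepts custom ones. This is a hypothetical minimal version; Presidio has its own `RecognizerRegistry` class with an `add_recognizer` method. The patterns, scores, and the `EMPLOYEE_ID` entity are illustrative assumptions.

```python
import re
from typing import Callable

# A recognizer here is any function: text -> list of (entity_type, start, end, score).
Recognizer = Callable[[str], list]


class RecognizerRegistry:
    """Hypothetical minimal registry, not Presidio's class of the same name."""

    def __init__(self) -> None:
        self._recognizers: dict[str, Recognizer] = {}

    def add(self, name: str, recognizer: Recognizer) -> None:
        self._recognizers[name] = recognizer

    def get_all(self) -> list[Recognizer]:
        return list(self._recognizers.values())

    def names(self) -> list[str]:
        return list(self._recognizers)


def regex_recognizer(entity_type: str, pattern: str, score: float) -> Recognizer:
    # Build a recognizer from a regex and a fixed score.
    compiled = re.compile(pattern)

    def recognize(text: str) -> list:
        return [(entity_type, m.start(), m.end(), score) for m in compiled.finditer(text)]

    return recognize


registry = RecognizerRegistry()
# A "built-in" detector (simplified pattern, assumption).
registry.add("credit_card", regex_recognizer("CREDIT_CARD", r"\b(?:\d{4}[- ]?){3}\d{4}\b", 0.5))
# Extending the library with a custom, domain-specific recognizer.
registry.add("employee_id", regex_recognizer("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.8))

print(registry.names())
```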

01:16.870 --> 01:22.110
And it offers flexible deployment strategies, such as

01:22.310 --> 01:31.150
standalone services, integration into API gateways, and embedding in Python applications,

01:32.030 --> 01:37.430
or running as a containerized microservice in Kubernetes.
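
When the analyzer runs as a standalone service, clients call it over HTTP. The sketch below builds a POST request to an `/analyze` endpoint with a JSON body of `text` and `language`, modeled on the request shape of Presidio's analyzer REST API. The `localhost:5002` address is an assumption based on a typical Docker port mapping; adjust it to your deployment.

```python
import json
import urllib.request

# Assumed address of a locally running analyzer container.
ANALYZER_URL = "http://localhost:5002/analyze"


def build_analyze_request(text: str, language: str = "en") -> urllib.request.Request:
    # JSON body with the text to scan and its language.
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        ANALYZER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str) -> list:
    # Sends the request; the service must be reachable for this to succeed.
    with urllib.request.urlopen(build_analyze_request(text), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_analyze_request("Hi, my name is David")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to route the same call through an API gateway or a Kubernetes service name instead of `localhost`.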

01:41.790 --> 01:45.670
So here I have some input text:

01:45.710 --> 01:50.030
Hi, my name is David and my number is this.

01:50.950 --> 02:00.110
The request is first collected and sent to the PII analyzer through an API call.

02:01.050 --> 02:01.610
Right.

02:03.410 --> 02:07.210
The PII analyzer extracts the text features.

02:07.210 --> 02:13.890
Here are the artifacts it collects: entities such as David,

02:14.890 --> 02:19.850
tokens, lemmas, and keywords from the input request.
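
Presidio delegates this extraction step to an NLP engine (spaCy by default) and collects the output in an `NlpArtifacts` object. The toy sketch below only imitates that output with regular expressions: the lowercase "lemmas", the tiny stopword list, and the capitalized-word entity heuristic are crude assumptions, not real NLP.

```python
import re
from dataclasses import dataclass

STOPWORDS = {"hi", "my", "is", "and", "the", "a", "an"}  # tiny illustrative list


@dataclass
class TextFeatures:
    """Hypothetical container, loosely inspired by Presidio's NlpArtifacts."""
    tokens: list[str]
    lemmas: list[str]
    keywords: list[str]
    entities: list[str]


def extract_features(text: str) -> TextFeatures:
    # Words/numbers and individual punctuation marks become tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Crude stand-in for lemmatization: lowercase only.
    lemmas = [t.lower() for t in tokens]
    # Keywords: alphanumeric lemmas that are not stopwords.
    keywords = [lem for lem in lemmas if lem.isalnum() and lem not in STOPWORDS]
    # Heuristic entity guess: capitalized tokens that are not the first token.
    entities = [t for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]
    return TextFeatures(tokens, lemmas, keywords, entities)


features = extract_features("Hi, my name is David and my number is 212-555-0123")
print(features.entities)  # ['David']
print(features.keywords)
```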

02:21.170 --> 02:27.650
Then, once extraction is complete, it fetches all the recognizers.

02:29.490 --> 02:37.890
In this case, it has a credit card recognizer, a crypto recognizer, and other default recognizers that

00:38.690 --> 02:41.530
come as part of the library.

02:43.330 --> 02:50.490
Then it runs the recognizers on the given input text.

02:52.610 --> 02:55.450
It identifies this as a phone number.

02:57.330 --> 03:00.290
It also identifies this as a name.
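
Running the recognizers amounts to calling each one on the same text and pooling the matches. In this sketch the person recognizer is a hypothetical deny-list lookup standing in for NER-based detection, and the phone pattern, scores, and sample number are illustrative assumptions.

```python
import re

Match = tuple[str, int, int, float]  # (entity_type, start, end, score)


def phone_recognizer(text: str) -> list[Match]:
    # Simplified phone pattern (assumption) with a fixed illustrative score.
    return [("PHONE_NUMBER", m.start(), m.end(), 0.75)
            for m in re.finditer(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b", text)]


def person_recognizer(text: str, known_names: tuple[str, ...] = ("David",)) -> list[Match]:
    # Hypothetical deny-list lookup standing in for NER-based person detection.
    results = []
    for name in known_names:
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            results.append(("PERSON", m.start(), m.end(), 0.85))
    return results


def run_recognizers(text: str, recognizers) -> list[Match]:
    # Call every recognizer on the same text and pool the matches, ordered by position.
    results: list[Match] = []
    for recognize in recognizers:
        results.extend(recognize(text))
    return sorted(results, key=lambda r: r[1])


text = "Hi, my name is David and my number is 212-555-0123"
for result in run_recognizers(text, [phone_recognizer, person_recognizer]):
    print(result)
```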

03:02.240 --> 03:13.440
Once the patterns are recognized, it aggregates the results by entity type.

03:13.880 --> 03:16.520
The first one is PERSON, with its start and end offsets

03:17.520 --> 03:25.320
and its confidence score. The second is PHONE_NUMBER, with its start and end offsets and a score that indicates

03:25.320 --> 03:27.240
the confidence of the match.
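
The aggregated output is a list of records, each with an entity type, start and end offsets, and a confidence score. The sketch below drops results under a score threshold (Presidio's `analyze` accepts a `score_threshold` argument for a similar purpose) and, among overlapping spans, keeps the highest-scoring one. That overlap policy is a simplified assumption rather than Presidio's exact conflict handling, and the `US_BANK_NUMBER` match is a hypothetical competing result.

```python
def aggregate(results: list[dict], score_threshold: float = 0.3) -> list[dict]:
    """Filter low-confidence results, then resolve overlaps by keeping the best score."""
    candidates = [r for r in results if r["score"] >= score_threshold]
    kept: list[dict] = []
    # Visit the most confident results first so they win any overlap.
    for r in sorted(candidates, key=lambda r: (-r["score"], r["start"])):
        if all(r["end"] <= k["start"] or r["start"] >= k["end"] for k in kept):
            kept.append(r)
    return sorted(kept, key=lambda r: r["start"])


raw = [
    {"entity_type": "PERSON", "start": 15, "end": 20, "score": 0.85},
    {"entity_type": "PHONE_NUMBER", "start": 38, "end": 50, "score": 0.75},
    # Hypothetical weaker match from another recognizer on the same span.
    {"entity_type": "US_BANK_NUMBER", "start": 38, "end": 50, "score": 0.4},
]
for r in aggregate(raw):
    print(r)
```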

03:29.040 --> 03:42.560
Once they are determined, it anonymizes the entities using the configured anonymization techniques.

03:42.760 --> 03:45.200
Here the output shows PERSON and PHONE_NUMBER in place of the original name and phone number.
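
Replacing each detected span with an `<ENTITY_TYPE>` placeholder can be sketched as below; Presidio's default `replace` operator produces similar placeholders. Spans are substituted from the end of the string backwards so that earlier offsets stay valid. The sample phone number and the result dictionaries are hypothetical.

```python
def anonymize(text: str, results: list[dict]) -> str:
    # Replace from the last span to the first so earlier offsets are not shifted.
    for r in sorted(results, key=lambda r: r["start"], reverse=True):
        text = text[:r["start"]] + f"<{r['entity_type']}>" + text[r["end"]:]
    return text


text = "Hi, my name is David and my number is 212-555-0123"
results = [
    {"entity_type": "PERSON", "start": 15, "end": 20},
    {"entity_type": "PHONE_NUMBER", "start": 38, "end": 50},
]
print(anonymize(text, results))
# Hi, my name is <PERSON> and my number is <PHONE_NUMBER>
```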

03:46.840 --> 03:56.080
And finally, it sends the results back through APIs, streams, or data storage.

03:58.320 --> 03:58.960
Thank you.
