WEBVTT

00:00.800 --> 00:02.040
Hello everyone!

00:02.040 --> 00:07.960
In today's video we will cover the hallucination model that is Phi three Hallucination judge from grounded

00:07.960 --> 00:08.360
AI.

00:08.800 --> 00:11.600
It's an open source model hosted on the hugging face.

00:11.960 --> 00:14.880
The model would help us with the detection of hallucination.

00:15.160 --> 00:18.600
We'll do a hands on example and understand how this works.

00:19.360 --> 00:22.320
For that, I have opened a colab notebook.

00:22.560 --> 00:28.520
And if you notice here I have a pro account that helps me get a little more GPUs in memory.

00:28.800 --> 00:37.080
I went ahead in the runtime and I have 100 GPU as my desired compute that I want to run on, and I'm

00:37.080 --> 00:39.760
running this on Python three run type.

00:39.960 --> 00:41.600
So that's the setting that I have.

00:41.800 --> 00:45.480
And I went ahead and did pip install torch that.

00:46.520 --> 00:50.560
So it installed the torch on the notebook which takes a little longer time.

00:50.560 --> 00:51.760
So I did that already.

00:52.240 --> 00:58.640
The next thing that I'll have to do is I'll log in to hugging Face and provide the tokens.

00:59.040 --> 01:06.240
So in this case here I will add the hugging face notebook and then let's run this.

01:06.280 --> 01:08.640
In this case here, it asks me for the token.

01:08.800 --> 01:13.000
I'll have to copy the token from the hugging face account.

01:13.000 --> 01:14.200
So give me just a second.

01:15.480 --> 01:20.200
So I have copied my account and token and make sure that it's not a GitHub credential.

01:20.400 --> 01:23.160
You add that tokens invalid?

01:23.360 --> 01:25.640
Maybe on this one more time.

01:26.360 --> 01:28.240
Maybe I copied the wrong one.

01:28.400 --> 01:31.000
Let me go back and do this one more time.

01:32.440 --> 01:35.760
So now I copied my tokens and it logged in successfully.

01:36.080 --> 01:41.440
We need the hugging face token so that we can install or download the model onto the notebook.

01:41.760 --> 01:42.280
Great.

01:42.480 --> 01:43.880
So now this is done.

01:44.120 --> 01:47.320
Now the important part here is let's install theft.

01:47.400 --> 01:51.400
Now what is theft on a high level while this installs.

01:51.640 --> 01:58.280
Interesting, looks like it got installed quickly, but tier is a parameter efficient fine tuning of

01:58.280 --> 02:01.560
billion scale models on low resource hardware.

02:01.960 --> 02:06.120
It optimizes model parameters to work with low resource hardwares.

02:06.440 --> 02:07.920
So that's why we need Pemf.

02:08.240 --> 02:09.640
So I installed Pemf.

02:10.360 --> 02:12.960
And now let's go ahead and write some code here.

02:13.320 --> 02:16.560
So from Pemf model and pemf config.

02:17.000 --> 02:21.240
And then from transformers import for casual lm.

02:21.600 --> 02:24.440
So these are the two libraries that that we need.

02:24.600 --> 02:26.560
Pemf and Transformers.

02:26.680 --> 02:33.080
And then what I'm going to do is I'm going to download the model for that from Pemf from pre-trained

02:33.080 --> 02:33.720
model.

02:33.880 --> 02:39.080
I need this grounded I then from the auto model for casual M.

02:39.360 --> 02:46.320
We are doing this as a base model phi three mini 4k instruct and then path model for pre-trained.

02:46.480 --> 02:50.920
So okay it looks like I haven't defined them properly, so let me copy them.

02:51.240 --> 02:51.680
Okay.

02:52.040 --> 02:53.240
Auto model for.

02:53.800 --> 02:56.280
All right let me execute this.

02:56.320 --> 02:58.400
It looks like there's a problem.

02:58.520 --> 02:59.680
Wrong path.

02:59.960 --> 03:01.360
Import path.

03:01.770 --> 03:02.530
Okay.

03:03.570 --> 03:04.970
All right, all good.

03:04.970 --> 03:05.970
Let's run this.

03:06.090 --> 03:06.570
Okay.

03:07.050 --> 03:07.730
Perfect.

03:08.170 --> 03:10.930
So if you notice, it is going to download the model.

03:10.930 --> 03:13.810
It's a big model relatively big.

03:13.970 --> 03:15.370
So it might take a little while.

03:15.530 --> 03:17.330
I'll pause the video and I'll come back.

03:17.930 --> 03:18.570
All right.

03:19.050 --> 03:21.170
So it downloaded the model right.

03:21.610 --> 03:31.810
So in this case here see it was a slightly bigger model five six, seven and seven and close to 7.5GB,

03:31.810 --> 03:34.770
which is which is relatively big model to host.

03:35.130 --> 03:38.450
Let's write some code here that we can invoke on the model.

03:38.810 --> 03:41.810
So for that we have to first use a tokenizer.

03:42.170 --> 03:43.650
So what's a tokenizer?

03:43.850 --> 03:51.570
Tokenizers will convert your input to input all our natural language query to a language that the model

03:51.570 --> 03:52.730
understands.

03:53.170 --> 03:56.090
So for that we have to use Tokenizers create.

03:56.530 --> 03:58.170
So we have this already.

03:58.170 --> 04:06.090
I'm going to use a tokenizer from the model we just download which is 53 minute 4k instruct and then

04:06.090 --> 04:10.930
from transformers import tokenizer phi three mini 4k instruct.

04:10.930 --> 04:13.490
So we got the tokenizer which is ready.

04:13.810 --> 04:15.250
So in this case here.

04:15.450 --> 04:20.090
This is the text that I got from the hugging face examples that they have.

04:20.530 --> 04:23.770
So let me go here and share with you in this case here.

04:24.010 --> 04:26.210
And you see scroll downwards.

04:26.370 --> 04:28.090
You'll find this example here.

04:28.210 --> 04:30.610
It's not properly done in many ways.

04:30.810 --> 04:35.450
This example had to be tweaked in terms of how it's written.

04:35.770 --> 04:38.050
I have the code here that works well.

04:38.050 --> 04:41.170
So if you notice here this is this is the prompt.

04:41.530 --> 04:46.690
Your job is to evaluate whether a machine learning model has hallucination or not.

04:46.930 --> 04:48.490
And this is the prompt.

04:48.690 --> 04:50.690
This is the knowledge base.

04:50.970 --> 04:53.770
This is the context that we provide to the model.

04:54.170 --> 04:56.170
Walrus are the largest animal.

04:56.490 --> 05:00.410
The user input is a question what is the biggest mammal?

05:00.690 --> 05:02.850
And the model response is walrus.

05:03.250 --> 05:08.690
So if you notice here, there are four distinct elements that are getting injected to the model.

05:08.890 --> 05:10.210
One is a prompt.

05:10.490 --> 05:12.890
Overall, this is a prompt.

05:13.130 --> 05:20.290
There's an info that which has a model which has a knowledge base, user input and model response.

05:20.650 --> 05:26.810
And we are instructing the model that provide the response or provide model output of hallucination.

05:26.970 --> 05:31.170
Respond with only yes or no and return the prompt right.

05:31.250 --> 05:32.850
So this is the entire prompt.

05:33.050 --> 05:40.090
But technically this is divided into four distinct aspects of the input that goes to the model.

05:40.410 --> 05:41.690
So now we have that.

05:42.530 --> 05:45.330
What I'm going to do is okay we'll get the text.

05:45.450 --> 05:47.370
So we got the text right.

05:47.810 --> 05:51.330
The next thing that we want to do is pass create a message.

05:52.290 --> 05:54.210
So how does the message get created.

05:54.490 --> 05:57.170
So the message that gets created is like this.

05:57.410 --> 05:59.090
This is the message.

05:59.290 --> 06:01.050
The role is a user.

06:01.050 --> 06:04.650
And this is the context we got from the from the the prompt here.

06:04.690 --> 06:05.290
Great.

06:05.290 --> 06:06.610
Let's execute this.

06:07.090 --> 06:08.290
Oh, let's run this.

06:08.650 --> 06:09.370
Great.

06:09.370 --> 06:10.570
Everything worked well.

06:10.810 --> 06:12.970
The next phase is that we run the pipeline.

06:13.610 --> 06:17.210
So in that case here, we will have to generate the pipeline.

06:17.810 --> 06:20.850
So what I'm doing here is a pipeline of text generation.

06:20.850 --> 06:23.090
We have base model and a tokenizer.

06:23.250 --> 06:25.690
Let me bring this up from here and let's put it up here.

06:25.890 --> 06:26.410
Okay.

06:26.690 --> 06:28.850
So now if you notice we have a base model.

06:28.850 --> 06:31.170
We have a pipeline and tokenizer.

06:32.250 --> 06:33.930
So let me change this a little bit.

06:34.650 --> 06:35.770
We need a tokenizer.

06:35.770 --> 06:36.570
That's one.

06:36.930 --> 06:39.410
Then I have to import pipeline.

06:40.530 --> 06:42.850
So now we have the pipeline imported.

06:42.850 --> 06:48.210
We have the base model that we imported all the way up here which is the base model here.

06:48.570 --> 06:49.130
Right.

06:49.450 --> 06:52.050
And then we have the tokenizer.

06:53.010 --> 06:54.450
Everything is ready.

06:54.730 --> 06:57.010
Now we have to give a list of arguments.

06:57.170 --> 07:04.420
The generation arguments that would be required like how many tokens temperature and sample Temperature

07:04.420 --> 07:09.700
is basically how much variation do you want in your model output response?

07:09.900 --> 07:16.580
You can change the temperature and set that up to however much you want for for the model that works

07:16.580 --> 07:18.220
best for your inputs.

07:18.660 --> 07:20.580
And then the next last steps is to.

07:20.620 --> 07:23.580
The last step is to invoke the the pipeline.

07:23.820 --> 07:29.100
So in this case here we work in the pipeline here with the model here.

07:29.380 --> 07:33.340
Pass the messages and generation arguments that we just passed.

07:33.580 --> 07:35.180
And let's check the output.

07:35.460 --> 07:40.340
So the messages here is the list of messages that we have created.

07:40.460 --> 07:42.700
And then the generation arguments.

07:42.820 --> 07:44.020
Let's run this.

07:44.860 --> 07:47.180
Let's see how the response turns out to be.

07:48.100 --> 07:50.900
So it looks like we have some problem.

07:51.140 --> 07:56.340
GPU is available in the environment but more than the environment is passed.

07:56.740 --> 07:58.540
Let me try this one more time.

07:58.540 --> 08:05.260
So after playing around with different problems and solutions, I figured that I had to restart the

08:05.860 --> 08:08.020
the session and done everything again.

08:08.220 --> 08:09.300
This is what I did.

08:09.620 --> 08:14.660
Once I did, that colab was falling in place and I actually was able to run this.

08:15.140 --> 08:15.900
Perfect.

08:16.300 --> 08:21.460
And I do get the text here which says no, the hallucination did not happen.

08:21.900 --> 08:23.420
So that's what it is all about.

08:23.860 --> 08:25.460
Let's change this a little bit.

08:25.620 --> 08:27.660
Let's see what the user inputs.

08:27.820 --> 08:29.940
Smallest mammal right.

08:29.980 --> 08:31.780
In this case let's see.

08:32.180 --> 08:36.860
It did hallucinate here because clearly the largest mammal is walrus.

08:37.220 --> 08:39.540
The user input is about mammals.

08:39.540 --> 08:40.820
Smallest mammals.

08:41.140 --> 08:41.780
Sorry.

08:41.820 --> 08:43.500
And response was walrus.

08:43.900 --> 08:48.420
So definitely that deviated from the actual context that was provided.

08:49.020 --> 08:53.300
We were able to detect hallucination using a model based approach.

08:53.460 --> 09:02.340
And we tested this ran through the entire exercise of installing, doing hands on activity and understanding

09:02.340 --> 09:03.580
this one model.

09:04.380 --> 09:05.140
Thank you.