WEBVTT

00:00.800 --> 00:01.960
Hello everyone!

00:02.480 --> 00:07.800
In today's video we will learn about techniques to detect hallucination.

00:08.920 --> 00:11.480
So there are metrics behind these techniques.

00:11.840 --> 00:16.360
One of the most prominent ones is faithfulness, also referred to as hallucination

00:16.360 --> 00:18.200
in much of the documentation.

00:18.640 --> 00:24.240
This measures the factual consistency of the generated answer against the given context.

00:24.560 --> 00:28.200
It is calculated from the answer and retrieved context.

00:28.520 --> 00:32.160
The score is scaled to a 0 to 1 range.
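
NOTE
A minimal sketch of how a faithfulness score in the 0 to 1 range could be
computed, assuming the common claim-based definition (claims supported by the
context divided by total claims in the answer). The function name and the
verdict list are hypothetical; in practice a judge model would produce the
per-claim verdicts.
```python
def faithfulness_score(claim_supported):
    """Fraction of answer claims that the retrieved context supports."""
    # claim_supported: one bool per claim extracted from the answer (assumed input)
    if not claim_supported:
        raise ValueError("need at least one claim")
    return sum(claim_supported) / len(claim_supported)
# Hypothetical verdicts: 3 of 4 claims are backed by the context.
score = faithfulness_score([True, True, False, True])
print(score)
```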

00:33.120 --> 00:37.280
Now let's understand how this works in an actual RAG pipeline.

00:38.200 --> 00:45.840
So in my previous video, where I explained RAG, there are three inputs: context, user input, and

00:45.840 --> 00:48.480
the prompt, which are sent over to the LLM.

00:48.880 --> 00:53.960
And the LLM uses these three inputs to generate the answer.

00:54.800 --> 01:01.960
Once we get this answer, we take the context from the initial pipeline and the answer, and we invoke

01:02.000 --> 01:08.270
another model, a hallucination model, to calculate the faithfulness of the answer.

01:09.270 --> 01:13.390
This model is then responsible for outputting the score.

01:14.790 --> 01:17.670
This is essentially the faithfulness metric.

01:17.950 --> 01:21.550
We will learn this with hands-on experience in our next video.

01:22.710 --> 01:23.670
Next is answer

01:23.670 --> 01:24.550
relevancy.

01:24.910 --> 01:32.870
The evaluation metric answer relevancy focuses on assessing how pertinent the generated answer is to

01:32.910 --> 01:34.070
the given prompt.

01:34.550 --> 01:41.630
Lower scores are assigned to answers that are incomplete or contain redundant information, and higher

01:41.630 --> 01:44.070
scores indicate better relevancy.
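
NOTE
A rough sketch of one way answer relevancy can be scored, assuming the
question-similarity formulation: questions are generated back from the answer,
embedded, and compared with the original prompt by cosine similarity. The
vectors below are toy values standing in for real embeddings, and all names
are hypothetical.
```python
import math
def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm
def answer_relevancy(prompt_vec, generated_question_vecs):
    """Mean similarity between the prompt and questions derived from the answer."""
    sims = [cosine(prompt_vec, g) for g in generated_question_vecs]
    return sum(sims) / len(sims)
# Toy embeddings (assumed): one on-topic and one off-topic generated question.
relevancy = answer_relevancy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(relevancy)
```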

01:44.950 --> 01:47.750
This is a typical RAG pipeline.

01:47.990 --> 01:51.550
Again, context, user input and prompt.

01:51.870 --> 01:54.470
They're fed to an LLM,

01:55.230 --> 01:56.510
which generates an answer.

01:57.230 --> 01:57.790
Great.

01:58.070 --> 02:03.510
The answer and the prompt are then fed into a hallucination model here for answer relevancy.

02:03.510 --> 02:05.140
And that outputs the score.

02:06.100 --> 02:06.420
Wow.

02:06.900 --> 02:07.380
Okay.

02:07.420 --> 02:12.340
So now let's understand which model is used to detect hallucination.

02:13.260 --> 02:18.140
There are different fine-tuned models designed for detecting hallucination in LLMs.

02:18.500 --> 02:24.940
They're particularly useful in the context of building retrieval augmented generation applications,

02:24.940 --> 02:33.340
where a set of facts is summarized by an LLM, and the model can be used to measure the extent

02:33.340 --> 02:37.340
to which the summary is factually consistent with the facts.

02:38.340 --> 02:44.100
There are a couple of models that are available as open-source models on Hugging Face.

02:44.900 --> 02:53.540
One is the Phi-3 hallucination judge, and another is the hallucination evaluation model from Vectara.

02:54.420 --> 03:00.500
We will take a deep dive into each one of them in our next video, and understand how we can detect

03:00.540 --> 03:01.700
hallucination.

03:02.100 --> 03:04.860
Thank you and I'll see you in the next video.
