WEBVTT

00:00.760 --> 00:04.040
Let's walk through the data flow for evaluation.

00:04.800 --> 00:09.720
We covered this in the previous section on knowledge bases, where we start with a local data source.

00:10.040 --> 00:11.480
Here, we use a PDF.

00:12.000 --> 00:15.280
We load the PDF into our AWS account, that is, into an S3 bucket.

00:15.880 --> 00:19.800
Then we split the PDF, or whatever the data source is, into chunks.

00:20.280 --> 00:26.720
The embedding model then embeds those chunks, and the embeddings are stored in the vector database.

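NOTE
A minimal local sketch of the indexing steps described above (chunk, embed, store), assuming plain text has already been extracted from the PDF. The S3 upload is not shown, the hashed bag-of-words `embed` is a toy stand-in for a real embedding model, and `chunk_text`, `embed`, and `index_document` are hypothetical names, not the knowledge base service's API.
```python
import hashlib
import math
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Fixed-size character windows; consecutive chunks share `overlap` characters.
    if not text:
        return []
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized.
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
def index_document(text: str) -> list[tuple[str, list[float]]]:
    # The "vector database" is modeled as an in-memory list of (chunk, embedding) pairs.
    return [(chunk, embed(chunk)) for chunk in chunk_text(text)]
```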
00:28.120 --> 00:34.800
When we perform a search, the search query is embedded using the same embedding model, and the result is used to look for the

00:34.800 --> 00:41.800
relevant content in the vector database, which returns the top 5 or 10 chunks that match the

00:41.800 --> 00:42.760
search query.

00:43.800 --> 00:45.840
The exact number depends on the retrieval configuration.

00:46.960 --> 00:48.760
So that's the retrieval part.

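NOTE
A sketch of the retrieval step, assuming the chunks were already embedded and stored as (chunk, vector) pairs, and that the query vector comes from the same embedding model. Cosine similarity with an exhaustive scan is one common choice; real vector databases typically use approximate nearest-neighbour indexes instead. `cosine` and `retrieve` are hypothetical names, and `k` plays the role of "top 5 or 10".
```python
import heapq
import math
def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
def retrieve(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 5) -> list[tuple[str, float]]:
    # Score every stored chunk against the query embedding and keep the k highest scores.
    scored = ((chunk, cosine(query_vec, vec)) for chunk, vec in store)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```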
00:49.560 --> 00:55.960
Let's move on to the generation part, where we provide the prompt and the retrieved search content

00:55.960 --> 00:57.760
to a large language model.

00:58.440 --> 01:03.240
Using the retrieved content and the prompt, the model generates the response.

01:03.240 --> 01:04.800
This is the RAG pipeline.

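NOTE
A sketch of the generation step: the retrieved chunks are placed into the prompt and sent to the first model (LLM 1). The prompt wording and the `build_rag_prompt` and `generate` names are illustrative assumptions, and `llm` is any function that takes a prompt string and returns text, standing in for a real model call.
```python
from typing import Callable
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    # Number the retrieved chunks so the model (and a reader) can refer to them.
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return ("Answer the question using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\nAnswer:")
def generate(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    # llm is a hypothetical stand-in for the call to LLM 1.
    return llm(build_rag_prompt(question, chunks))
```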
01:07.400 --> 01:10.580
For evaluation, we need more than just the responses.

01:10.580 --> 01:12.860
What we need is an evaluation dataset.

01:13.900 --> 01:17.020
It also contains the ground truth, which we have to provide.

01:17.020 --> 01:21.580
Evaluation also needs the context that was retrieved during the search.

01:24.540 --> 01:26.060
We also need the prompt.

01:26.060 --> 01:29.420
So the evaluation dataset also contains the prompt.

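NOTE
One possible shape for a single evaluation dataset record, holding the four items mentioned above: prompt, ground truth, retrieved context, and response. The field names, the `EvalRecord` and `to_jsonl` names, and the JSON Lines layout are assumptions for illustration, not a specific service's required schema.
```python
import json
from dataclasses import asdict, dataclass
@dataclass
class EvalRecord:
    prompt: str          # the question sent to the RAG pipeline
    ground_truth: str    # the reference answer we provide
    context: list[str]   # chunks retrieved for this prompt
    response: str        # the answer generated by LLM 1
def to_jsonl(records: list[EvalRecord]) -> str:
    # One JSON object per line, a common layout for evaluation datasets.
    return "\n".join(json.dumps(asdict(r)) for r in records)
```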
01:30.700 --> 01:34.980
All of this content is then passed into a prompt template.

01:37.580 --> 01:43.980
The filled-in prompt template is then passed to a second large language model, LLM 2.

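NOTE
A sketch of filling a prompt template for the evaluator model using Python's `string.Template`. The template wording, the 1 to 5 scale, and the `fill_judge_prompt` name are hypothetical; managed evaluation tools provide their own judge prompts and metrics.
```python
from string import Template
# Hypothetical judge template with one placeholder per evaluation input.
JUDGE_TEMPLATE = Template(
    "You are evaluating a RAG system.\n"
    "Question: $prompt\n"
    "Retrieved context: $context\n"
    "Reference answer: $ground_truth\n"
    "Candidate response: $response\n"
    "Rate the candidate's correctness from 1 to 5 and explain briefly."
)
def fill_judge_prompt(prompt: str, context: list[str], ground_truth: str, response: str) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled.
    return JUDGE_TEMPLATE.substitute(
        prompt=prompt, context=" | ".join(context), ground_truth=ground_truth, response=response
    )
```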
01:44.700 --> 01:52.300
This model may or may not be the same as LLM 1, but the content that goes into the two large language

01:52.300 --> 01:54.620
models is definitely different.

01:55.500 --> 02:02.860
So in step two, we pass the context, the ground truth data, the prompt, and the response from the current

02:02.860 --> 02:04.620
state into another LLM.

02:05.340 --> 02:11.780
That large language model then generates the evaluation, which is what we will cover with

02:11.780 --> 02:14.420
a hands-on activity in the next section.

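NOTE
A sketch of step two's final call: the filled template goes to a second model (LLM 2), and a score is parsed from its reply. `llm2` and `judge` are hypothetical names, with `llm2` standing in for the evaluator model, and the "Score: N" reply format is an assumption the prompt asks the model to follow, so an unparseable reply yields a score of `None`.
```python
import re
from typing import Callable
def judge(filled_prompt: str, llm2: Callable[[str], str]) -> dict:
    # Ask the evaluator model for a fixed reply format so the score can be parsed.
    raw = llm2(filled_prompt + "\nReply in the form 'Score: <1-5>' followed by a reason.")
    match = re.search(r"Score:\s*([1-5])", raw)
    # Keep the raw text so unparseable verdicts can be inspected rather than silently dropped.
    return {"score": int(match.group(1)) if match else None, "raw": raw}
```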
02:15.180 --> 02:15.660
All right.

02:15.660 --> 02:16.660
Thank you so much.

02:16.660 --> 02:18.420
I'll see you in the next video.
