WEBVTT

00:00.120 --> 00:01.600
The final step is to create

00:01.600 --> 00:06.920
an agent that brings everything together, now with the fastRAG Agent and ToolsManager.

00:07.200 --> 00:10.160
We're also adding conversational memory if there is any.

00:10.640 --> 00:12.960
So in this case, here we have the generator.

00:13.000 --> 00:16.080
We first have this agent as the base class.

00:16.440 --> 00:17.800
It needs a generator.

00:18.200 --> 00:20.880
That is the generator we have defined up here.

00:21.240 --> 00:24.760
The Phi-3.5 Vision generator.

00:25.120 --> 00:29.360
Then we have the prompt template, which is the one defined here.

00:30.280 --> 00:37.880
And then the tool that we have specified is the nutrition tool, all the way up here, and the agent uses this description

00:37.880 --> 00:39.440
to invoke this tool.

00:39.560 --> 00:42.000
And then the conversational memory.

00:42.560 --> 00:49.080
And then we invoke this multimodal agent and ask a question: what is the fat content of the protein

00:49.080 --> 00:49.520
bar?

00:50.480 --> 00:53.280
And then we get the response back and we print this.
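
NOTE
Editor's sketch of how the four pieces named above (generator, prompt template, tools, memory) could fit together.
This is plain Python, not the fastRAG API: Tool, SketchAgent, build_prompt, and the template text are hypothetical names chosen for illustration.
It shows only why the tool description matters: the description is placed in the prompt so the model can decide when to call the tool.
```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
@dataclass
class Tool:
    name: str
    description: str  # the text the model reads when choosing a tool
    run: Callable[[str], str]
@dataclass
class SketchAgent:
    generate: Callable[[str], str]  # stand-in for the vision generator
    prompt_template: str
    tools: Dict[str, Tool]
    memory: List[str] = field(default_factory=list)  # earlier turns, if any
    def build_prompt(self, query: str) -> str:
        # List every tool with its description, then add history and the query.
        tool_lines = "\n".join(f"{t.name}: {t.description}" for t in self.tools.values())
        history = "\n".join(self.memory)
        return self.prompt_template.format(tools=tool_lines, history=history, query=query)
# Hypothetical wiring with a stub generator and a stub tool.
tool = Tool("nutrition_tool", "Look up nutrition facts for a food item.", lambda item: "no data")
agent = SketchAgent(generate=lambda p: "stub reply", prompt_template="Tools:\n{tools}\nHistory:\n{history}\nQuestion: {query}", tools={tool.name: tool})
prompt = agent.build_prompt("What is the fat content of the protein bar?")
print(prompt)
```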

00:53.480 --> 00:55.240
So let's go ahead and run this.

00:55.680 --> 00:57.480
So we got the response back.

00:57.640 --> 00:59.360
Let's analyze this.

00:59.680 --> 01:04.120
So the thought was I need to find out the fat content of the protein bar.

01:04.360 --> 01:06.560
That was the question being asked.

01:06.960 --> 01:10.120
So now it invoked the tool, which is the nutrition tool here.

01:10.360 --> 01:16.100
It determined that this is the right tool to find the data, and then it passed the input to

01:16.140 --> 01:18.060
the tool as "protein bar".

01:18.340 --> 01:24.140
It determined from this entire content that I need to pass protein bar as an input.

01:24.340 --> 01:27.260
And then internally this is the prompt.

01:27.620 --> 01:34.380
This is what the retriever fetched from the in-memory vector store, with the image that we went through before.

01:35.420 --> 01:42.740
Now, once it processed all of this, there was another thought: I have found the fat content of the protein

01:42.740 --> 01:43.140
bar.

01:43.460 --> 01:47.260
So what happened here is it found the necessary information.

01:47.260 --> 01:49.060
And I got the final answer here:

01:49.340 --> 01:52.940
the protein bar has eight grams of fat per serving.

01:53.140 --> 01:58.420
And then this is how the entire execution of the agent went through.
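
NOTE
Editor's sketch of the thought, tool call, observation, final answer cycle walked through above.
It is a minimal ReAct-style loop in plain Python under stated assumptions: the "Tool:" / "Tool Input:" / "Final Answer:" markers, react_loop, and scripted_llm are hypothetical, and the model is replaced by canned replies so the sketch runs offline.
The 8g fat value is the one read from the label in the video; the lookup table stands in for retrieval over the label images.
```python
import re
from typing import Callable, Dict
def nutrition_tool(item: str) -> str:
    # Hypothetical lookup standing in for retrieval over the label images.
    facts = {"protein bar": "Total fat 8g per serving"}
    return facts.get(item.lower(), "no data found")
def react_loop(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]], question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip()
        call = re.search(r"Tool:\s*(.+)\nTool Input:\s*(.+)", step)
        if call is None:
            break  # the model produced neither a tool call nor an answer
        name, tool_input = call.group(1).strip(), call.group(2).strip()
        observation = tools[name](tool_input) if name in tools else f"unknown tool {name}"
        # Feed the tool result back so the next thought can use it.
        transcript += f"Observation: {observation}\n"
    return "no answer"
def scripted_llm(transcript: str) -> str:
    # Canned replies so the sketch runs without a model.
    if "Observation:" not in transcript:
        return "Thought: I need to find out the fat content of the protein bar.\nTool: nutrition_tool\nTool Input: protein bar"
    return "Thought: I have found the fat content of the protein bar.\nFinal Answer: The protein bar has 8 grams of fat per serving."
answer = react_loop(scripted_llm, {"nutrition_tool": nutrition_tool}, "What is the fat content of the protein bar?")
print(answer)
```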

01:59.340 --> 02:05.580
So going back to the media Amazon.com image, I noticed that total fat is eight grams, which matches

02:05.580 --> 02:11.340
our response from the model that the protein bar has eight grams of fat per serving.

02:11.660 --> 02:16.940
So now let's go ahead and try a more complex query that requires some multi-hop reasoning.

02:17.300 --> 02:20.020
So here I'm asking a little more complex question.

02:20.020 --> 02:22.830
That is which one has more protein?

02:23.390 --> 02:25.470
Protein bar or yogurt?

02:26.350 --> 02:32.310
So technically, what the model has to do is first get the protein information from the protein

02:32.310 --> 02:35.510
bar image, then get the information about the yogurt.

02:35.510 --> 02:39.510
And then compare the two and give the result, right?

02:39.910 --> 02:43.630
So let's see how well it performs in a multi-hop execution.

02:44.070 --> 02:51.110
The final answer is that the yogurt has 18g of protein per serving, which is more than the protein bar's

02:51.110 --> 02:52.910
14g per serving.

02:53.870 --> 02:57.750
So let's first go here and see how much protein the bar has.

02:57.950 --> 02:59.470
It's 14g.

02:59.630 --> 03:02.030
And then I'll open the image for yogurt.

03:02.430 --> 03:04.470
So this is another image.

03:04.630 --> 03:05.550
Let me zoom in.

03:05.550 --> 03:11.670
And here the protein is 18 grams, whereas the protein bar has 14 grams.

03:12.190 --> 03:15.550
So now let's go ahead and understand and analyze the output.

03:16.030 --> 03:21.710
The agent's thought here was: I need to compare the protein content of the protein bar and the yogurt.

03:22.150 --> 03:27.510
So the first step it took was to pick the nutrition tool and pass it the protein

03:27.510 --> 03:33.530
bar, and the response was that the protein bar has 14g of protein per serving.

03:34.090 --> 03:37.450
Then it invoked the same tool for the protein content of the yogurt.

03:38.330 --> 03:42.850
Then for that, it figured out that it has 18g per serving.

03:43.130 --> 03:51.810
Now, the second thought is: I need to compare the protein content of the protein bar and the yogurt. It compared

03:51.810 --> 03:59.370
the two and concluded that the yogurt has 18g of protein per serving, which is more than the protein bar's 14g per

03:59.370 --> 03:59.970
serving.

04:00.170 --> 04:04.090
So it did a lot of hops here: it first went to the protein bar image,

04:04.090 --> 04:11.170
extracted the data, then went to the yogurt image, extracted the data, and

04:11.170 --> 04:15.010
then compared the two and came up with the final answer.
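
NOTE
Editor's sketch of the multi-hop comparison described above: one lookup per item, then a comparison of the two numbers.
The 14g and 18g values are the ones read from the two labels in the video. Everything else (LABELS, nutrition_tool, protein_grams, compare_protein) is a hypothetical stand-in, not the agent's real tool or output format.
```python
import re
from typing import Dict
# Values read off the two labels in the video; the lookup itself is hypothetical.
LABELS: Dict[str, str] = {"protein bar": "Protein 14g per serving", "yogurt": "Protein 18g per serving"}
def nutrition_tool(item: str) -> str:
    return LABELS[item]
def protein_grams(observation: str) -> int:
    # Pull the number of grams out of a tool observation.
    match = re.search(r"Protein\s+(\d+)\s*g", observation)
    if match is None:
        raise ValueError(f"no protein value in: {observation!r}")
    return int(match.group(1))
def compare_protein(first: str, second: str) -> str:
    # Hops 1 and 2: one tool call per item. Hop 3: compare the numbers.
    grams = {item: protein_grams(nutrition_tool(item)) for item in (first, second)}
    if grams[first] == grams[second]:
        return f"{first} and {second} both have {grams[first]}g of protein per serving"
    winner, loser = sorted(grams, key=grams.get, reverse=True)
    return f"{winner} has {grams[winner]}g of protein per serving, more than {loser}'s {grams[loser]}g"
result = compare_protein("protein bar", "yogurt")
print(result)
```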

04:15.930 --> 04:21.810
In this particular video, we went through the entire execution of a multimodal agentic application with

04:21.810 --> 04:23.850
multi-hop and ReAct reasoning.

04:24.290 --> 04:29.570
In our next video, we will go over executing multiple tools and have a follow up on this particular

04:29.570 --> 04:30.370
example.

04:30.570 --> 04:32.610
And then we'll have one more tool.

04:32.970 --> 04:38.290
Then the agent will be able to effectively invoke the correct tool that is needed for the task to be

04:38.290 --> 04:38.770
done.

04:39.690 --> 04:40.410
Thank you.

04:40.570 --> 04:42.050
I'll see you in the next one.
