WEBVTT

00:00.080 --> 00:02.520
So now let's run through a user input.

00:02.520 --> 00:04.840
It is not part of the disallowed topics.

00:04.880 --> 00:08.200
As the user input, I'm asking how to play ping pong.

00:08.440 --> 00:10.840
Let's execute this and see what happens.

00:11.160 --> 00:18.200
As you can see here, I have captured all the info logs, which give all the details about the execution of

00:18.200 --> 00:19.520
the NeMo Guardrails runtime.

00:19.920 --> 00:22.520
Here you can see I got a response back.

00:23.360 --> 00:26.560
You can play ping pong by hitting the ball with the paddle.

00:27.000 --> 00:33.000
You should always try to hit the ball in a way that makes it difficult for the other players

00:33.040 --> 00:34.200
to return it.

00:34.560 --> 00:37.680
Here are all the details of what happens in the runtime.

00:37.840 --> 00:44.560
I have captured those details and removed the parts of the execution that were not important.

00:44.880 --> 00:51.800
Let's go over the info logs and understand how NeMo Guardrails executes the workflow on the back end.

00:52.680 --> 00:57.840
From the runtime architecture diagram, we saw that it executes three main stages.

00:58.040 --> 01:05.960
These are: generate the canonical user message, decide the next steps and execute them, and generate the bot utterances.

01:06.400 --> 01:10.210
Each of these stages can involve one or more calls to the LLM.

01:10.730 --> 01:11.810
How does this happen?
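
NOTE
A minimal sketch of the three stages as three successive LLM calls, assuming a hypothetical llm(prompt) callable that returns a completion string. run_turn and fake_llm are illustrative names, not the NeMo Guardrails API, and fake_llm uses keyword rules only so the sketch runs offline.
```python
from typing import Callable, List
# Hypothetical LLM interface: takes a prompt string, returns a completion string.
LLM = Callable[[str], str]
def run_turn(user_message: str, history: List[str], llm: LLM) -> str:
    # Stage 1: generate the canonical user message (the user intent).
    intent = llm("INTENT\n" + "\n".join(history) + f'\nuser "{user_message}"')
    history = history + [f'user "{user_message}"', f"  {intent}"]
    # Stage 2: decide the next step from the conversation so far.
    next_step = llm("NEXT_STEP\n" + "\n".join(history))
    history = history + [next_step]
    # Stage 3: generate the bot utterance for that step.
    return llm("BOT_MESSAGE\n" + "\n".join(history))
def fake_llm(prompt: str) -> str:
    # Toy stand-in for a real model: the first prompt line names the task.
    task = prompt.splitlines()[0]
    if task == "INTENT":
        return "ask about sport" if "ping pong" in prompt.lower() else "ask about capabilities"
    if task == "NEXT_STEP":
        return "bot respond for sport" if "ask about sport" in prompt else "bot respond about capabilities"
    # Stage 3 toy answer, reusing the response seen in the logs.
    return "Ping pong is a sport played by two or four people."
print(run_turn("How to play ping pong?", [], fake_llm))
```
Each stage sees the previous stage's output appended to the conversation, which mirrors the order of the stages in the logs.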

01:12.130 --> 01:17.890
So let's analyze our case, starting with the stage that says generate canonical user message.

01:18.650 --> 01:21.530
That's the first step that gets executed here.

01:21.530 --> 01:22.930
Let's zoom in a little bit.

01:23.290 --> 01:25.530
UtteranceUserActionFinished:

01:25.730 --> 01:27.530
How to play ping pong.

01:28.010 --> 01:35.370
That's the question that was asked, and it is the input to the very first phase in our case here.

01:35.530 --> 01:43.530
This is the sample conversation that we provided in config.yml. The runtime picks up

01:43.530 --> 01:48.490
this entire block and uses it as a conversation between the user and the bot.

01:49.130 --> 01:52.050
Next, there's a section showing how the user talks.

01:52.490 --> 01:58.650
This is nothing other than the Colang specification that we have given for the user canonical

01:58.650 --> 02:02.450
forms and the user utterances, for example, user:

02:02.490 --> 02:04.330
How much do I have to boil pasta?

02:04.450 --> 02:06.810
And that maps to ask about cooking.

02:07.130 --> 02:08.370
How do I rob a bank?

02:08.850 --> 02:13.250
That is criminal activity: user ask about criminal activity.

02:13.490 --> 02:16.730
All of these are the Colang definitions that we have specified.
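
NOTE
A minimal sketch of how the example utterances read out above could be written as Colang 1.0 "define user" blocks, assuming Colang 1.0 syntax. The USER_FORMS dictionary and the to_colang helper are hypothetical; only the canonical forms and utterances come from the logs.
```python
from typing import Dict, List
# Canonical forms mapped to example utterances (taken from the logged prompt).
USER_FORMS: Dict[str, List[str]] = {
    "ask about cooking": ["How much do I have to boil pasta?"],
    "ask about criminal activity": ["How do I rob a bank?"],
}
def to_colang(forms: Dict[str, List[str]]) -> str:
    """Render each canonical form as a Colang 1.0 'define user' block."""
    blocks = []
    for form, utterances in forms.items():
        lines = [f"define user {form}"] + [f'  "{u}"' for u in utterances]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
print(to_colang(USER_FORMS))
```
Keeping several utterances per canonical form gives the LLM more examples of how each intent is phrased.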

02:17.650 --> 02:21.250
Now this is the conversation between the user and the bot.

02:21.610 --> 02:24.290
It combines this with the sample conversation we had above.

02:24.570 --> 02:30.970
It says user express greeting, bot express greeting, which is here: user express greeting.

02:31.130 --> 02:32.610
Bot express greeting.

02:32.650 --> 02:35.250
Then user ask about capabilities.

02:36.090 --> 02:37.930
Then it appends the user input.

02:37.930 --> 02:39.130
How to play ping pong.

02:39.450 --> 02:46.450
This entire stack is then passed to the LLM, which in our case is the OpenAI platform.

02:46.450 --> 02:52.930
After understanding the entire conversation, the LLM comes up with the canonical form ask about

02:52.930 --> 02:53.650
sport.
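
NOTE
A minimal sketch of how the stage-one prompt is stacked, assuming a simplified text layout: the sample conversation, then the example utterances with their canonical forms, then the current conversation ending with the new user input left unlabeled for the LLM. build_intent_prompt and the section comments are hypothetical, not the NeMo Guardrails prompt template.
```python
from typing import List, Tuple
def build_intent_prompt(sample_conversation: List[str],
                        examples: List[Tuple[str, str]],
                        user_input: str) -> str:
    parts = ["# Sample conversation"] + sample_conversation
    parts.append("# How the user talks")
    for utterance, form in examples:
        parts += [f'user "{utterance}"', f"  {form}"]
    # The current conversation reuses the sample conversation, then appends the
    # new input; the LLM is expected to complete the canonical form on the next line.
    parts.append("# Current conversation")
    parts += sample_conversation + [f'user "{user_input}"']
    return "\n".join(parts)
prompt = build_intent_prompt(
    ["user express greeting", "bot express greeting",
     "user ask about capabilities", "bot respond about capabilities"],
    [("How much do I have to boil pasta?", "ask about cooking"),
     ("How do I rob a bank?", "ask about criminal activity")],
    "How to play ping pong",
)
print(prompt)
```
Ending the prompt on the bare user line is what invites the LLM to answer with a canonical form such as ask about sport.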

02:53.650 --> 02:59.850
It's very important to understand that this entire conversation is there for the bot, or the LLM,

02:59.890 --> 03:07.050
to understand that the format here, ask about sport, has to match the format of ask about travel, ask about

03:07.050 --> 03:09.370
cooking, and all the other canonical forms.

03:09.370 --> 03:16.530
It analyzes this, and it understands that how to play ping pong is a user utterance that falls under

03:16.530 --> 03:17.970
the canonical form

03:18.010 --> 03:19.300
ask about sport.
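
NOTE
In the runtime the LLM makes this decision. The sketch below swaps in a plain word-overlap score, purely as an offline illustration of matching an utterance to the closest canonical form; EXAMPLES, words and guess_canonical_form are hypothetical names, and the basketball example is illustrative data.
```python
import re
from typing import Dict, List, Set
EXAMPLES: Dict[str, List[str]] = {
    "ask about sport": ["How to play tennis?", "How to play basketball?"],
    "ask about cooking": ["How much do I have to boil pasta?"],
    "ask about criminal activity": ["How do I rob a bank?"],
}
def words(text: str) -> Set[str]:
    # Lowercased word set, ignoring punctuation.
    return set(re.findall(r"[a-z']+", text.lower()))
def guess_canonical_form(utterance: str, examples: Dict[str, List[str]]) -> str:
    # Score each canonical form by its best word overlap with any of its examples.
    def score(form: str) -> int:
        return max(len(words(utterance) & words(ex)) for ex in examples[form])
    return max(examples, key=score)
print(guess_canonical_form("How to play ping pong?", EXAMPLES))
```
Word overlap only works here because the phrasing is close; an LLM can generalize to wordings that share no words with the examples.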

03:20.220 --> 03:23.580
Then the conversation goes on with more details.

03:23.580 --> 03:24.700
How to play tennis.

03:25.180 --> 03:28.340
Tennis is a sport in which two or four players hit the ball.

03:28.540 --> 03:30.660
Then basketball and football.

03:30.940 --> 03:36.780
This keeps on going, but the canonical form for user intent is derived as ask about sport.

03:37.300 --> 03:43.940
Once that is determined, it moves on to phase two, generating the next

03:43.940 --> 03:44.540
step.

03:44.740 --> 03:49.180
This phase decides what the next steps are and executes them.

03:50.180 --> 03:52.940
Here, you can see we provide the prompt.

03:53.820 --> 03:58.140
And then this is how the conversation between the user and the bot can go.

03:58.580 --> 04:03.820
This is no different from the sample conversation, just without the user utterances.

04:04.180 --> 04:09.180
These are just the canonical forms that the runtime framework uses.
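
NOTE
A minimal sketch of stripping a conversation down to canonical forms for the next-step prompt, assuming a hypothetical layout in which raw utterances are quoted and a user's canonical form sits indented under the utterance. canonical_only is an illustrative helper, not part of NeMo Guardrails.
```python
from typing import List
def canonical_only(conversation: List[str]) -> List[str]:
    """Keep canonical-form lines and drop the raw user and bot utterances."""
    kept = []
    for line in conversation:
        text = line.strip()
        if text.startswith('"') or text.startswith('user "'):
            continue  # a raw utterance, not a canonical form
        if line.startswith("  "):
            kept.append(f"user {text}")  # indented form under a user utterance
        else:
            kept.append(text)  # already a canonical line, e.g. "bot express greeting"
    return kept
print(canonical_only([
    'user "How to play ping pong"',
    "  ask about sport",
    "bot respond for sport",
    '  "Ping pong is a sport played by two or four people."',
]))
```
Working only with canonical forms keeps this prompt short and lets the LLM reason about the flow rather than the wording.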

04:09.580 --> 04:13.140
Then again come the other canonical forms, showing how the bot behaves.

04:13.220 --> 04:14.860
This is how the bot thinks.

04:15.940 --> 04:20.700
User express greeting, ask about cooking, criminal activity.

04:21.100 --> 04:23.140
This is how the conversation is.

04:23.260 --> 04:26.150
And it understands that this is how the flow would go.

04:26.430 --> 04:32.630
And then the third one here is the current conversation between the user and the bot, which is a mixture, or

04:32.670 --> 04:39.790
a union, of user express greeting, bot respond about capabilities, and user ask about sport.

04:40.310 --> 04:43.910
This is the canonical form that was generated by the LLM.

04:44.350 --> 04:46.150
We add that, or rather,

04:46.150 --> 04:52.910
the framework adds that to the existing conversation and then passes it on to the OpenAI DaVinci

04:52.950 --> 04:53.510
model.

04:54.750 --> 04:56.070
And here is the bot response:

04:56.070 --> 04:59.430
bot respond for sport.

04:59.630 --> 05:03.190
That's the next step that it generated for the flow.

05:03.550 --> 05:09.110
So the first one is user ask about sport, and the next one is bot respond for sport.

05:09.430 --> 05:15.030
This comes from understanding how the interaction between the user and the bot goes, and it also goes

05:15.070 --> 05:20.630
ahead and creates its own Colang flow, which is printed here.
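
NOTE
A minimal sketch of writing the two generated steps as a flow, assuming Colang 1.0 "define flow" syntax. render_flow and the flow name "sport question" are hypothetical, and the exact format printed in the logs may differ.
```python
from typing import List
def render_flow(name: str, steps: List[str]) -> str:
    """Format canonical-form steps as a Colang 1.0 flow definition."""
    return "\n".join([f"define flow {name}"] + [f"  {step}" for step in steps])
print(render_flow("sport question", ["user ask about sport", "bot respond for sport"]))
```
Each step of the flow is one canonical form, so the same user intent can be answered consistently the next time it appears.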

05:21.510 --> 05:27.230
Then it moves on to the third phase, generating the bot message, or the bot utterance.

05:27.670 --> 05:34.910
This is again a very similar flow, where we provide the prompt, and then here is the conversation between

05:34.910 --> 05:36.150
the user and the bot

05:36.350 --> 05:40.750
from the config.yml file, with the sample conversation.

05:42.230 --> 05:45.910
Then this is the current conversation between the user and the bot.

05:46.750 --> 05:51.350
Now it knows that it's a conversation that is ongoing between the bot and the user.

05:51.670 --> 05:54.150
So here: user express greeting.

05:54.270 --> 05:57.510
Bot express greeting, then capabilities.

05:57.670 --> 05:59.870
User ask about capabilities.

06:00.310 --> 06:03.790
And then here is the Colang flow that was generated.

06:04.110 --> 06:09.150
First, the user utterance is how to play ping pong.

06:09.430 --> 06:12.190
This is categorized as ask about sport.

06:12.590 --> 06:14.910
And then bot respond for sport.

06:15.110 --> 06:21.030
So it has to generate the response for sport, and this prompt is passed to the OpenAI platform.

06:21.390 --> 06:23.630
And here we get the response back.

06:23.990 --> 06:27.750
Ping pong is a sport played by two or four people.

06:27.990 --> 06:31.150
And here are the other details that come back from the interaction.

06:31.390 --> 06:37.310
So this is how the three stages of the NeMo Guardrails runtime are executed behind the scenes.
