WEBVTT

00:00.800 --> 00:01.840
Hello everyone!

00:02.040 --> 00:07.920
In our previous video, we asked how to play ping pong and got the LLM-generated bot

00:07.920 --> 00:08.680
response.

00:08.880 --> 00:13.360
We also went into the detail of how the NeMo Guardrails runtime

00:13.360 --> 00:15.600
executes this on the back end.

00:15.880 --> 00:17.760
So now let me run this one more time.

00:18.640 --> 00:24.640
So I got the response back from the bot, saying ping pong is a sport that involves hitting

00:24.640 --> 00:27.600
a ball back and forth with the racket and so forth.

00:28.000 --> 00:35.240
While doing so, if you notice here, it took almost 5.23 seconds and three LLM calls to

00:35.280 --> 00:38.960
execute this task. In a real-world application,

00:39.240 --> 00:44.440
5.23 seconds is a lot of time for guardrails to get executed.

00:44.760 --> 00:49.480
There are ways we can improve this, and that is what today's topic will cover:

00:49.880 --> 00:54.000
how we can improve our LLM execution so that it runs faster.

00:54.840 --> 01:02.520
For that, in our config file, I have specified this rails configuration.

01:03.040 --> 01:09.960
What this rails configuration does is cover the dialogue interaction, and it says to execute the LLM call

01:09.960 --> 01:11.720
in a single execution.

01:11.920 --> 01:14.880
So do not invoke the LLM multiple times.

01:14.880 --> 01:16.360
And enabled is set to true.

01:16.560 --> 01:20.320
That is the direction I have given to the NeMo runtime framework.
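
NOTE
A minimal sketch of the setting described in the narration, assuming it lives in the
NeMo Guardrails config.yml under the dialog rails. The exact file shown in the video is
not reproduced here, so treat the surrounding layout as an assumption.
```yaml
# Assumed placement in config.yml; only this option is described in the narration.
rails:
  dialog:
    single_call:
      enabled: True  # ask the runtime to use one LLM call instead of several
```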

01:20.360 --> 01:22.040
And let's execute this.

01:23.080 --> 01:24.760
I got the response back.

01:24.760 --> 01:31.880
But this time, if you notice, it took only 1.93 seconds and just one call.

01:32.240 --> 01:37.280
This is a great improvement in terms of executing guardrails within this one call.

01:37.760 --> 01:44.120
Let's now go through the info logs and understand how this happens during the actual runtime execution.
