WEBVTT

00:00.080 --> 00:06.720
In this section, we shift from using tools to designing them correctly for production-grade LLM systems.

00:07.400 --> 00:13.840
When LLMs interact with real infrastructure, the quality of tool design becomes the single most important

00:13.840 --> 00:15.920
factor in system reliability.

00:16.640 --> 00:19.080
As introduced on the opening slide of this deck,

00:19.400 --> 00:24.120
tools are the bridge between probabilistic reasoning and deterministic execution.

00:24.680 --> 00:27.040
Language models reason in probabilities.

00:27.280 --> 00:31.080
They infer intent, guess parameters, and predict outputs.

00:31.600 --> 00:37.960
Production systems, however, require exact inputs, strict validation, and predictable behavior.

00:38.560 --> 00:42.600
Tool design is where these two worlds meet and where many systems fail.

00:42.600 --> 00:47.760
If guardrails are missing, poorly designed tools lead to cascading problems:

00:48.120 --> 00:54.240
invalid function calls that break workflows, silent failures that are hard to debug, and security

00:54.240 --> 00:57.200
vulnerabilities that expose sensitive systems.

00:57.680 --> 01:01.430
No amount of prompt engineering can fix weak tool boundaries.

01:01.550 --> 01:08.510
The key insight to carry forward is this: tool design is not an implementation detail.

01:08.830 --> 01:11.630
It is a core architectural responsibility.

01:12.230 --> 01:19.110
If you want LLMs to operate safely and reliably in production, your tools must enforce determinism,

01:19.110 --> 01:22.470
validation, and clarity at every boundary.

01:22.990 --> 01:26.710
This slide reinforces a critical principle:

01:26.710 --> 01:31.110
Tool-using LLMs are only as powerful and safe as the tools they can access.

01:31.550 --> 01:38.030
As shown on page two of the deck, the quality of your tool design directly determines system reliability.

01:38.790 --> 01:42.430
When tools are poorly designed, failures compound quickly.

01:42.830 --> 01:45.950
An invalid parameter can break an entire workflow.

01:46.510 --> 01:50.270
An ambiguous API can cause the model to behave unpredictably.

01:50.790 --> 01:56.030
A missing validation step can turn a harmless request into a security incident.

01:56.750 --> 02:03.630
The core challenge is that LLMs reason probabilistically. Even with perfect prompts, they will occasionally

02:03.630 --> 02:07.470
generate malformed, incomplete or inappropriate inputs.

02:07.790 --> 02:08.950
This is not a bug.

02:09.190 --> 02:11.710
It is a property of probabilistic systems.

02:11.990 --> 02:14.710
That is why tools must enforce determinism.

02:15.110 --> 02:18.950
They are the final authority that decides what is allowed to execute.

02:19.310 --> 02:24.470
Validation, constraints, and clear interfaces are not optional safeguards.

02:24.830 --> 02:26.630
They are mandatory defenses.

02:26.950 --> 02:33.310
The bridge between probabilistic reasoning and deterministic execution is where production systems either

02:33.310 --> 02:34.670
succeed or fail.

02:35.150 --> 02:37.510
Tool design is that bridge.

02:37.870 --> 02:44.550
In real production environments, tools are not abstract functions. As outlined on page three of the

02:44.550 --> 02:44.990
deck,

02:45.230 --> 02:47.430
they are concrete system components:

02:47.710 --> 02:54.670
REST APIs, microservices, and protected internal services, each with real-world constraints.

02:55.110 --> 03:00.220
When exposing an API to an LLM, that API becomes a callable tool.

03:00.660 --> 03:06.380
This means every endpoint you expose directly expands the model's operational capabilities.

03:07.020 --> 03:10.780
Poorly scoped APIs give LLMs too much power.

03:11.220 --> 03:14.180
Poorly defined APIs confuse the model.

03:14.700 --> 03:20.940
Each tool definition must include four essential components: a descriptive function name, a clear natural

03:20.940 --> 03:25.940
language description, a strict input schema, and a well-defined output schema.

03:26.460 --> 03:30.660
These elements guide the model's reasoning and enable safe execution.
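
NOTE
Editorial sketch, not from the deck: the four components named above could be written as a plain Python dict along these lines. The tool name get_order_status, the field names, and the JSON-Schema-style layout are all assumptions chosen for illustration.
```python
# Hypothetical tool definition; every name and value here is illustrative.
GET_ORDER_STATUS_TOOL = {
    "name": "get_order_status",
    "description": "Look up the current status of one order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string", "pattern": "^ORD-[0-9]{6}$"}},
        "required": ["order_id"],
        "additionalProperties": False,
    },
    "output_schema": {
        "type": "object",
        "properties": {"status": {"type": "string", "enum": ["pending", "shipped", "delivered"]}},
        "required": ["status"],
    },
}
def missing_components(tool):
    """Return the names of the four required components that are absent."""
    required = ("name", "description", "input_schema", "output_schema")
    return [key for key in required if key not in tool]
```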

03:31.100 --> 03:37.020
The best practice highlighted here is crucial: design APIs for machines, not humans.

03:37.340 --> 03:43.980
Prioritize parsability over readability, structure over flexibility, and validation over convenience.

03:44.380 --> 03:49.580
Human-friendly shortcuts often become failure points when LLMs operate at scale.

03:51.260 --> 03:58.020
This slide introduces concrete design principles for building safe and usable tool interfaces.

03:58.580 --> 04:00.530
As shown on page four of the deck,

04:00.690 --> 04:06.090
clarity comes from narrow scope, explicit parameters, and purpose-specific design.

04:06.570 --> 04:09.850
Each tool should do one thing exceptionally well.

04:10.530 --> 04:16.610
Overloaded endpoints that attempt to handle multiple behaviors through optional parameters create ambiguity.

04:17.130 --> 04:21.290
Ambiguity is dangerous when LLMs are generating inputs.

04:21.770 --> 04:24.730
Explicit parameters are equally important.

04:25.130 --> 04:29.090
Every input must have a clearly defined type and format.

04:29.370 --> 04:34.530
Optional-heavy parameter lists increase the likelihood of misuse and invalid calls.

04:34.970 --> 04:40.890
The guiding rule here is simple but powerful: one tool equals one responsibility.

04:41.370 --> 04:47.730
If your tool requires conditional logic to determine behavior, you probably need multiple tools.
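
NOTE
Editorial sketch, not from the deck: one way to picture "one tool equals one responsibility" is to split a mode-driven function into two narrow ones. The functions manage_user, get_user, and update_user_email are hypothetical and return placeholder data.
```python
# Overloaded design (avoid): behavior depends on a mode flag and an optional parameter.
def manage_user(action, user_id, email=None):
    if action == "get":
        return {"user_id": user_id}
    if action == "update_email":
        return {"user_id": user_id, "email": email}
    raise ValueError("unknown action: " + str(action))
# Narrow design: two tools, each with only required, explicit parameters.
def get_user(user_id):
    return {"user_id": user_id}
def update_user_email(user_id, email):
    return {"user_id": user_id, "email": email}
```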

04:48.370 --> 04:52.010
The engineering rule at the bottom of the slide is worth remembering.

04:52.370 --> 04:57.850
If humans misuse an API, LLMs will misuse it more, and at scale.

04:58.290 --> 05:02.960
Clear interfaces are not just good design; they are safety mechanisms.

05:03.080 --> 05:08.920
This slide addresses one of the most important safety rules in tool-using LLM systems:

05:09.200 --> 05:11.920
Never trust LLM-generated inputs.

05:12.680 --> 05:18.760
As emphasized on page five of the deck, language models will occasionally produce malformed,

05:18.800 --> 05:23.160
out-of-range, or even malicious inputs, regardless of prompt quality.

05:23.880 --> 05:28.200
Robust validation must occur on the server side, not in prompts.

05:28.640 --> 05:32.880
Required field checking ensures mandatory parameters are present.

05:33.480 --> 05:40.280
Data type validation confirms inputs match expected types such as strings, integers, or arrays.

05:40.600 --> 05:47.080
Value range enforcement prevents dangerous or nonsensical values from reaching your systems.

05:47.080 --> 05:52.160
Schema validation using JSON Schema provides comprehensive structural guarantees.

05:52.760 --> 05:57.560
It ensures the entire payload conforms to expectations before execution.
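
NOTE
Editorial sketch, not from the deck: the required-field, type, and range checks described above could look roughly like this on the server. It is a hand-rolled check using only the standard library, not a JSON Schema implementation, and the field names and limits in RULES are assumptions.
```python
# Hypothetical rules for a transfer tool; field names and limits are assumptions.
RULES = {
    "account_id": {"type": str},
    "amount_cents": {"type": int, "min": 1, "max": 1_000_000},
}
def validate_args(args):
    """Return a list of problems; an empty list means the call may execute."""
    if not isinstance(args, dict):
        return ["payload must be an object"]
    problems = []
    for field in args:
        if field not in RULES:
            problems.append("unexpected field: " + str(field))
    for field, rule in RULES.items():
        if field not in args:
            problems.append("missing required field: " + field)
            continue
        value = args[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            problems.append("wrong type for " + field)
            continue
        if "min" in rule and not rule["min"] <= value <= rule["max"]:
            problems.append("out of range: " + field)
    return problems
```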

05:58.000 --> 06:01.720
Finally, server-side verification is non-negotiable.

06:02.080 --> 06:04.880
The golden rule on this slide is essential.

06:05.440 --> 06:08.480
LLMs suggest, systems verify.

06:08.880 --> 06:12.200
Treat every LLM output as untrusted

06:12.200 --> 06:19.840
user input. Validation is your final line of defense between intelligent automation and catastrophic

06:19.840 --> 06:20.520
failure.

06:21.080 --> 06:27.320
Error handling is often overlooked, but in tool-using systems, it is part of the conversation.

06:27.920 --> 06:30.280
As explained on page six of the deck,

06:30.560 --> 06:32.240
errors are not failures.

06:32.560 --> 06:36.560
They are signals that help the LLM adjust its approach.

06:37.400 --> 06:41.880
Tools must return structured error objects with consistent formatting.

06:42.320 --> 06:49.120
Clear error codes enable programmatic handling, while descriptive messages explain what went wrong.

06:49.640 --> 06:56.360
Most importantly, error responses should provide actionable guidance so the model can retry correctly.

06:56.960 --> 06:59.510
Silent failures are especially dangerous.

06:59.910 --> 07:06.510
Returning empty results or vague messages leaves the model guessing and often leads to repeated mistakes.

07:06.950 --> 07:08.510
Specificity matters.

07:08.910 --> 07:12.030
Replacing "invalid input" with "email format

07:12.030 --> 07:16.270
invalid: missing @ symbol" dramatically improves recovery.

07:16.790 --> 07:19.110
Errors must also be retry-safe.

07:19.350 --> 07:24.870
They should include enough context for the model to correct its request without human intervention.

07:25.230 --> 07:31.990
The key idea is powerful: machine-readable errors enable intelligent self-correction.

07:32.510 --> 07:37.910
Good error design turns failures into feedback loops instead of dead ends.
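
NOTE
Editorial sketch, not from the deck: a structured error object for the email example above might be shaped like this. The keys code, field, message, hint, and retryable, and the code value EMAIL_FORMAT_INVALID, are assumptions; the check itself only looks for an @ symbol.
```python
def email_error(value):
    """Return None if value contains an @ symbol, else a structured error dict."""
    if not isinstance(value, str) or "@" not in value:
        return {
            "ok": False,
            "error": {
                "code": "EMAIL_FORMAT_INVALID",  # stable code for programmatic handling
                "field": "email",
                "message": "email format invalid: missing @ symbol",
                "hint": "Resend with an address such as name@example.com.",
                "retryable": True,  # tells the model a corrected call is safe to retry
            },
        }
    return None
```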

07:38.110 --> 07:41.950
This slide compares two fundamental tool design patterns.

07:42.430 --> 07:49.150
Stateless tools, as shown on page seven of the deck, do not store memory or maintain context between

07:49.150 --> 07:49.790
calls.

07:50.350 --> 07:53.310
Each invocation depends solely on its inputs.

07:53.870 --> 08:01.500
Stateless tools are easier to scale, simpler to debug, inherently safer, and ideal for parallel execution.

08:01.900 --> 08:08.300
Common examples include database lookups, calculations, search queries, and data transformations.

08:08.940 --> 08:13.060
Stateful tools, by contrast, maintain context across calls.

08:13.420 --> 08:19.380
They are required for use cases like financial transactions, multi-step workflows, or session-based

08:19.380 --> 08:19.940
systems.

08:20.500 --> 08:28.260
However, state introduces significant complexity: concurrency issues, race conditions, hidden dependencies,

08:28.260 --> 08:29.540
and harder debugging.

08:30.020 --> 08:34.100
The best practice is clear: prefer stateless tools by default.

08:34.580 --> 08:41.100
Introduce state only when absolutely necessary and only when you are prepared to manage the added complexity.

08:41.740 --> 08:45.340
State is powerful, but it is also a liability.
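
NOTE
Editorial sketch, not from the deck: the contrast between the two patterns can be shown with a pure function and a small class. The names convert_cents and Cart are hypothetical.
```python
# Stateless: the result depends only on the arguments, so repeated calls agree.
def convert_cents(amount_cents, rate):
    return round(amount_cents * rate)
# Stateful: the result depends on earlier calls stored in self.items.
class Cart:
    def __init__(self):
        self.items = []
    def add_item(self, sku):
        self.items.append(sku)
        return len(self.items)
```
The same add_item call returns a different value each time, which is the hidden dependency the narration warns about.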

08:45.780 --> 08:51.420
This final slide summarizes the core lessons of designing tools for production

08:51.460 --> 08:59.260
LLMs. As highlighted on page eight of the deck, tool design requires a fundamental shift in mindset.

08:59.940 --> 09:02.660
These interfaces are not for human developers.

09:02.940 --> 09:10.260
They are for probabilistic reasoning systems that need deterministic execution. Tools define safety

09:10.260 --> 09:11.020
boundaries.

09:11.620 --> 09:16.340
Every constraint you enforce at the tool level is a protection against misuse.

09:17.260 --> 09:24.900
APIs become LLM capabilities, meaning every exposed endpoint directly expands what the model can do

09:24.940 --> 09:28.580
in production. Validation is mandatory.

09:29.020 --> 09:31.900
Never trust LLM-generated inputs.

09:32.780 --> 09:37.460
Errors enable recovery when they are structured, explicit, and retry-safe.

09:37.500 --> 09:43.660
And stateless tools scale better, making them the default choice for most systems.

09:44.740 --> 09:46.620
The final insight is critical.

09:47.140 --> 09:51.900
Good tool design turns LLMs into reliable system operators.

09:52.500 --> 09:58.860
It is how we transform probabilistic reasoning into deterministic, production-grade execution.