WEBVTT

00:05.000 --> 00:11.280
Large language models are extremely powerful, but they have a fundamental limitation that often gets

00:11.280 --> 00:12.000
overlooked.

00:12.560 --> 00:20.080
They generate responses solely from their training data and learned statistical patterns, as shown on

00:20.080 --> 00:21.360
page two of the deck.

00:21.840 --> 00:27.720
They do not query live systems, verify facts, or check whether information is current.

00:28.680 --> 00:33.480
This means that from the moment training ends, the model's knowledge begins to age.

00:33.960 --> 00:40.560
It may confidently reference outdated policies, discontinued products, or obsolete procedures.

00:41.120 --> 00:46.720
Worse, when the model encounters gaps in its knowledge, it does not say "I don't know."

00:47.000 --> 00:51.000
Instead, it generates the most plausible-sounding answer it can.

00:51.560 --> 00:59.000
This leads to three serious problems: hallucinations, outdated information, and confidently wrong answers.

00:59.760 --> 01:05.230
Because the language is fluent, users often trust responses that are factually incorrect.

01:05.910 --> 01:11.670
There is also no inherent source verification: claims cannot be traced or audited.

01:11.830 --> 01:19.550
The key insight is critical for engineers to internalize: LLMs are exceptional language models, but

01:19.550 --> 01:23.710
they are not knowledge databases. Without external grounding,

01:23.870 --> 01:27.950
they are unreliable as information systems in production environments.

01:28.150 --> 01:28.910
Retrieval-augmented

01:28.910 --> 01:35.510
generation, or RAG, is an architectural pattern designed to solve the core weaknesses of

01:35.510 --> 01:36.990
standalone LLMs.

01:37.750 --> 01:44.430
As illustrated on page three of the deck, RAG fundamentally changes how models access and use information.

01:45.030 --> 01:51.630
Instead of relying entirely on parametric memory (what the model learned during training), RAG introduces

01:51.670 --> 01:53.910
a retrieval step before generation.

01:54.470 --> 02:00.750
First, the system retrieves relevant documents from a knowledge base, vector database, or document

02:00.750 --> 02:01.270
store.

02:01.790 --> 02:06.350
Next, those documents are injected directly into the prompt as context.

02:06.910 --> 02:11.710
Finally, the LLM generates a response grounded in that retrieved information.

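NOTE
A minimal Python sketch of the retrieve, inject, generate flow described in the cues above. It is an illustration under stated assumptions, not the deck's implementation: the DOCS store, the keyword-overlap ranking (a stand-in for vector search), and the fake_llm function are all hypothetical.
```python
# Toy RAG pipeline: retrieve, inject into the prompt, generate.
# Every name here is hypothetical and chosen for illustration only.
import re
DOCS = {
    "policy-2024": "Refunds are accepted within 30 days of purchase.",
    "shipping-faq": "Standard shipping takes 5 business days.",
}
def tokenize(text):
    # Lowercase word set; a real system would use embeddings instead.
    return set(re.findall(r"[a-z0-9]+", text.lower()))
def retrieve(query, docs, k=1):
    # Step 1: rank documents by word overlap with the query and keep the top k.
    q = tokenize(query)
    ranked = sorted(docs.items(), key=lambda kv: len(q & tokenize(kv[1])), reverse=True)
    return ranked[:k]
def build_prompt(query, hits):
    # Step 2: inject retrieved text, tagged with its source id, ahead of the question.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
def answer(query, docs, llm):
    # Step 3: generate from the grounded prompt and return the source ids alongside.
    hits = retrieve(query, docs)
    return llm(build_prompt(query, hits)), [doc_id for doc_id, _ in hits]
def fake_llm(prompt):
    # Placeholder for a real model call; it simply echoes the first context line.
    return prompt.splitlines()[1]
text, sources = answer("How many days do I have for refunds?", DOCS, fake_llm)
print(text, sources)
```
Returning the source ids with the answer is what makes the response traceable; in practice retrieve would query a vector database and fake_llm would be an actual model call.
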
02:12.310 --> 02:18.630
This architecture allows the model to combine its natural language capabilities with external, verifiable

02:18.630 --> 02:19.270
knowledge.

02:19.910 --> 02:26.070
The result is not just more accurate answers, but answers that can be traced back to specific sources.

02:26.790 --> 02:31.550
The most important takeaway is that RAG is not a prompt trick.

02:32.110 --> 02:34.470
It is a system-level design pattern.

02:34.990 --> 02:43.030
By separating language capability from knowledge storage, RAG enables LLMs to operate as reliable information

02:43.030 --> 02:46.950
synthesis engines rather than guess-based text generators.

02:47.670 --> 02:54.910
Hallucinations occur because LLMs are trained to predict the next most likely token, not to verify truth.

02:55.470 --> 03:00.510
When the model lacks information, it fills the gap with statistically plausible text.

03:00.550 --> 03:03.430
Page four of the deck explains this clearly.

03:03.670 --> 03:07.860
Hallucinations are an inevitable outcome of the training objective itself.

03:08.340 --> 03:13.260
RAG addresses this problem by changing the conditions under which generation happens.

03:13.740 --> 03:15.740
Instead of forcing the model to guess,

03:15.940 --> 03:20.420
RAG supplies relevant factual context before generation begins.

03:20.700 --> 03:27.060
This retrieved context fills knowledge gaps and narrows the model's output space. By anchoring responses

03:27.060 --> 03:28.260
to real documents,

03:28.420 --> 03:32.340
RAG reduces the model's tendency to fabricate information.

03:32.860 --> 03:37.500
Claims can be traced back to source material, making verification possible.

03:38.180 --> 03:40.500
The model no longer operates in a vacuum.

03:40.820 --> 03:43.100
It is guided by concrete evidence.

03:43.180 --> 03:48.540
This represents a fundamental engineering shift from guessing to grounding.

03:49.020 --> 03:56.180
With RAG, LLMs transition from creative language generators into controlled information synthesis systems.

03:56.740 --> 04:03.340
While hallucinations cannot be eliminated entirely, RAG dramatically reduces their frequency and impact,

04:03.700 --> 04:07.380
making production use far safer and more reliable.

04:07.580 --> 04:12.980
One of the most serious limitations of standalone LLMs is static knowledge.

04:13.420 --> 04:19.020
As highlighted on page five of the deck, training data is frozen at a specific point in time.

04:19.820 --> 04:27.260
A model trained in 2023 has no awareness of events, policies or product changes that occur afterward.

04:27.460 --> 04:33.300
Traditional solutions attempt to solve this with fine-tuning or retraining, but those approaches are

04:33.300 --> 04:36.900
expensive, slow, and operationally complex.

04:37.620 --> 04:44.700
RAG provides a fundamentally better solution by decoupling knowledge from model parameters. With RAG,

04:44.900 --> 04:52.300
knowledge lives in external systems (databases, document stores, APIs) that can be updated independently.

04:52.780 --> 04:58.820
New documents can be added, outdated ones removed, and corrections applied without touching the underlying

04:58.860 --> 04:59.420
model.

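NOTE
A small Python sketch of what decoupling knowledge from model parameters can look like in practice, assuming a hypothetical in-memory KnowledgeStore class; a real deployment would use a database, document store, or vector index. The point shown is only that documents change while no model is retrained.
```python
# Hypothetical in-memory knowledge store, for illustration only.
class KnowledgeStore:
    def __init__(self):
        self.docs = {}
    def upsert(self, doc_id, text):
        # Add a new document or overwrite an outdated one.
        self.docs[doc_id] = text
    def remove(self, doc_id):
        # Drop a document; missing ids are ignored.
        self.docs.pop(doc_id, None)
    def search(self, keyword):
        # Case-insensitive substring match, returning matching ids in sorted order.
        return sorted(d for d, t in self.docs.items() if keyword.lower() in t.lower())
store = KnowledgeStore()
store.upsert("pricing-v1", "The Pro plan costs 20 dollars per month.")
before = store.search("pro plan")
# A pricing change ships: swap the document and leave the model untouched.
store.remove("pricing-v1")
store.upsert("pricing-v2", "The Pro plan costs 25 dollars per month.")
after = store.search("pro plan")
print(before, after)
```
The next query after the update retrieves only the new document, which is the behavior the cues above describe.
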
05:00.060 --> 05:06.100
This enables real-time knowledge access and continuous improvement. Systems can integrate

05:06.100 --> 05:13.170
live data sources, reflect policy updates immediately, and remain accurate without costly retraining cycles.

05:13.330 --> 05:16.530
For enterprises, this capability is critical.

05:16.970 --> 05:19.250
Business knowledge changes constantly.

05:19.650 --> 05:26.890
RAG ensures AI systems stay current, relevant, and aligned with reality rather than locked to a historical

05:26.890 --> 05:27.690
snapshot.

05:28.410 --> 05:34.690
Enterprise environments introduce challenges that make RAG not just useful, but essential.

05:35.410 --> 05:37.410
As shown on page six of the deck,

05:37.770 --> 05:42.290
enterprise data is private, proprietary, and domain-specific.

05:42.850 --> 05:45.410
It cannot be included in public model training

05:45.410 --> 05:52.930
data. Organizations rely on specialized terminology, internal processes, and institutional knowledge

05:52.930 --> 05:54.330
that exist nowhere else.

05:54.970 --> 05:58.170
At the same time, this information changes frequently.

05:58.410 --> 06:03.010
Policies, procedures, pricing, and regulations may be updated daily.

06:03.730 --> 06:07.570
Standalone LLMs cannot access this internal knowledge.

06:08.200 --> 06:14.600
Even worse, relying on them without grounding risks misinformation, compliance violations, and loss

06:14.600 --> 06:15.320
of trust.

06:16.000 --> 06:22.480
RAG bridges this gap by allowing AI systems to retrieve and reason over private enterprise data,

06:22.480 --> 06:25.160
while maintaining security and access control.

06:25.800 --> 06:31.400
Knowledge remains within organizational boundaries, and the model only sees what it is allowed to see

06:31.440 --> 06:34.120
at query time. For enterprises,

06:34.400 --> 06:41.560
RAG is the difference between experimental AI and deployable, trustworthy systems that meet real

06:41.600 --> 06:43.040
business requirements.

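NOTE
A hedged Python sketch of the access-control idea described above, where the model only sees what the user is allowed to see at query time. The Doc type, the role names, and retrieve_for_user are hypothetical; the assumption illustrated is that permission filtering happens before retrieval results reach the prompt.
```python
from dataclasses import dataclass
@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_roles: frozenset
# Hypothetical corpus with per-document role permissions.
CORPUS = [
    Doc("hr-salary-bands", "Salary bands for 2024 by level.", frozenset({"hr"})),
    Doc("eng-onboarding", "Onboarding steps for new engineers.", frozenset({"hr", "engineering"})),
]
def retrieve_for_user(keyword, user_roles, corpus):
    # Filter by permission first, so restricted text never reaches the prompt.
    visible = [d for d in corpus if d.allowed_roles & set(user_roles)]
    # Then match the query against only the visible documents.
    return [d.doc_id for d in visible if keyword.lower() in d.text.lower()]
print(retrieve_for_user("salary", ["engineering"], CORPUS))
print(retrieve_for_user("salary", ["hr"], CORPUS))
```
Filtering before matching, rather than after generation, keeps restricted content out of the model's context entirely.
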
06:43.080 --> 06:50.400
RAG enables a wide range of high-impact enterprise AI use cases, as outlined on page seven of the

06:50.400 --> 06:50.840
deck.

06:51.520 --> 06:57.840
One of the most common is internal knowledge assistants, which help employees find information across

06:57.840 --> 07:02.280
wikis, documentation, and policy repositories instantly.

07:03.080 --> 07:06.080
Customer support bots are another major application.

07:06.480 --> 07:12.040
By retrieving from product documentation, troubleshooting guides, and historical support tickets,

07:12.480 --> 07:18.520
RAG-powered bots provide accurate, context-aware responses instead of generic answers.

07:19.160 --> 07:23.480
Compliance and policy search is especially important in regulated industries.

07:23.960 --> 07:31.080
RAG allows employees to query regulatory requirements and internal policies while maintaining auditability

07:31.080 --> 07:32.200
and traceability.

07:32.960 --> 07:39.920
Technical documentation Q&amp;A helps engineers navigate complex APIs and architectures efficiently.

07:40.160 --> 07:46.880
Analytics and reporting assistants use RAG to retrieve relevant metrics and generate natural-language

07:46.880 --> 07:50.160
explanations. Across all these use cases,

07:50.360 --> 07:56.360
one requirement is universal: answers must be accurate, auditable, and traceable.

07:56.920 --> 08:01.280
RAG provides the architecture needed to meet these enterprise standards.

08:01.400 --> 08:06.840
RAG is often compared with fine-tuning, but they solve very different problems.

08:07.280 --> 08:13.470
As summarized on page eight of the deck, fine-tuning embeds knowledge directly into model weights.

08:13.910 --> 08:19.350
This makes updates costly and risky, especially when knowledge changes frequently.

08:19.830 --> 08:20.510
RAG,

08:20.550 --> 08:23.350
by contrast, keeps knowledge external.

08:23.790 --> 08:29.590
It is easier to update, lower risk, and far more suitable for enterprise environments.

08:30.110 --> 08:34.550
Knowledge can be added or removed instantly without retraining the model.

08:34.990 --> 08:41.350
Fine-tuning may still be useful for style adaptation or task specialization, but it should not be the

08:41.350 --> 08:44.270
primary method for injecting factual knowledge.

08:44.750 --> 08:48.110
In most cases, RAG should be implemented first.

08:48.630 --> 08:53.870
The rule of thumb for engineers is simple: use RAG before fine-tuning.

08:54.310 --> 09:00.990
RAG provides flexibility, traceability, and control, all of which are essential for production AI

09:01.030 --> 09:01.750
systems.

09:02.430 --> 09:10.110
This final slide reinforces a critical message: RAG is not optional for production or enterprise AI systems.

09:10.550 --> 09:12.350
As emphasized throughout the deck,

09:12.590 --> 09:17.910
accuracy, trust, and compliance are non-negotiable in real-world deployments.

09:18.470 --> 09:25.550
RAG dramatically reduces hallucinations by grounding responses in retrieved, verifiable sources.

09:25.990 --> 09:31.470
It enables auditability and traceability, which are required in regulated industries.

09:31.990 --> 09:38.550
Most importantly, it gives organizations control over what knowledge AI systems access and how that

09:38.550 --> 09:39.630
knowledge is used.

09:40.230 --> 09:47.390
At its core, RAG combines three elements: LLMs for language, external knowledge for facts, and system-

09:47.390 --> 09:49.950
level control for safety and governance.

09:50.590 --> 09:56.670
This combination transforms LLMs from impressive demos into reliable business systems.

09:57.150 --> 09:59.870
The final insight is simple but powerful.

10:00.230 --> 10:04.750
RAG is the bridge between AI potential and enterprise reality.

10:05.190 --> 10:08.870
Without it, LLMs remain risky and unreliable.

10:09.190 --> 10:14.270
With it, they become trustworthy, scalable, and ready for production use.