WEBVTT

00:01.280 --> 00:04.280
Hello everyone, and welcome.

00:04.680 --> 00:07.560
In today's session we'll be learning about models.

00:07.720 --> 00:12.200
We'll go over the different kinds of models available today.

00:12.760 --> 00:17.440
We'll start with foundation models, then move on to language models.

00:17.840 --> 00:20.120
We'll cover large language models.

00:20.520 --> 00:24.360
And finally, we'll wrap up with multimodal models.

00:25.320 --> 00:31.880
It's important to understand that models in general are a vast topic.

00:32.320 --> 00:36.200
It's hard to cover all the intricate details in a single course.

00:36.440 --> 00:44.080
However, in this section I've introduced all the essential jargon and terminology you'll need throughout

00:44.080 --> 00:44.920
the course.

00:46.160 --> 00:52.640
This will not only help you understand the content, but also help you grasp how different models

00:52.640 --> 00:56.360
work and how they perform across categories.

00:56.720 --> 01:00.490
So having said that, let's begin.

01:01.330 --> 01:02.770
Foundation models.

01:03.490 --> 01:06.130
The first model we'll cover is the foundation model.

01:06.450 --> 01:13.810
Foundation models, also known as FMs, are large deep learning neural networks trained on massive data

01:13.850 --> 01:14.530
sets.

01:15.130 --> 01:22.810
The term foundation model was coined by researchers to describe ML models that are trained on broad,

01:22.970 --> 01:31.610
generalized and unlabeled data and capable of performing a wide variety of general tasks like understanding

01:31.610 --> 01:36.890
language, generating text, and even generating images.

01:37.890 --> 01:44.370
The size and general-purpose nature of an FM are what set it apart from traditional machine learning models.

01:44.490 --> 01:52.370
You can use a foundation model as a base model for developing more specialized downstream applications.

01:53.450 --> 01:55.820
How do foundation models work?

01:56.460 --> 01:59.660
Foundation models are a form of generative AI.

02:00.260 --> 02:07.540
They generate output based on one or more inputs, usually in the form of human language instructions.

02:07.980 --> 02:15.940
These models learn patterns and relationships and use that learning to predict the next item in a sequence.

02:16.460 --> 02:22.020
In the case of text, they predict the next word in a sentence based on context.

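To make "predict the next item in a sequence" concrete, here is a toy sketch in Python. It is a deliberate simplification and an assumption on my part, not how foundation models are built: instead of a neural network, it just counts which word follows which in a tiny made-up corpus (a bigram model). The function names `train_bigrams` and `predict_next` are hypothetical. The prediction step has the same shape as in a real model, though: given the context, pick a likely next word.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Toy 'training': count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training data.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often
print(predict_next(model, "sat"))
```

A real model conditions on the whole preceding context rather than a single word, which is what lets it produce coherent sentences.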
02:22.900 --> 02:29.180
With image generation, they analyze image features and create clearer, sharper versions.

02:29.580 --> 02:36.140
It's crucial to remember: FMs use learned patterns to predict the next logical item in a sequence.

02:36.620 --> 02:38.980
Examples of foundation models.

02:39.900 --> 02:45.100
Some examples include BERT, Bidirectional Encoder Representations

02:45.100 --> 02:48.260
from Transformers, released in 2018.

02:48.740 --> 02:54.510
It analyzes the full context of a sentence bidirectionally and then makes predictions.

02:55.070 --> 02:55.470
GPT.

02:56.830 --> 03:02.790
Generative Pre-trained Transformer, also released in 2018 by OpenAI.

03:03.550 --> 03:10.350
It uses a 12-layer transformer decoder with a self-attention mechanism, and was trained on a dataset

03:10.390 --> 03:13.070
of over 11,000 free novels.

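The difference between BERT reading a sentence "bidirectionally" and GPT's decoder working left to right comes down to which tokens each position is allowed to attend to. The sketch below only builds the two attention masks (1 = may attend, 0 = blocked) for a hypothetical four-token sentence; it does not compute attention itself, and `attention_mask` is an illustrative helper, not a library function.

```python
def attention_mask(n, causal):
    """Return an n x n matrix where mask[i][j] == 1 if token i may attend to token j."""
    return [[1 if (not causal or j <= i) else 0 for j in range(n)] for i in range(n)]

tokens = ["the", "cat", "sat", "down"]
bert_style = attention_mask(len(tokens), causal=False)  # encoder: every token sees every token
gpt_style = attention_mask(len(tokens), causal=True)    # decoder: each token sees only itself and earlier tokens

for name, mask in [("bidirectional", bert_style), ("causal", gpt_style)]:
    print(name)
    for tok, row in zip(tokens, mask):
        visible = [t for t, m in zip(tokens, row) if m]
        print(f"  {tok!r} can attend to {visible}")
```

The causal mask is what makes next-word prediction a fair task: a position cannot peek at the word it is supposed to predict.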
03:13.670 --> 03:19.590
Amazon Titan, a powerful pre-trained foundation model designed for general-purpose use.

03:19.950 --> 03:26.830
There are many others too, such as models from Cohere and Anthropic, and Meta's Llama family.

03:27.790 --> 03:29.030
Language models.

03:29.150 --> 03:32.590
Now let's understand language models.

03:33.390 --> 03:39.870
A language model is a type of machine learning model designed specifically to represent language.

03:40.430 --> 03:47.680
It forms the basis for many language tasks like Q&A, summarization, or semantic search.

03:48.800 --> 03:50.640
How are models trained?

03:51.120 --> 03:57.760
The training process involves neural networks like BERT processing millions of data points.

03:58.120 --> 04:00.800
This initial phase is what we call pre-training.

04:02.200 --> 04:05.480
Once you've trained a model, the next phase is fine-tuning.

04:05.960 --> 04:12.000
Many tasks benefit from language models, such as sentiment analysis, question answering,

04:12.000 --> 04:12.920
and others.

04:13.440 --> 04:18.080
Adapting a general-purpose model to such a task is known as fine-tuning.

04:18.480 --> 04:22.560
Now let's look at a diagram of how this works.

04:23.320 --> 04:24.480
General-purpose

04:25.320 --> 04:32.200
models like BERT, or its bigger sibling RoBERTa, require a huge amount of data to learn

04:32.200 --> 04:34.080
a language's regularities.

04:34.560 --> 04:42.320
NLP practitioners often use Wikipedia and other freely available collections of textual data to train

04:42.320 --> 04:42.760
them.

04:43.050 --> 04:45.730
Here are the general-purpose models available.

04:45.730 --> 04:48.570
As of now, we have BERT,

04:48.610 --> 04:53.850
RoBERTa, and others from the BERT family that are listed here.

04:54.290 --> 05:01.410
Once you have a general-purpose model, you may want to fine-tune a language model

05:01.410 --> 05:03.570
like BERT for a specific domain.

05:03.810 --> 05:06.610
You'd fine-tune it using domain-specific data.

05:07.530 --> 05:16.290
For example, BERT plus scientific papers equals SciBERT, BERT plus financial texts equals FinBERT,

05:16.290 --> 05:19.210
and BERT plus legal documents equals

05:19.250 --> 05:19.810
LEGAL-BERT.

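As a toy analogy only, the sketch below shows how continuing to train on domain text shifts what a model predicts. It assumes a word-count (bigram) model standing in for BERT; the two corpora and the `weight=3` emphasis on domain data are made up for illustration, and `update_bigrams` is a hypothetical helper. Real domain models such as FinBERT are trained with gradient descent on a neural network, not by adding counts.

```python
from collections import Counter, defaultdict

def update_bigrams(counts, corpus, weight=1):
    """Add weighted 'word follows word' counts from `corpus` into `counts` (in place)."""
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += weight
    return counts

def predict_next(counts, word):
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical broad "pre-training" text.
general = ["the bank of the river", "we sat on the bank of the river"]
# Hypothetical domain-specific (finance) text.
finance = ["the bank raised the interest rate", "a bank raised its interest rate"]

model = update_bigrams(defaultdict(Counter), general)
print(predict_next(model, "bank"))  # general sense: "of"

update_bigrams(model, finance, weight=3)  # "fine-tune": keep training on domain data
print(predict_next(model, "bank"))  # finance sense: "raised"
```

The general knowledge is not thrown away; the domain data simply outweighs it where the two disagree.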
05:20.370 --> 05:23.370
That's how fine tuning helps specialize a model.

05:24.810 --> 05:32.690
Large language models (LLMs). With advancements in language models, researchers built even larger ones,

05:33.010 --> 05:37.210
requiring massive computing power and huge training datasets.

05:37.610 --> 05:43.100
That's how large language models, or LLMs, were born.

05:44.860 --> 05:47.300
Model size comparison.

05:47.300 --> 05:58.500
Let's compare: BERT-Base (about 110 million parameters), BERT-Large (about 340 million), GPT-3 with 175 billion parameters, and GPT-4, which is believed to be even larger.

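A quick back-of-the-envelope calculation shows what "large" means in practice. This sketch assumes each parameter is stored as a 16-bit float (2 bytes) and counts only the weights, ignoring activations and optimizer state. GPT-4 is left out because its parameter count has not been published.

```python
# Approximate published parameter counts.
PARAMS = {
    "BERT-Base": 110_000_000,
    "BERT-Large": 340_000_000,
    "GPT-3": 175_000_000_000,
}

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored in 16-bit floats

for name, n in PARAMS.items():
    gb = n * BYTES_PER_PARAM_FP16 / 1e9      # weight memory in gigabytes
    ratio = n / PARAMS["BERT-Base"]           # how many BERT-Bases fit in this model
    print(f"{name:<10} {n / 1e6:>9,.0f}M params  ~{gb:,.1f} GB of fp16 weights  {ratio:,.0f}x BERT-Base")
```

Roughly 350 GB just to hold GPT-3's weights is why such models need clusters of accelerators rather than a single machine.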
05:58.500 --> 06:00.860
That's why they're called large language models.

06:01.740 --> 06:03.860
Training LLMs: the phases.

06:04.620 --> 06:06.180
There are three key phases.

06:06.620 --> 06:11.580
Pre-training: the model learns by predicting the next word in a sequence.

06:12.140 --> 06:14.940
Supervised fine-tuning.

06:15.220 --> 06:21.420
Researchers teach the model how to respond to prompts using high-quality Q&A datasets.

06:22.340 --> 06:26.700
Reinforcement learning from human feedback, also known as RLHF.

06:27.900 --> 06:31.940
This involves human rankings to help guide the model toward better answers.

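In common RLHF setups, the human rankings are first used to train a separate reward model, which then guides the LLM during the reinforcement learning step. A minimal sketch of that first step, assuming a Bradley-Terry-style pairwise loss (the form used in InstructGPT-style reward modeling): a best-to-worst ranking is split into (preferred, rejected) pairs, and the loss is small when the reward model scores the preferred answer higher. The answers and reward scores are made-up placeholders, and the RL step itself (for example PPO) is not shown.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_answers):
    """Turn a best-to-worst human ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_answers, 2))

def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))

# Hypothetical human ranking of three answers, best first.
ranking = ["answer_A", "answer_C", "answer_B"]
# Hypothetical scores from the reward model being trained.
rewards = {"answer_A": 2.0, "answer_B": -1.0, "answer_C": 0.5}

pairs = ranking_to_pairs(ranking)
loss = sum(pairwise_loss(rewards[p], rewards[r]) for p, r in pairs) / len(pairs)
print(pairs)
print(f"mean pairwise loss: {loss:.3f}")
```

Training lowers this loss by pushing the reward model's scores into the same order as the human ranking.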
06:32.300 --> 06:34.820
You can imagine the model evolving like this.

06:35.260 --> 06:37.950
At first it's like a wild dragon.

06:38.310 --> 06:45.750
Supervised fine-tuning tames it, and RLHF teaches it to collaborate with humans effectively.

06:47.310 --> 06:48.870
Multimodal models.

06:48.910 --> 06:52.030
Now let's talk about multimodal models.

06:52.310 --> 06:58.310
These models can understand and generate content across multiple modalities, not just text.

06:58.630 --> 07:06.270
So instead of only processing and generating text, they can now process text, images, audio and video

07:06.630 --> 07:10.550
and output text, images, audio and video.

07:11.390 --> 07:17.110
This enables much richer, more context aware and versatile AI experiences.

07:17.550 --> 07:23.630
I hope you now have a solid understanding of the different kinds of models and how they evolved over

07:23.630 --> 07:24.270
time.

07:25.270 --> 07:28.190
I'll see you in the next video.
