WEBVTT

00:01.120 --> 00:04.480
In today's video, let's clarify certain concepts.

00:04.680 --> 00:08.480
We will begin with the embedding model, followed by vector embeddings.

00:08.760 --> 00:11.040
And then we will learn about vector search.

00:11.360 --> 00:16.040
These are the concepts that would help you understand the move framework in detail.

00:17.080 --> 00:18.720
Let's go through the background.

00:19.040 --> 00:23.640
In the case of text data, military and army have similar meanings.

00:23.800 --> 00:27.400
That holds even though the words military and army look very different.

00:27.880 --> 00:35.520
For semantic search to work effectively, the representations of military and army must sufficiently capture

00:35.520 --> 00:37.440
their semantic similarity.

00:38.240 --> 00:44.160
This is where vector representations are used, and why their derivation is so important.

00:44.680 --> 00:51.840
If we want to compare two sentences, we don't want to compare the words they contain, but rather whether

00:51.840 --> 00:54.000
or not they mean the same thing.

00:54.720 --> 01:01.480
To preserve the data's meaning, we need to understand how to produce vectors where the relationship

01:01.480 --> 01:03.640
between the vectors makes sense.

01:04.240 --> 01:07.440
This is where the next obvious question comes up.

01:07.800 --> 01:09.600
What is an embedding model?

01:10.160 --> 01:16.540
Embedding models are built by passing large amounts of labeled data to a neural network.

01:16.860 --> 01:21.860
We then train this neural network to perform all sorts of different tasks.

01:22.380 --> 01:30.500
An embedding model takes a natural language query as input and returns a multidimensional vector that

01:30.540 --> 01:33.180
represents the contextual information.

01:34.140 --> 01:40.980
Here on the left-hand side, shop for table is a natural language query that is passed to an embedding

01:40.980 --> 01:48.340
model, and we get a multidimensional vector representation that represents the contextual information.

01:48.700 --> 01:52.620
In this case, it is shop for table and refund fees.

01:52.940 --> 01:59.300
How we can use this multidimensional vector is covered in the next topic, which is vector

01:59.300 --> 02:00.060
embedding.

02:00.300 --> 02:07.180
Once trained, an embedding model can transform raw data into vector embeddings like the one we saw on the

02:07.220 --> 02:08.460
previous slide.

02:08.860 --> 02:14.540
A vector embedding is an array of numbers that is used to describe an object.

02:15.300 --> 02:19.220
For example, refund could have the vector shown here.

02:19.380 --> 02:22.220
More likely the array would be a

02:22.260 --> 02:28.810
lot longer than that, but this is a numeric representation of the natural language query.

02:29.290 --> 02:37.050
These vectors of numbers are then used by a machine learning model to understand the meaning of the text.

02:37.450 --> 02:39.530
How does this all come together?

02:39.970 --> 02:47.090
On the left-hand side, we have best places to visit as the natural language query, which is passed

02:47.090 --> 02:48.690
to the embedding model.

02:49.130 --> 02:55.690
The embedding model processes the query and creates the multidimensional vectors here.

02:56.050 --> 03:03.090
What this means is that all the similar, matching vectors are put together here in a cluster, and their

03:03.090 --> 03:05.130
distance would be smaller.

03:05.450 --> 03:10.810
Whereas here, in this case, we have refund fees and avoid debit card fees.

03:11.210 --> 03:18.170
They would have vector representations that are close to each other, but they are far from the

03:18.170 --> 03:19.410
travel vectors.

03:19.650 --> 03:22.730
This is what vector embedding means.

03:23.690 --> 03:28.210
The same applies to the dinner or cooking topic that we have here.

03:29.210 --> 03:36.750
Now the obvious next question is how this works with search. A similarity measure is a distance

03:36.750 --> 03:43.470
function that takes two vectors as input and calculates the distance value between them.

03:43.790 --> 03:45.750
The distance can take different forms.

03:46.110 --> 03:50.030
It can be a geometric distance between two points.

03:50.430 --> 03:53.070
It could be an angle between the vectors.

03:53.470 --> 03:58.550
There are different algorithms that calculate the distance between two vectors.

03:59.030 --> 04:07.950
Ultimately, we use the calculated distance to judge how close or far apart two vector embeddings are.

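NOTE
Editor's sketch of the two distance measures just described: a geometric
(Euclidean) distance between two points and a cosine similarity based on the
angle between two vectors. The function names and the 3-dimensional vectors
are hypothetical, invented for illustration; real embeddings are much longer
and come from an embedding model, not from hand-picked numbers.
```python
import math
def euclidean_distance(a, b):
    # Straight-line (geometric) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
# Hypothetical toy embeddings (assumed values, not model output).
places_to_visit = [0.9, 0.1, 0.0]
journey_to_europe = [0.8, 0.2, 0.1]
refund_fees = [0.1, 0.9, 0.2]
# Smaller distance means the two embeddings are closer together.
print(euclidean_distance(places_to_visit, journey_to_europe))
print(euclidean_distance(places_to_visit, refund_fees))
# Larger cosine similarity means a smaller angle between the vectors.
print(cosine_similarity(places_to_visit, journey_to_europe))
print(cosine_similarity(places_to_visit, refund_fees))
```
With these assumed values, the two travel vectors come out closer to each
other than either is to the refund vector under both measures.
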
04:08.870 --> 04:15.030
In this situation, the vectors have been calculated and placed in the two-dimensional

04:15.030 --> 04:17.390
or multidimensional vector space.

04:17.510 --> 04:25.510
When the user asks a query such as what are the best places to visit, that would land near journey to Europe and those

04:25.510 --> 04:27.670
multidimensional vectors here.

04:28.390 --> 04:33.430
When a user asks how to make a waffle, that would be closer to cook a dinner and

04:33.670 --> 04:34.910
how to make a cake.

04:35.310 --> 04:39.110
But it would be far from traveling or refunding fees.
