WEBVTT

1
00:00:00.710 --> 00:00:04.450
We ended the last lecture on this picture

2
00:00:04.450 --> 00:00:08.850
of what RAG actually is, and it's about

3
00:00:08.850 --> 00:00:11.350
getting all our data into some kind of

4
00:00:11.350 --> 00:00:13.030
search index so we can search.

5
00:00:14.210 --> 00:00:17.370
Let's turn this into a more abstract picture

6
00:00:17.370 --> 00:00:20.830
and show the flow of what's going on

7
00:00:20.830 --> 00:00:22.090
with the various components.

8
00:00:24.830 --> 00:00:27.510
So, the moving parts: first, we of

9
00:00:27.510 --> 00:00:28.770
course have some source data.

10
00:00:29.450 --> 00:00:31.590
We know, of course, that not all the

11
00:00:31.590 --> 00:00:35.210
data can be put in, and also that

12
00:00:35.210 --> 00:00:37.270
we don't want to put all the data

13
00:00:37.270 --> 00:00:37.450
in.

14
00:00:37.530 --> 00:00:40.950
So we need to take this data,

15
00:00:40.950 --> 00:00:43.690
clean it, and select it.

16
00:00:44.230 --> 00:00:47.970
So that could be taking a PDF, getting

17
00:00:47.970 --> 00:00:50.630
the raw data out of it, getting the

18
00:00:50.630 --> 00:00:54.150
data out of images, or just taking the

19
00:00:54.150 --> 00:00:57.470
data we have and preparing it so it

20
00:00:57.470 --> 00:01:00.410
is in the right format for the vector

21
00:01:00.410 --> 00:01:00.830
search.

22
00:01:03.290 --> 00:01:05.850
After that, we need to embed the data,

23
00:01:06.970 --> 00:01:09.890
and embedding the data actually means

24
00:01:09.890 --> 00:01:12.690
calling an embedding model with the data in

25
00:01:12.690 --> 00:01:15.970
the structure we want, which returns it

26
00:01:15.970 --> 00:01:17.350
as what is called a vector.

27
00:01:17.870 --> 00:01:24.930
So a vector is, for example, 1536 dimensions describing that piece

28
00:01:24.930 --> 00:01:28.810
of data, so it can be compared and searched

29
00:01:28.810 --> 00:01:29.710
by meaning.

30
00:01:29.850 --> 00:01:33.330
So if we search for "cat", the word

31
00:01:33.330 --> 00:01:36.690
"kitten" will also be found as similar

32
00:01:36.690 --> 00:01:39.550
data, and so on.

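NOTE
A minimal Python sketch of the cat/kitten idea, assuming made-up
3-dimensional vectors (toy_embeddings is hypothetical; a real
embedding model returns many more dimensions, such as 1536). It
compares vectors with cosine similarity, one common way to score
how alike two embeddings are.
```python
import math
# Hypothetical 3-dimensional "embeddings"; real models return far more dimensions.
toy_embeddings = {
    "cat": [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.75, 0.20],
    "car": [0.10, 0.20, 0.95],
}
def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
for word in ("kitten", "car"):
    score = cosine_similarity(toy_embeddings["cat"], toy_embeddings[word])
    print(word, round(score, 3))
```
The similar words point in nearly the same direction, so "kitten"
scores much higher against "cat" than "car" does.
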
33
00:01:39.910 --> 00:01:42.290
We'll go more into embeddings in a separate

34
00:01:42.290 --> 00:01:42.670
lecture.

35
00:01:44.970 --> 00:01:47.810
And then we ingest it into a vector

36
00:01:47.810 --> 00:01:48.170
store.

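NOTE
A minimal Python sketch of clean, select, embed and ingest, assuming
a hypothetical toy_embed function (letter counts standing in for a
real embedding model; it captures spelling, not meaning) and a
hypothetical InMemoryVectorStore class. It is not the
Microsoft.Extensions.VectorData API, only the shape of the step.
```python
from dataclasses import dataclass, field
def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
@dataclass
class Record:
    key: str
    text: str
    vector: list[float]
@dataclass
class InMemoryVectorStore:
    """Hypothetical minimal store: records kept in a dict keyed by id."""
    records: dict[str, Record] = field(default_factory=dict)
    def upsert(self, key: str, text: str) -> None:
        # Embed at ingestion time and keep the vector next to the original text.
        self.records[key] = Record(key, text, toy_embed(text))
raw_docs = ["  Cats are small furry animals. ", "", "Kittens are young cats."]
store = InMemoryVectorStore()
# Clean (strip whitespace) and select (skip empty documents) before ingesting.
cleaned = [doc.strip() for doc in raw_docs if doc.strip()]
for i, doc in enumerate(cleaned):
    store.upsert(f"doc-{i}", doc)
print(len(store.records))
```
A persistent store such as SQL Server or SQLite would save the
records instead of keeping them in a Python dict.
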
37
00:01:50.870 --> 00:01:53.970
There are various vector stores to choose from,

38
00:01:54.690 --> 00:01:58.330
for example Azure AI Search and SQL Server;

39
00:01:58.330 --> 00:02:01.250
SQLite works too,

40
00:02:01.370 --> 00:02:04.150
you can even go in-memory, and there's Cosmos DB,

41
00:02:05.110 --> 00:02:07.449
and there are some open source options as well

42
00:02:07.449 --> 00:02:11.710
like Qdrant and Weaviate, and other players like

43
00:02:11.710 --> 00:02:14.170
Postgres and Redis are in the game as

44
00:02:14.170 --> 00:02:14.370
well.

45
00:02:15.210 --> 00:02:18.930
So we're going to use SQL Server and

46
00:02:18.930 --> 00:02:22.470
SQLite and in-memory primarily, but all of

47
00:02:22.470 --> 00:02:24.910
them work the same way because they all build on

48
00:02:24.910 --> 00:02:33.100
a NuGet package called Microsoft.Extensions.VectorData. Then our

49
00:02:33.100 --> 00:02:37.200
user comes into play and asks a question.

50
00:02:38.240 --> 00:02:41.780
And from that side, we again embed, and

51
00:02:41.780 --> 00:02:43.980
what we are embedding is actually the search

52
00:02:43.980 --> 00:02:47.280
query, because when we need to search for

53
00:02:47.280 --> 00:02:51.580
something, we need to run a similarity search against the

54
00:02:51.580 --> 00:02:54.000
vectors we put in to find which of them

55
00:02:54.000 --> 00:02:56.400
are most like our question.

56
00:02:57.920 --> 00:03:00.260
So we get our search query, turn it

57
00:03:00.260 --> 00:03:02.980
into a vector, and then we do a

58
00:03:02.980 --> 00:03:04.500
vector similarity search.

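NOTE
A minimal Python sketch of the query side, assuming hypothetical
pre-computed vectors for both the stored texts and the question (a
real pipeline would embed the question with the same model used at
ingestion). search scores every stored vector with cosine
similarity and returns the best matches.
```python
import math
# Hypothetical pre-computed vectors for the stored texts.
stored = [
    ("Cats are small furry animals.", [0.90, 0.80, 0.10]),
    ("Kittens are young cats.", [0.85, 0.75, 0.20]),
    ("Cars need fuel.", [0.10, 0.20, 0.95]),
]
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
def search(query_vector: list[float], top: int = 2) -> list[tuple[str, float]]:
    """Brute-force similarity search: score every stored vector, keep the best `top`."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
# Hypothetical embedding of a question such as "Tell me about cats".
query_vector = [0.88, 0.79, 0.15]
for text, score in search(query_vector):
    print(f"{score:.3f}  {text}")
```
Real vector stores typically use an index rather than scoring every
vector one by one; this loop only illustrates the idea.
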
59
00:03:05.680 --> 00:03:08.640
When we have done that search, we get

60
00:03:08.640 --> 00:03:11.860
some results back (we choose how many), and

61
00:03:11.860 --> 00:03:14.840
for each search result, we get some kind

62
00:03:14.840 --> 00:03:18.400
of score; often one means it's an

63
00:03:18.400 --> 00:03:22.260
exact match, and zero means nothing in

64
00:03:22.260 --> 00:03:22.620
common.

65
00:03:23.040 --> 00:03:24.580
So, for example, if we get a

66
00:03:24.580 --> 00:03:28.120
score of 0.7, that means it's roughly 70%

67
00:03:28.120 --> 00:03:32.420
similar, comparing our search query with

68
00:03:32.420 --> 00:03:33.540
what it actually found.

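NOTE
A minimal Python sketch of acting on scores, assuming hypothetical
results and a hypothetical MIN_SCORE cut-off of 0.7. What counts as
a good score depends on the embedding model and data, and some
stores report a distance instead of a similarity (for cosine
distance, 1 minus the similarity, so lower is better).
```python
# Hypothetical search results as (text, cosine similarity) pairs.
results = [
    ("Kittens are young cats.", 0.92),
    ("Cats are small furry animals.", 0.70),
    ("Cars need fuel.", 0.31),
]
MIN_SCORE = 0.7  # hypothetical cut-off; tune it for your model and data
def relevant(results: list[tuple[str, float]], min_score: float) -> list[tuple[str, float]]:
    """Keep only results whose similarity reaches the threshold."""
    return [(text, score) for text, score in results if score >= min_score]
def cosine_distance(similarity: float) -> float:
    """Distance view of the same score: 0.0 means identical direction."""
    return 1.0 - similarity
for text, score in relevant(results, MIN_SCORE):
    print(f"similarity={score:.2f} distance={cosine_distance(score):.2f}  {text}")
```
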
69
00:03:34.880 --> 00:03:38.700
And once that happens, our AI, of course,

70
00:03:38.780 --> 00:03:42.100
can use those search results, get them into

71
00:03:42.100 --> 00:03:46.440
the context together with the question we have,

72
00:03:46.700 --> 00:03:47.980
and answer the question.

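NOTE
A minimal Python sketch of putting the search results into the
context together with the question, assuming a hypothetical
build_prompt helper and prompt wording; the call to the chat model
itself is left out.
```python
def build_prompt(question: str, results: list[tuple[str, float]]) -> str:
    """Put the retrieved texts into the context, followed by the user's question."""
    context = "\n".join(f"- {text}" for text, _score in results)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
# Hypothetical results from the similarity search step.
results = [("Kittens are young cats.", 0.92), ("Cats are small furry animals.", 0.70)]
prompt = build_prompt("What is a kitten?", results)
print(prompt)
# The prompt would then be sent to the chat model; that call is omitted here.
```
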
73
00:03:49.440 --> 00:03:51.540
So those are all the parts,

74
00:03:52.460 --> 00:03:55.200
and we will have separate lectures on embeddings,

75
00:03:55.200 --> 00:03:58.160
ingestion, and so on, and on putting it

76
00:03:58.160 --> 00:04:01.940
all together in the next few lectures.
