WEBVTT

1
00:00:00.390 --> 00:00:03.550
Welcome to the RAG section of this course.

2
00:00:04.510 --> 00:00:07.130
So, what is RAG and why do we

3
00:00:07.130 --> 00:00:07.770
need it?

4
00:00:08.470 --> 00:00:12.630
RAG is kind of difficult to understand, so

5
00:00:12.630 --> 00:00:16.010
I'm going to spend quite some time here

6
00:00:16.010 --> 00:00:18.230
in PowerPoint to explain to you what it

7
00:00:18.230 --> 00:00:21.290
is, before we go into the components and

8
00:00:21.290 --> 00:00:22.270
actually some code.

9
00:00:23.630 --> 00:00:24.950
So, what's RAG?

10
00:00:25.310 --> 00:00:31.369
Well, RAG stands for Retrieval Augmented Generation, but

11
00:00:31.369 --> 00:00:34.330
that doesn't really make us any wiser, so

12
00:00:34.330 --> 00:00:35.550
let's see an example.

13
00:00:37.270 --> 00:00:40.990
So, if we ask an LLM what is

14
00:00:40.990 --> 00:00:44.750
the capital of France, it can quite easily

15
00:00:44.750 --> 00:00:48.430
answer back that the capital of France is

16
00:00:48.430 --> 00:00:48.850
Paris.

17
00:00:49.130 --> 00:00:52.390
So, this is an easy question, and we

18
00:00:52.390 --> 00:00:55.710
can give it very, very advanced questions like

19
00:00:55.710 --> 00:00:57.750
what is the Einstein-de Haas effect?

20
00:00:58.850 --> 00:01:01.930
I just randomly found that, and it's apparently

21
00:01:01.930 --> 00:01:04.209
something about spinning objects.

22
00:01:05.110 --> 00:01:08.610
Not important, but this feels like, oh, I'm

23
00:01:08.610 --> 00:01:11.250
really talking to an intelligent being here.

24
00:01:12.670 --> 00:01:14.750
But then I can ask it, what is

25
00:01:14.750 --> 00:01:16.730
the guest Wi-Fi at the office?

26
00:01:17.290 --> 00:01:18.390
And it doesn't know.

27
00:01:19.550 --> 00:01:24.510
And of course it doesn't know, because in

28
00:01:24.510 --> 00:01:28.670
the world of knowledge there are different sections of

29
00:01:28.670 --> 00:01:28.950
that.

30
00:01:30.550 --> 00:01:34.370
There's the Internet, of course, and LLMs know

31
00:01:34.370 --> 00:01:34.910
the Internet.

32
00:01:35.170 --> 00:01:39.290
It's distilled, meaning that all sorts of malware, the

33
00:01:39.290 --> 00:01:41.290
dark web and such things have been taken

34
00:01:41.290 --> 00:01:44.850
away, but otherwise it knows kind of everything

35
00:01:44.850 --> 00:01:46.430
on the Internet.

36
00:01:47.270 --> 00:01:50.170
It all depends: if you use a big

37
00:01:50.170 --> 00:01:53.110
model, it knows the entire thing in detail.

38
00:01:53.330 --> 00:01:55.730
If you use smaller models, it's only the

39
00:01:55.730 --> 00:01:56.250
highlights.

40
00:01:57.730 --> 00:02:01.250
But there's a bunch of other knowledge in

41
00:02:01.250 --> 00:02:01.710
the world.

42
00:02:02.030 --> 00:02:05.350
The Internet is big, but all your personal

43
00:02:05.350 --> 00:02:08.830
data, your emails, your bank data, your documents

44
00:02:08.830 --> 00:02:11.090
and so on, even though they might be in

45
00:02:11.090 --> 00:02:15.690
a OneDrive or a Google Cloud or in

46
00:02:15.690 --> 00:02:19.170
Gmail or something, it's still not data that

47
00:02:19.170 --> 00:02:20.350
is publicly available.

48
00:02:21.130 --> 00:02:23.150
So an AI, of course, doesn't know it.

49
00:02:23.850 --> 00:02:26.750
And you can probably understand that you wouldn't

50
00:02:26.750 --> 00:02:29.370
want it to know that already, because that could

51
00:02:29.370 --> 00:02:32.290
be medical information, that could be sensitive information

52
00:02:32.290 --> 00:02:32.670
in there.

53
00:02:33.450 --> 00:02:35.370
So, of course, an LLM out of the

54
00:02:35.370 --> 00:02:36.330
box doesn't know.

55
00:02:38.110 --> 00:02:40.250
In a similar manner, it doesn't know your

56
00:02:40.250 --> 00:02:43.330
company data, your documents, your contract with the

57
00:02:43.330 --> 00:02:44.510
clients and so on.

58
00:02:44.750 --> 00:02:45.970
Of course it shouldn't know.

59
00:02:48.610 --> 00:02:51.430
It also, of course, doesn't know other people's

60
00:02:51.430 --> 00:02:51.590
data.

61
00:02:51.710 --> 00:02:54.530
Even you don't know and can't get access

62
00:02:54.530 --> 00:02:58.150
to other companies' or other people's private data.

63
00:02:58.570 --> 00:03:00.070
So how should an LLM know?

64
00:03:02.420 --> 00:03:04.140
And then, of course, there's a bunch of

65
00:03:04.140 --> 00:03:04.960
offline data.

66
00:03:05.140 --> 00:03:08.600
Despite me wanting to know something, it might

67
00:03:08.600 --> 00:03:10.300
be that it's only in a book in

68
00:03:10.300 --> 00:03:13.740
a different country that has that specific knowledge

69
00:03:13.740 --> 00:03:16.820
somewhere, because it hasn't been scanned yet or digitised

70
00:03:16.820 --> 00:03:18.060
in any way.

71
00:03:18.920 --> 00:03:21.140
So there's a lot of data in the

72
00:03:21.140 --> 00:03:24.800
world, and an LLM has lots of it,

73
00:03:25.020 --> 00:03:26.840
but doesn't have all of it, of course.

74
00:03:29.880 --> 00:03:33.240
So RAG is the process of giving some

75
00:03:33.240 --> 00:03:35.880
of this extra data to the LLM so

76
00:03:35.880 --> 00:03:39.000
that it does know on the fly.

77
00:03:41.420 --> 00:03:45.360
So let's say that this box represented everything

78
00:03:45.360 --> 00:03:46.720
in the world I knew.

79
00:03:47.460 --> 00:03:48.980
I knew that my middle name was Rasmus,

80
00:03:49.060 --> 00:03:51.260
I knew my favourite colour was blue, I

81
00:03:51.260 --> 00:03:52.700
knew my parents had a dog called Mr.

82
00:03:52.840 --> 00:03:56.000
Buttons, I knew the password to the guest

83
00:03:56.000 --> 00:03:58.500
Wi-Fi at the office is 42, I

84
00:03:58.500 --> 00:04:00.800
know I live in Denmark, I know John

85
00:04:00.800 --> 00:04:02.600
is in charge of support, and I know

86
00:04:02.600 --> 00:04:04.700
my birthday is February 25th.

87
00:04:05.500 --> 00:04:09.040
If I knew all that, I could, in

88
00:04:09.040 --> 00:04:12.240
a prompt, or in what we know now

89
00:04:12.240 --> 00:04:18.560
as instructions, give everything I know, and then

90
00:04:18.560 --> 00:04:21.220
ask what is the guest Wi-Fi password

91
00:04:21.220 --> 00:04:22.540
at the office.

92
00:04:23.620 --> 00:04:25.540
And then the AI would know, because it

93
00:04:25.540 --> 00:04:28.000
could sift through all of it and find

94
00:04:28.000 --> 00:04:31.520
that this one specific one is the part

95
00:04:31.520 --> 00:04:32.100
we need.

96
00:04:35.640 --> 00:04:39.440
But this box, of course, is not everything

97
00:04:39.440 --> 00:04:39.860
I know.

98
00:04:41.120 --> 00:04:44.060
Everything I know is a lot of things,

99
00:04:45.080 --> 00:04:46.620
if we begin to count it up.

100
00:04:47.140 --> 00:04:49.580
So how could I give it everything I

101
00:04:49.580 --> 00:04:49.960
know?

102
00:04:51.280 --> 00:04:53.800
Well, some of it would be fairly easy

103
00:04:53.800 --> 00:04:54.880
to give.

104
00:04:55.060 --> 00:04:56.820
It's just some raw text I have in

105
00:04:56.820 --> 00:04:59.580
various online documents and stuff.

106
00:05:00.340 --> 00:05:04.020
But there's also a lot of messy stuff

107
00:05:04.020 --> 00:05:05.380
in binary formats.

108
00:05:05.600 --> 00:05:08.540
For example, if I have a thousand images

109
00:05:08.540 --> 00:05:12.400
and they show different things from the places

110
00:05:12.400 --> 00:05:15.240
I have visited, I wouldn't be able to

111
00:05:15.240 --> 00:05:21.620
ask my AI which countries have I visited,

112
00:05:21.900 --> 00:05:23.360
if it's only in images.

113
00:05:24.300 --> 00:05:27.660
So I would need to give something to

114
00:05:27.660 --> 00:05:30.440
an AI and actually extract all that data

115
00:05:30.440 --> 00:05:31.320
from the images.

116
00:05:31.860 --> 00:05:33.680
So some of them are difficult to get

117
00:05:33.680 --> 00:05:33.940
to.

118
00:05:34.820 --> 00:05:37.140
And there are also things that I only have

119
00:05:37.140 --> 00:05:38.840
in my mind that are never written down.

120
00:05:39.020 --> 00:05:41.320
I don't think I have a single document

121
00:05:41.320 --> 00:05:44.100
here that shows what my favourite colour is.

122
00:05:44.200 --> 00:05:45.680
That's only in my mind.

123
00:05:46.100 --> 00:05:49.280
And that is not something the AI would

124
00:05:49.280 --> 00:05:50.220
be able to be given.

125
00:05:51.200 --> 00:05:52.200
And there's a bunch more.

126
00:05:53.080 --> 00:05:56.360
And if we begin to tally up everything

127
00:05:56.360 --> 00:05:59.420
I know in terms of personal data, my

128
00:05:59.420 --> 00:06:03.400
company data, and so on, it would be

129
00:06:03.400 --> 00:06:08.040
so much data that an AI would be

130
00:06:08.040 --> 00:06:09.580
information overloaded.

131
00:06:10.880 --> 00:06:13.800
Because what we're talking about here, if I

132
00:06:13.800 --> 00:06:15.660
gave it up front, we saw it before,

133
00:06:15.800 --> 00:06:17.780
this was fairly easy to give up front.

134
00:06:17.880 --> 00:06:21.180
That would be like a hundred tokens or

135
00:06:21.180 --> 00:06:21.920
something like that.

136
00:06:22.820 --> 00:06:25.920
If we go here, there is actually a

137
00:06:25.920 --> 00:06:27.440
max to how much we can give.

138
00:06:28.220 --> 00:06:31.160
Just like if you place some text into

139
00:06:31.160 --> 00:06:33.240
a field, there's often a max limit on

140
00:06:33.240 --> 00:06:34.960
how big that field can be.

141
00:06:36.120 --> 00:06:38.380
When it comes to the question we can

142
00:06:38.380 --> 00:06:43.720
give, the context window, as it's called, so

143
00:06:43.720 --> 00:06:45.620
this is the term that is being used.

144
00:06:45.780 --> 00:06:47.440
So if you hear the term context window,

145
00:06:47.620 --> 00:06:49.800
it's actually how much you can put into

146
00:06:49.800 --> 00:06:50.300
a prompt.

147
00:06:52.400 --> 00:06:56.320
And it varies from 128,000 tokens for some

148
00:06:56.320 --> 00:06:58.700
of the big models to, for the really, really

149
00:06:58.700 --> 00:07:01.460
capable models, up to a million tokens.

150
00:07:02.940 --> 00:07:05.660
And if you turn that into pages of text, it's

151
00:07:05.660 --> 00:07:08.680
roughly between 500 and 2,000 pages of

152
00:07:08.680 --> 00:07:09.020
text.

153
00:07:10.480 --> 00:07:13.240
So Google is the one that is highest

154
00:07:13.240 --> 00:07:16.660
up right now, while ChatGPT is a bit

155
00:07:16.660 --> 00:07:17.800
further down, for example.

156
00:07:19.740 --> 00:07:23.240
But giving all this data comes at a

157
00:07:23.240 --> 00:07:23.620
cost.

158
00:07:24.140 --> 00:07:26.560
It comes at three different costs.

159
00:07:27.080 --> 00:07:29.840
First off, giving a million tokens, we have

160
00:07:29.840 --> 00:07:30.640
seen the prices.

161
00:07:31.760 --> 00:07:33.100
So if we need a big model for

162
00:07:33.100 --> 00:07:35.620
that, it costs, just to make that one

163
00:07:35.620 --> 00:07:38.640
prompt, one or two US dollars.

164
00:07:40.360 --> 00:07:43.560
It also comes at the cost of speed.

165
00:07:43.800 --> 00:07:46.240
So if we give it all these million

166
00:07:46.240 --> 00:07:48.880
tokens and send this prompt, it will take

167
00:07:48.880 --> 00:07:51.020
quite a while to get back to us.

168
00:07:52.780 --> 00:07:58.780
And if we give it everything among all

169
00:07:58.780 --> 00:08:00.980
these, let's say there were 10,000 or a

170
00:08:00.980 --> 00:08:06.020
million different entries in this context, one of

171
00:08:06.020 --> 00:08:07.460
them would say what the guest Wi-Fi

172
00:08:07.460 --> 00:08:08.680
password is.

173
00:08:08.980 --> 00:08:11.320
That's suddenly a needle in a haystack.

174
00:08:11.820 --> 00:08:13.840
So it might not even find it despite

175
00:08:13.840 --> 00:08:16.000
it being given because it has so much

176
00:08:16.000 --> 00:08:16.660
other data.

177
00:08:18.020 --> 00:08:23.740
But again, even if I filled up all

178
00:08:23.740 --> 00:08:26.360
this, that would not be nearly what I

179
00:08:26.360 --> 00:08:27.580
know in my mind.

180
00:08:29.100 --> 00:08:31.960
And some of it will still be difficult

181
00:08:31.960 --> 00:08:34.480
to get because it's not even possible.

182
00:08:34.700 --> 00:08:36.020
It can't read thoughts.

183
00:08:38.480 --> 00:08:41.159
So we need some kind of search index

184
00:08:41.159 --> 00:08:43.059
for all this, everything I know.

185
00:08:43.200 --> 00:08:45.220
And of course, I need to curate what

186
00:08:45.220 --> 00:08:47.660
I want to put into an AI.

187
00:08:48.040 --> 00:08:50.020
I might not want to put my medical

188
00:08:50.020 --> 00:08:52.120
information, but I might want to put my

189
00:08:52.120 --> 00:08:54.580
financial information or the reverse.

190
00:08:55.500 --> 00:08:57.220
But we need some kind of search index

191
00:08:57.220 --> 00:08:59.220
for all this, everything I know.

192
00:09:00.300 --> 00:09:04.020
And I need to put it into something

193
00:09:04.020 --> 00:09:05.280
a little like Google.

194
00:09:06.140 --> 00:09:07.840
So when you go to Google and you

195
00:09:07.840 --> 00:09:11.100
search, it kind of knows the entire Internet

196
00:09:11.100 --> 00:09:13.020
and can find the right pages, but it's

197
00:09:13.020 --> 00:09:14.380
just the index of it.

198
00:09:14.480 --> 00:09:16.880
It's not like it has all the information

199
00:09:16.880 --> 00:09:20.360
about every site, but it has enough information

200
00:09:20.360 --> 00:09:23.500
about the site in order to give it

201
00:09:23.500 --> 00:09:24.900
back to you as a search result.

202
00:09:26.480 --> 00:09:30.420
So we need to take our data and

203
00:09:30.420 --> 00:09:31.920
put it into a search index.

204
00:09:33.220 --> 00:09:35.120
And then we need to ask the question

205
00:09:35.120 --> 00:09:38.720
again, but instead of giving everything I know,

206
00:09:39.140 --> 00:09:41.900
meaning all this data, I need to make

207
00:09:41.900 --> 00:09:44.180
a search result on the fly.

208
00:09:45.880 --> 00:09:48.380
So I need to search for Wi-Fi

209
00:09:48.380 --> 00:09:50.740
password office, for example, or just Wi-Fi

210
00:09:50.740 --> 00:09:53.380
password or however I find it.

211
00:09:53.460 --> 00:09:55.480
It's a little like going to Google and

212
00:09:55.480 --> 00:09:57.220
putting in the right words to give the

213
00:09:57.220 --> 00:09:58.140
best search result.

214
00:09:59.420 --> 00:10:02.180
And if I do that, that search result

215
00:10:02.180 --> 00:10:05.920
will give back let's say among all this

216
00:10:05.920 --> 00:10:09.420
over here, three things back: that the guest

217
00:10:09.420 --> 00:10:12.120
Wi-Fi password is 42, that my home

218
00:10:12.120 --> 00:10:16.040
password is Maverick123, and my last Wi-Fi

219
00:10:16.040 --> 00:10:19.380
bill, whatever that was in terms of money.

220
00:10:20.680 --> 00:10:23.540
Because just searching for Wi-Fi password office

221
00:10:23.540 --> 00:10:27.900
could give the right things back, but also

222
00:10:27.900 --> 00:10:28.820
other things back.

223
00:10:29.240 --> 00:10:32.420
It depends on how many search results I

224
00:10:32.420 --> 00:10:33.220
want to have back.

225
00:10:33.740 --> 00:10:36.500
In this case, if this was the most

226
00:10:36.500 --> 00:10:41.280
relevant one, and I said only one, it

227
00:10:41.280 --> 00:10:43.380
would have gone okay, but if I asked

228
00:10:43.380 --> 00:10:46.100
for the home Wi-Fi and said only

229
00:10:46.100 --> 00:10:48.020
one, I would only have gotten this, and

230
00:10:48.020 --> 00:10:50.160
it wouldn't have gotten number two and number

231
00:10:50.160 --> 00:10:50.740
three here.

232
00:10:51.640 --> 00:10:53.320
So in this case, I'm asking for three

233
00:10:53.320 --> 00:10:55.820
things, and these are the three most relevant

234
00:10:55.820 --> 00:10:57.820
just like when you make a Google search

235
00:10:57.820 --> 00:10:58.900
for the most relevant.

236
00:11:01.500 --> 00:11:03.800
And then the AI would be able to

237
00:11:03.800 --> 00:11:04.740
answer my question.

238
00:11:04.980 --> 00:11:08.640
It would feel intelligent again, because despite it

239
00:11:08.640 --> 00:11:12.180
not having everything, it on the fly got

240
00:11:12.180 --> 00:11:16.180
the most relevant things, and among that, we

241
00:11:16.180 --> 00:11:18.800
were lucky enough that the answer was actually

242
00:11:18.800 --> 00:11:22.140
there and it was able to give it

243
00:11:22.140 --> 00:11:22.800
back to me.

244
00:11:24.860 --> 00:11:29.420
This is actually RAG, and what we are

245
00:11:29.420 --> 00:11:32.560
doing here is what is called embedding and

246
00:11:32.560 --> 00:11:36.500
ingestion of data into a vector store, which

247
00:11:36.500 --> 00:11:40.080
is where we save RAG data, and whenever

248
00:11:40.080 --> 00:11:42.800
we make this request to the search index

249
00:11:42.800 --> 00:11:45.820
or the vector store, that's a RAG search

250
00:11:45.820 --> 00:11:52.980
where we at the top of our search

251
00:11:52.980 --> 00:11:57.460
query, augment the data with RAG data so

252
00:11:57.460 --> 00:11:58.980
the AI can answer.

253
00:12:00.680 --> 00:12:03.660
So this is actually RAG, what we have

254
00:12:03.660 --> 00:12:04.260
just seen.

255
00:12:04.440 --> 00:12:07.960
Now let's go into what the different components

256
00:12:07.960 --> 00:12:11.160
are so we can go in and see

257
00:12:11.160 --> 00:12:11.820
some code soon.
