WEBVTT

1
00:00:00.580 --> 00:00:04.120
So far, we have, whenever we talk to

2
00:00:04.120 --> 00:00:06.820
the LLM, just had some hard-coded strings,

3
00:00:07.020 --> 00:00:08.860
like what is the capital in France, and

4
00:00:08.860 --> 00:00:09.240
so on.

5
00:00:09.620 --> 00:00:12.520
So, let's change that now by building a

6
00:00:12.520 --> 00:00:13.060
chat loop.

7
00:00:14.000 --> 00:00:17.020
So, in this sample, the chat loop sample,

8
00:00:17.400 --> 00:00:19.860
we are just making our connection as we've

9
00:00:19.860 --> 00:00:20.420
seen before.

10
00:00:21.140 --> 00:00:23.040
We are making our agent as we've seen

11
00:00:23.040 --> 00:00:23.420
before.

12
00:00:23.420 --> 00:00:25.900
But now we are introducing a while true,

13
00:00:27.040 --> 00:00:30.700
where we just ask different questions.

14
00:00:30.960 --> 00:00:34.880
And we make a little command prompt and

15
00:00:34.880 --> 00:00:37.040
ask for input from the user.

16
00:00:38.220 --> 00:00:41.640
We then use streaming and take that input

17
00:00:41.640 --> 00:00:43.220
instead of a hard-coded value.

18
00:00:43.220 --> 00:00:47.220
We're using the option to get our updates

19
00:00:47.220 --> 00:00:48.200
back from streaming.

20
00:00:48.620 --> 00:00:53.260
So, we can write out the tokens in

21
00:00:53.260 --> 00:00:54.300
and tokens out.

22
00:00:54.780 --> 00:00:58.040
We could also use the run async instead

23
00:00:58.040 --> 00:01:00.280
and not needing to have the streaming part.

24
00:01:02.500 --> 00:01:04.860
But let's see what happens when we run

25
00:01:04.860 --> 00:01:05.180
this.

26
00:01:06.900 --> 00:01:11.360
If we say hello, it will come back

27
00:01:11.360 --> 00:01:14.560
on helping one to assist us.

28
00:01:14.820 --> 00:01:17.800
And it takes eight input tokens and ten

29
00:01:17.800 --> 00:01:18.720
output tokens.

30
00:01:19.600 --> 00:01:22.460
And if we say hello again, exactly the

31
00:01:22.460 --> 00:01:23.180
same happens.

32
00:01:23.760 --> 00:01:25.400
And we can do this over and over

33
00:01:25.400 --> 00:01:27.320
and we'll roughly use the same.

34
00:01:28.160 --> 00:01:31.520
There could be variations over time, but for

35
00:01:31.520 --> 00:01:32.280
now, it's happy.

36
00:01:33.500 --> 00:01:36.680
But what happens if I tell it my

37
00:01:36.680 --> 00:01:36.980
name?

38
00:01:38.220 --> 00:01:40.700
My name is Rasmus.

39
00:01:42.860 --> 00:01:47.020
It will happily greet me, but we would

40
00:01:47.020 --> 00:01:48.960
now think it would know my name.

41
00:01:49.280 --> 00:01:53.720
But the second I ask it, what is

42
00:01:53.720 --> 00:01:54.880
my name?

43
00:01:57.780 --> 00:01:58.900
It doesn't know.

44
00:01:59.020 --> 00:02:00.880
It doesn't have access to my personal information,

45
00:02:01.060 --> 00:02:02.080
it feels like.

46
00:02:03.160 --> 00:02:04.280
Why is that?

47
00:02:04.280 --> 00:02:08.860
Well, this is because this call is just

48
00:02:08.860 --> 00:02:11.540
a fire and forget.

49
00:02:11.900 --> 00:02:15.680
We give a question, get an answer, and

50
00:02:15.680 --> 00:02:17.640
the next time we run it again, it's

51
00:02:17.640 --> 00:02:21.580
a completely new conversation for the LLM.

52
00:02:21.940 --> 00:02:23.700
It has no clue that we asked something

53
00:02:23.700 --> 00:02:29.300
in the past and what we have talked

54
00:02:29.300 --> 00:02:30.300
about so far.

55
00:02:31.620 --> 00:02:34.360
In order to do something about that, we

56
00:02:34.360 --> 00:02:37.120
need something called an agent thread.

57
00:02:37.800 --> 00:02:39.900
So if we go to agent and say

58
00:02:39.900 --> 00:02:45.100
get new thread, we can get one of

59
00:02:45.100 --> 00:02:46.060
these threads.

60
00:02:47.060 --> 00:02:50.340
The thread will internally have a message store

61
00:02:50.340 --> 00:02:52.140
where it saves all the messages.

62
00:02:53.560 --> 00:02:55.620
Let's just call it thread here.

63
00:02:58.060 --> 00:03:00.260
And all we need to do is we

64
00:03:00.260 --> 00:03:02.940
need to tell whenever we run this now

65
00:03:02.940 --> 00:03:06.240
that the thread is involved.

66
00:03:07.480 --> 00:03:09.220
And what will happen is it will add

67
00:03:09.220 --> 00:03:10.680
more and more to the thread so it

68
00:03:10.680 --> 00:03:11.840
knows a conversation.

69
00:03:12.080 --> 00:03:13.860
Just like if we were writing down a

70
00:03:13.860 --> 00:03:16.600
conversation between each other, we could look back

71
00:03:16.600 --> 00:03:19.000
at what was the first thing we talked

72
00:03:19.000 --> 00:03:19.280
about.

73
00:03:20.920 --> 00:03:24.020
So if you run this and now say

74
00:03:24.020 --> 00:03:28.480
hello, it will still come back with how

75
00:03:28.480 --> 00:03:29.320
can I assist you.

76
00:03:29.760 --> 00:03:33.380
But if we say hello again, it will

77
00:03:33.380 --> 00:03:34.280
say hello again.

78
00:03:35.220 --> 00:03:38.320
And that's fine, but also we see that

79
00:03:38.320 --> 00:03:41.960
our input tokens was not just the 8

80
00:03:41.960 --> 00:03:43.040
and 10 like normal.

81
00:03:43.400 --> 00:03:45.560
And if we say it again, this will

82
00:03:45.560 --> 00:03:46.620
now begin to grow.

83
00:03:47.620 --> 00:03:51.380
And this is because all the messages are

84
00:03:51.380 --> 00:03:55.180
in this conversation now because it needs to

85
00:03:55.180 --> 00:03:56.680
know what we talked about before.

86
00:03:57.220 --> 00:04:01.540
So if I say my name is Rasmus

87
00:04:01.540 --> 00:04:09.780
and then ask what is my name, we've

88
00:04:09.780 --> 00:04:12.540
got the conversation now that it knows but

89
00:04:12.540 --> 00:04:14.560
at the cost of spending more tokens.

90
00:04:15.339 --> 00:04:18.560
That is the reason why in ChatGPT and

91
00:04:18.560 --> 00:04:21.160
if we built this out as a chatbot,

92
00:04:21.520 --> 00:04:23.840
we would need to have a new button,

93
00:04:23.940 --> 00:04:26.720
a new chat, because not only are we

94
00:04:26.720 --> 00:04:29.680
spending more tokens again and again as we

95
00:04:29.680 --> 00:04:33.600
ask, but it will also slowly become slower

96
00:04:33.600 --> 00:04:36.040
and slower because it needs to process all

97
00:04:36.040 --> 00:04:39.940
the existing messages before it can answer the

98
00:04:39.940 --> 00:04:40.940
latest one.

99
00:04:44.560 --> 00:04:47.480
So let's try and see behind the scenes

100
00:04:47.480 --> 00:04:50.700
what's actually happening because we can get these

101
00:04:50.700 --> 00:04:51.760
messages out.

102
00:04:52.940 --> 00:04:56.860
So let's do that down here in that

103
00:04:56.860 --> 00:05:00.380
we can put in a thread.getService of

104
00:05:00.380 --> 00:05:01.840
iList of chat messages.

105
00:05:02.560 --> 00:05:04.920
It's a little hidden that we can do

106
00:05:04.920 --> 00:05:07.000
this, but we can get the messages back.

107
00:05:07.740 --> 00:05:13.020
So if we run this, and by the

108
00:05:13.020 --> 00:05:15.520
way, now that if I ask what is

109
00:05:15.520 --> 00:05:18.680
my name, of course it doesn't know because

110
00:05:18.680 --> 00:05:20.420
before it was in memory.

111
00:05:22.500 --> 00:05:24.960
So it's not like forever it knows now,

112
00:05:25.740 --> 00:05:27.580
but what we get back is we can

113
00:05:27.580 --> 00:05:28.820
see we get two messages.

114
00:05:29.480 --> 00:05:31.140
So the first message was from the user,

115
00:05:31.560 --> 00:05:34.000
me, what is my name, and it said,

116
00:05:34.240 --> 00:05:35.460
sorry, I don't know your name.

117
00:05:37.460 --> 00:05:43.000
If we ask again, hello, my name is

118
00:05:43.000 --> 00:05:49.520
Rasmus, we now get four messages.

119
00:05:49.900 --> 00:05:52.200
So whenever we do this right now with

120
00:05:52.200 --> 00:05:55.140
a simple setup like this, we always get

121
00:05:55.140 --> 00:06:00.780
two messages per conversation switch back and forth.

122
00:06:01.320 --> 00:06:03.580
If we began at some point to introduce

123
00:06:03.580 --> 00:06:05.140
tools, there would be more.

124
00:06:05.280 --> 00:06:08.920
There will also be messages from the tools

125
00:06:08.920 --> 00:06:11.400
getting the information from the tools back.

126
00:06:11.860 --> 00:06:14.460
But here we'll just get more and more

127
00:06:14.460 --> 00:06:16.440
of these, and since all of these are

128
00:06:16.440 --> 00:06:19.180
sent to the LLM at every point, so

129
00:06:19.180 --> 00:06:22.280
it can work with the conversation, it will

130
00:06:22.280 --> 00:06:23.800
over time build up.

131
00:06:24.800 --> 00:06:33.220
And the normal simple and often used scenario

132
00:06:33.220 --> 00:06:42.300
is, for example, who is Barack

133
00:06:42.300 --> 00:06:44.080
Obama?

134
00:06:45.160 --> 00:06:49.700
And it'll come back with that and say,

135
00:06:50.500 --> 00:06:55.160
how tall is he?

136
00:06:57.300 --> 00:07:00.380
And it can infer from the first message

137
00:07:00.380 --> 00:07:02.380
that what we're talking about.

138
00:07:03.500 --> 00:07:06.840
But again, at the cost of using extra

139
00:07:06.840 --> 00:07:07.240
tokens.

140
00:07:08.300 --> 00:07:12.360
255 is not a huge amount, so we

141
00:07:12.360 --> 00:07:13.240
can manage.

142
00:07:13.820 --> 00:07:16.560
But, of course, also what we need to

143
00:07:16.560 --> 00:07:19.880
remember is that this is only in memory.

144
00:07:20.200 --> 00:07:23.140
So if we went back and then said,

145
00:07:24.120 --> 00:07:28.200
where does he live?

146
00:07:28.320 --> 00:07:31.760
Because in our mind, the last question was,

147
00:07:32.000 --> 00:07:32.820
who is Barack Obama?

148
00:07:32.940 --> 00:07:33.680
How tall is he?

149
00:07:33.980 --> 00:07:35.900
Then it should also know where he lives.

150
00:07:36.320 --> 00:07:39.040
But, of course, it won't know because now

151
00:07:39.040 --> 00:07:41.420
we are back to zero in the token.

152
00:07:41.420 --> 00:07:46.380
It's like throwing away or forgetting, having short

153
00:07:46.380 --> 00:07:47.620
-term memory loss.

154
00:07:49.380 --> 00:07:52.780
This can be fixed, and we will have

155
00:07:52.780 --> 00:07:58.220
dedicated lectures on that as well, persisting the

156
00:07:58.220 --> 00:07:59.920
memory between sessions.

157
00:08:00.400 --> 00:08:03.140
But for now, what we have is good

158
00:08:03.140 --> 00:08:06.280
enough for us to continue with the chat

159
00:08:06.280 --> 00:08:09.060
in we have now a chat loop to

160
00:08:09.060 --> 00:08:09.560
work with.